STMM Analysis Tool

I mostly like and use DB2’s Self-Tuning Memory Manager (STMM) for my OLTP databases when I have only one DB2 instance/database on a database server. There are a few areas, though, that I do not let it set for me. I recently learned about an analysis tool for it – Adam Storm mentioned it in a presentation at IDUG 2014 in Phoenix.

Parameters that STMM Tunes

To begin with, it is important to understand what STMM tunes and what it doesn’t. I recommend reading When is ‘AUTOMATIC’ Not STMM?. There are essentially only 5 areas that STMM can change:

  1. DATABASE_MEMORY if AUTOMATIC
  2. SORTHEAP, SHEAPTHRES_SHR if AUTOMATIC, and SHEAPTHRES is 0
  3. BUFFERPOOLS if number of pages on CREATE/ALTER BUFFERPOOL is AUTOMATIC
  4. PCKCACHESZ if AUTOMATIC
  5. LOCKLIST, MAXLOCKS if AUTOMATIC (both must be automatic)

Any other parameters, even if they are set to “AUTOMATIC”, are not tuned by STMM.
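For reference, a quick way to see which of these are currently under STMM control, and to hand a buffer pool over to STMM, might look like this (SAMPLE is a placeholder database name; adjust to your environment):

```shell
# Check whether STMM itself is on, and which of the tunable
# parameters are currently set to AUTOMATIC
db2 get db cfg for SAMPLE | grep -E "SELF_TUNING_MEM|DATABASE_MEMORY|SORTHEAP|SHEAPTHRES_SHR|PCKCACHESZ|LOCKLIST|MAXLOCKS"

# Hand a buffer pool over to STMM by setting its size to AUTOMATIC
db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE AUTOMATIC"
```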

Why I don’t use STMM for PCKCACHESZ

A number of the e-commerce database servers I support are very much oversized for daily traffic. This is common for retail sites because there are always peak periods, and servers tend to be sized to handle those. Many retail clients have extremely drastic peak periods like Black Friday, Cyber Monday, or other very critical selling times.

I noticed that for one of my clients whose server was significantly oversized on memory, DB2 was making the package cache absolutely huge. I saw this:

Package cache size (4KB)                   (PCKCACHESZ) = AUTOMATIC(268480)

That’s a full GB allocated to the package cache. There were over 30,000 statements in the package cache, the vast majority with only a single execution. The thing is, for my OLTP databases, the statements for which performance is critical are often static SQL, or they use parameter markers. For most of the ad-hoc statements that are only executed once, I don’t really care whether they’re stored in the package cache. This was roughly a 50-100 GB database on a server with 64 GB of memory. The buffer pool hit ratios were awesome, so I guess DB2 didn’t really need the memory there, but still. In my mind, for well-run OLTP databases, that much package cache does not help performance. There may well be databases that need that much or more in the package cache, but this database was simply not one of them. Because of this experience, I set the package cache manually and tune it properly.
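Setting it manually is a one-line change; a sketch (the size here is purely illustrative, not a recommendation, and SAMPLE is a placeholder database name):

```shell
# Take PCKCACHESZ out of STMM's hands and set a fixed size
# (65536 4K pages = 256 MB -- an illustrative value; tune to your workload)
db2 update db cfg for SAMPLE using PCKCACHESZ 65536 immediate
```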

A few STMM Caveats

Just a few things to note: I have heard rumors of issues with STMM when there are multiple DB2 instances running on a server, though I have not personally experienced them. Also, the settings that STMM arrives at are not transferred at all to the HADR standby, so when you fail over, you may see poor performance while STMM ramps up. You could probably script a regular setting of the STMM parameters to deal with this. Finally, if you have a well-tuned, well-performing non-STMM database, there is probably little reason and not much reward in changing it to STMM. Most database performance experts can likely tune a database better than STMM can, but we can’t all be performance experts, or give as much time as we’d like to every database we support.
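One hedged way to script that regular re-application would be to generate UPDATE DB CFG statements from the values STMM has currently chosen on the primary, and replay them after a takeover. A rough sketch, assuming the plain `db2 get db cfg` output format (no `show detail`), with SAMPLE as a placeholder database name:

```shell
# Sketch: turn each "(PARAM) = AUTOMATIC(value)" line on the primary into an
# update statement that could seed the standby's starting values after takeover.
db2 get db cfg for SAMPLE \
  | awk -F'[()]' '/= AUTOMATIC\(/ { print "db2 update db cfg for SAMPLE using " $(NF-3) " " $(NF-1) }'
```

Note that buffer pool sizes are not database configuration parameters, so they would need separate ALTER BUFFERPOOL handling; this only covers the cfg-based heaps.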

The STMM Log Parser

STMM logs the changes it makes to parameter sizes both in the db2diag.log and in a set of STMM log files. (Hint: IBM, maybe these could be used to periodically update the HADR standby too?) The log files are in the stmmlog subdirectory of the DIAGPATH. They aren’t exactly tough to read, but they don’t present the information in an easy-to-digest way. Entries look a bit like diagnostic log entries:

2014-07-02-23.44.40.788684+000 I10464203A600        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmMemoryTunerMain, probe:2065
DATA #1 : String, 115 bytes
Going to sleep for 180000 milliseconds.
Interval = 5787, State = 0, intervalsBeforeStateChange = 0, lost4KPages = 0

2014-07-02-23.47.40.807231+000 I10464804A489        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmMemoryTunerMain, probe:1909
MESSAGE : Activation stage ended

2014-07-02-23.47.40.807661+000 I10465294A488        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmMemoryTunerMain, probe:1913
MESSAGE : Starting New Interval

2014-07-02-23.47.40.808193+000 I10465783A925        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmLogRecordBeforeResizes, probe:590
DATA #1 : String, 435 bytes

***  stmmCostBenefitRecord ***
Type: LOCKLIST
PageSize: 4096
Benefit:
  -> Simulation size: 75
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Cost:
  -> Simulation size: 75
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Current Size: 27968
Minimum Size: 27968
Potential Increase Amount: 13984
Potential Increase Amount From OS: 13984
Potential Decrease Amount: 0
Pages Available For OS: 0

2014-07-02-23.47.40.808580+000 I10466709A993        LEVEL: Event
PID     : 18677976             TID : 46382          PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : WC42P1L1
APPHDL  : 0-12466              APPID: *LOCAL.DB2.140620223552
AUTHID  : DB2INST1             HOSTNAME: ecprwdb01s
EDUID   : 46382                EDUNAME: db2stmm (WC42P1L1) 0
FUNCTION: DB2 UDB, Self tuning memory manager, stmmLogRecordBeforeResizes, probe:590
DATA #1 : String, 502 bytes

***  stmmCostBenefitRecord ***
Type: BUFFER POOL ( BUFF_REF16K )
PageSize: 16384
Saved Misses: 0
Benefit:
  -> Simulation size: 2560
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Cost:
  -> Simulation size: 2560
  -> Total seconds saved: 0 (+ 0 ns)
  -> Normalized seconds/page: 0
Current Size: 25000
Minimum Size: 5000
Potential Increase Amount: 12500
Potential Increase Amount From OS: 12500
Potential Decrease Amount: 5000
Pages Available For OS: 5000
Interval Time: 180.029
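If you are not sure where these files live on your server, the DIAGPATH is in the database manager configuration (the path below is just the example used in this post; substitute your own):

```shell
# Find the diagnostic path, then list the STMM logs under it
db2 get dbm cfg | grep -i "DIAGPATH"
ls -lrt /db2diag/stmmlog/
```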

Scrolling through each 10 MB file of this is not likely to give us a complete picture very easily. IBM offers, through developerWorks, a log parser tool for STMM. The full writeup on it is here: http://www.ibm.com/developerworks/data/library/techarticle/dm-0708naqvi/index.html

The tool is free, and is a Perl script that DBAs can modify if they like. AIX and Linux tend to include Perl, and it’s not hard to install on Windows using ActivePerl or a number of other options. I happen to rather like a Perl utility as I do the vast majority of my database maintenance scripting in Perl.

Download and Set Up

The developerWorks link above includes the Perl script. Scroll down to the “download” section and click “parseStmmLogFile.pl”. If you accept the terms and conditions, click “I Accept the Terms and Conditions” and save the file. Then upload it to the database server you wish to use it on.

Syntax

There are several options here. Whenever you execute the script, you must specify the name of one of your STMM log files and the database name. The various options beyond that are covered below.
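Pulling that together, the general invocation pattern looks roughly like this (the option letters are described in the examples that follow; this synopsis is my own summary, not the script’s official usage text):

```shell
./parseStmmLogFile.pl <stmm log file> <database name> [s|b|o] [d] [4]
```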

Examples

The default if you specify nothing beyond the file name and the database name is the s option. This gives you the new size at each interval of each heap that STMM manages. The output looks something like this:

 ./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE s
# Database: SAMPLE
[ MEMORY TUNER - LOG ENTRIES ]
[ Interv ]      [        Date         ] [ totSec ]      [ secDif ]      [ newSz ]
[        ]      [                     ] [        ]      [        ]      [ LOCKLIST  BUFFERPOOL - BUFF16K:16K BUFFERPOOL - BUFF32K:32K BUFFERPOOL - BUFF4K BUFFERPOOL - BUFF8K:8K BUFFERPOOL - BUFF_CACHEIVL:8K BUFFERPOOL - BUFF_CAT16K:16K BUFFERPOOL - BUFF_CAT4K BUFFERPOOL - BUFF_CAT8K:8K BUFFERPOOL - BUFF_CTX BUFFERPOOL - BUFF_REF16K:16K BUFFERPOOL - BUFF_REF4K BUFFERPOOL - BUFF_REF8K:8K BUFFERPOOL - BUFF_SYSCAT BUFFERPOOL - BUFF_TEMP16K:16K BUFFERPOOL - BUFF_TEMP32K:32K BUFFERPOOL - BUFF_TEMP4K BUFFERPOOL - BUFF_TEMP8K:8K BUFFERPOOL - IBMDEFAULTBP ]
[      1 ]      [ 02/07/2014 00:17:27 ] [    180 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      2 ]      [ 02/07/2014 00:20:27 ] [    360 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      3 ]      [ 02/07/2014 00:23:27 ] [    540 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      4 ]      [ 02/07/2014 00:26:27 ] [    720 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]
[      5 ]      [ 02/07/2014 00:29:27 ] [    900 ]      [    180 ]      [ 27968 12500 2500 2000000 50000 500000 25000 1000000 50000 1000000 25000 1000000 50000 50000 1000 1000 1000 1000 10000 ]

If you have a number of buffer pools, this can be hard to read, even on a large screen. The width of the numeric values is not the same as that of their names, so the output is not all that tabular. To fix that, you can try the d option, which delimits the output with semicolons, making it easier to get into your favorite spreadsheet tool. The raw output in that case looks like this:

./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE s d
# Database: SAMPLE
MEMORY TUNER - LOG ENTRIES
Interval;Date;Total Seconds;Difference in Seconds; LOCKLIST  ;  BUFFERPOOL - BUFF16K:16K ;  BUFFERPOOL - BUFF32K:32K ;  BUFFERPOOL - BUFF4K ;  BUFFERPOOL - BUFF8K:8K ;  BUFFERPOOL - BUFF_CACHEIVL:8K ;  BUFFERPOOL - BUFF_CAT16K:16K ;  BUFFERPOOL - BUFF_CAT4K ;  BUFFERPOOL - BUFF_CAT8K:8K ;  BUFFERPOOL - BUFF_CTX ;  BUFFERPOOL - BUFF_REF16K:16K ;  BUFFERPOOL - BUFF_REF4K ;  BUFFERPOOL - BUFF_REF8K:8K ;  BUFFERPOOL - BUFF_SYSCAT ;  BUFFERPOOL - BUFF_TEMP16K:16K ;  BUFFERPOOL - BUFF_TEMP32K:32K ;  BUFFERPOOL - BUFF_TEMP4K ;  BUFFERPOOL - BUFF_TEMP8K:8K ;  BUFFERPOOL - IBMDEFAULTBP ; ;
1;02/07/2014 00:17:27;180;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
2;02/07/2014 00:20:27;360;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
3;02/07/2014 00:23:27;540;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
4;02/07/2014 00:26:27;720;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
5;02/07/2014 00:29:27;900;180; 27968; 12500; 2500; 2000000; 50000; 500000; 25000; 1000000; 50000; 1000000; 25000; 1000000; 50000; 50000; 1000; 1000; 1000; 1000; 10000;
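Since most spreadsheet tools expect commas, one quick way to turn that semicolon-delimited output into a CSV file is a sed pass (the output file name here is illustrative):

```shell
# Convert the semicolon-delimited output to comma-separated values,
# collapsing the stray spaces around each delimiter as we go
./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE s d \
  | sed -e 's/ *; */,/g' > stmm_sizes.csv
```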

Save it off to a file, import it into a spreadsheet, and you get something like this:
[Screenshot: STMM_log_parser_output_s-1 – the delimited parser output imported into a spreadsheet]

Ok, and finally, you can make a pretty graph to look at these in a more human way:
[Chart: STMM_log_parser_output_chart – the STMM-tuned heap sizes graphed over time]
Now that would be a lot more exciting if I ran it on a database where things were changing more often, but that’s the one I have to play with at the moment.

There are some other interesting options besides the s option. The b option shows the benefit analysis that STMM does, which looks pretty boring on my database, but still:

./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE b
# Database: SAMPLE
[ MEMORY TUNER - LOG ENTRIES ]
[ Interv ]      [        Date         ] [ totSec ]      [ secDif ]      [ benefitNorm ]
[        ]      [                     ] [        ]      [        ]      [ LOCKLIST  BUFFERPOOL - BUFF16K:16K BUFFERPOOL - BUFF32K:32K BUFFERPOOL - BUFF4K BUFFERPOOL - BUFF8K:8K BUFFERPOOL - BUFF_CACHEIVL:8K BUFFERPOOL - BUFF_CAT16K:16K BUFFERPOOL - BUFF_CAT4K BUFFERPOOL - BUFF_CAT8K:8K BUFFERPOOL - BUFF_CTX BUFFERPOOL - BUFF_REF16K:16K BUFFERPOOL - BUFF_REF4K BUFFERPOOL - BUFF_REF8K:8K BUFFERPOOL - BUFF_SYSCAT BUFFERPOOL - BUFF_TEMP16K:16K BUFFERPOOL - BUFF_TEMP32K:32K BUFFERPOOL - BUFF_TEMP4K BUFFERPOOL - BUFF_TEMP8K:8K BUFFERPOOL - IBMDEFAULTBP ]
[      1 ]      [ 02/07/2014 00:17:27 ] [    180 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      2 ]      [ 02/07/2014 00:20:27 ] [    360 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      3 ]      [ 02/07/2014 00:23:27 ] [    540 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      4 ]      [ 02/07/2014 00:26:27 ] [    720 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[      5 ]      [ 02/07/2014 00:29:27 ] [    900 ]      [    180 ]      [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

The o option shows only database memory and overflow buffer tuning:

./parseStmmLogFile.pl /db2diag/stmmlog/stmm.43.log SAMPLE o
# Database: SAMPLE
[ MEMORY TUNER - DATABASE MEMORY AND OVERFLOW BUFFER TUNING - LOG ENTRIES ]
[ Interv ]      [        Date         ] [ totSec ]      [ secDif ]      [ configMem ]   [ memAvail ]    [ setCfgSz ]
[      1 ]      [ 02/07/2014 00:17:27 ] [    180 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      2 ]      [ 02/07/2014 00:20:27 ] [    360 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      3 ]      [ 02/07/2014 00:23:27 ] [    540 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      4 ]      [ 02/07/2014 00:26:27 ] [    720 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]
[      5 ]      [ 02/07/2014 00:29:27 ] [    900 ]      [    180 ]      [ 6912 ]        [ 6912 ]        [ 1990 ]

There is also a 4 option that you can use to convert all values to 4k pages.
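That normalization is just the page count multiplied by the ratio of the page size to 4K. For example, the 25000-page 16K buffer pool shown earlier comes out as 100000 4K-equivalent pages:

```shell
# 25000 pages * (16384 / 4096) = 100000 4K-equivalent pages
echo $(( 25000 * 16384 / 4096 ))
```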

Summary

There are some useful things in the STMM log parser if you want to understand the changes DB2 is making. Many of us who come from fully manual tuning naturally distrust what STMM or other tuning tools are doing, so this level of transparency helps us understand what is happening and why it is or is not working in our environments. I would love to see more power here. Being able to query this data with a table function or administrative view (as we can with the db2diag.log!) would be even more useful, since the output could then be further limited and tweaked. The script is well documented, and I imagine I could modify it to limit the output if I wanted to. I’d also love to have it call out actual changes – that would be harder to graph, but in the text output it could be more useful for a fairly dormant system.

6 Responses

  1. Somraj Chakrabarty says:

    Hi,

    I am a regular follower of your blog. I am getting a warning related to STMM in my environment (DB2 V9.7 FP9 running on Windows 2008 R2), shown below. Could you please provide some information on this?

    APPHDL : 0-38581 APPID: 10.248.204.91.57366.140807093552
    AUTHID : MXDB2USER
    EDUID : 12540 EDUNAME: db2agent (MAXDB71) 0
    FUNCTION: DB2 UDB, Self tuning memory manager, stmmSimulateSortMerge, probe:3388
    MESSAGE : ZRC=0x87AE0165=-2018639515=STMM_SEVERE_ERROR
    “Severe error encountered in self tuning memory manager”
    DATA #1 : String, 91 bytes
    ERROR simulation of real configuration has failed – not reporting benefit for this operator

  2. Isaac Munoz says:

    Hi Ember

    Great article – thanks very much for the info about the STMM log parser. I will give it a try, since we have large DBs using STMM for all the auto-managed DB config parms.

    However, I have a question about where you mention “… for well-run OLTP databases, that much package cache does not help performance.” Could you please elaborate a bit more on what performance cons a big package cache has? If you prefer, we can discuss it offline via email.

    Thanks!

    • Ember Crooks says:

      The performance focus for OLTP databases is small, fast queries. New queries are not often introduced – the SQL run is known and tested. This also means there is a smaller subset of queries. Thus, for the most part, the package cache needs to be on the small side. Most of the large queries are either non-critical for performance or extremely rarely run (monthly and the like). I need a package cache that is large enough to hold those known and tested small queries. The memory is more useful for buffer pools or even sort areas and such.

      If I have an oversized system (I often have systems sized for Black Friday loads, but only get that kind of load once or twice a year), DB2 STMM seems to give me package cache to hold every query I ever execute even once. There are plenty of queries that I execute once that I don’t really care if the access plans are kept around – they won’t be used again, or won’t be used again for another month, or their performance is non-critical. Even during peak periods, my performance is focused on that small, known set of (often singleton row) queries.

      Furthermore, because the SQL it stores in the package cache is so infrequently executed, I can still get over 99% hit ratios with a smaller package cache.

      The other con that a large package cache has, besides using memory I could still better use elsewhere, is to make analysis more difficult. I find it much harder to store and work with 4 GB snapshot files or that much data stored in tables (especially when I take hourly snapshots), than more manageable sizes. I keep a shorter period of data or have seen my clients disable my hourly snapshots altogether. Depending on whether I have tools to deal with the data, I also have to transfer files from the server, import into Excel, and then deal with Excel’s slowness with so much data that I don’t really need.

      Obviously, it’s a personal choice and not a performance imperative, but I like a smaller package cache that still allows me to get over 99% hit ratios.

  3. Chaitanya says:

    Hi Ember, in our OLTP environment we have this issue coming up very frequently these days: “Time out based commit point” when the application team runs their jobs. Can you please let me know how to deal with this issue and solve it permanently?

    Thanks so much ,
    Chaitanya

