DB2 Memory Area In-Depth: The Package Cache
The package cache is just one memory area that DB2 offers to tune memory usage for a DB2 database. This article is a deep dive into this memory area.
What is the Package Cache
The package cache is an area of memory that DB2 uses to store access plans. Access plans are detailed strategies for how DB2 will get to all of the data it needs to satisfy a query.
SQL is a language where we tell DB2 what data we want. We don’t tell DB2 with computer precision where that data is on disk or the methodology that it should use to integrate data into a single result set. We tell it to join two tables, but we don’t give it the specific join methodology to line up rows from the two tables.
DB2 has a wickedly powerful cost-based optimizer that takes the details we feed it about the result set we want, and turns that into the incredibly detailed plan of how to retrieve all that data in the most efficient way and integrate it. There are multiple people at IBM whose whole jobs and careers are based on writing, testing, and improving this optimizer. The more powerful your cost-based optimizer, the more powerful your RDBMS. In my opinion, a good cost-based optimizer is a key differentiator of a relational database management system over many flavors of NO-SQL.
While the DB2 optimizer usually saves vastly more time than if we just accessed all data by table scan and nested loop join, there are still costs associated with that work. In many DB2 databases, the same or vastly similar SQL is re-executed over and over again. DB2 places the access plans generated in a memory area called the package cache. The package cache stores access plans for both static and dynamic SQL.
Sizing the Package Cache
The package cache, in recent versions, is set to AUTOMATIC by default and is tuned as part of STMM (the Self-Tuning Memory Manager). This means that DB2 can tune it for you.
This is, however, the one parameter I nearly always remove from self-tuning. Particularly in databases that make poor use of parameter markers, and thus have few reusable access plans, DB2 tends to over-size the package cache.
If the package cache is too small, agents will have to constantly re-compile dynamic SQL while applications are waiting for that compilation to happen.
Package Cache Overflows
Package cache overflows happen when there is not enough memory in the package cache to hold the access plans for all active SQL. If this happens, DB2 will borrow some memory from the database overflow buffer. Borrowing this memory takes some time and resources and means applications will wait even longer on compiling SQL. You can look for package cache overflows in the db2 diagnostic log like this:
SELECT TIMESTAMP , substr(APPL_ID,1,15) as APPL_ID_TRUNC , MSGSEVERITY as SEV , MSGNUM FROM TABLE ( PD_GET_LOG_MSGS( CURRENT_TIMESTAMP - 7 DAYS)) AS T WHERE MSGNUM=4500 ORDER BY TIMESTAMP DESC; TIMESTAMP APPL_ID_TRUNC SEV MSGNUM -------------------------- --------------- --- ----------- 2016-02-18-10.36.52.270951 192.0.2.0. W 4500 2016-02-18-10.36.44.129013 192.0.2.0. W 4500 2016-02-18-10.36.36.298252 192.0.2.0. W 4500 2016-02-17-10.38.23.363510 192.0.2.0. W 4500 2016-02-17-10.38.15.112473 192.0.2.0. W 4500
The above is slightly more accurate than using
-e 4500, as that can return false positives.
There is also a counter of package cache overflows since the last database restart:
select pkg_cache_num_overflows from table(mon_get_database(-2)) as t with ur PKG_CACHE_NUM_OVERFLOWS ----------------------- 10347 1 record(s) selected.
Generally we want to tune the package cache to a large enough size that we’re not seeing frequent package cache overflows.
Package Cache Hit Ratio
Another performance indicator for the package cache is the package cache hit ratio. The package cache hit ratio tell us how often an agent went to see if there was already an access plan in the package cache and found it was already there.
The package cache hit ratio can be calculated like this:
select decimal((1-(float(pkg_cache_inserts)/float(pkg_cache_lookups))) * 100,5,2) from table(mon_get_database(-2)) as t with ur 1 ------- 34.40 1 record(s) selected.
Before you tune the package cache to improve your package cache hit ratio, consider what the database workload looks like. There are some database workloads that will never get a high package cache hit ratio because the SQL executed is either truly unique SQL or the application is not making appropriate use of parameter markers. Some data warehousing environments may have truly unique SQL and you will never achieve a good package cache hit ratio, no matter how big you make it. The memory is better used in other places. If there is a bad package cache hit ratio due to lack of proper use of parameter markers, then consider educating your developers on the proper use of parameter markers or as a last resort using the statement concentrator.
In an e-commerce or OLTP database, this is one of the top 10 key performance indicators that I look at when looking at performance for a system. I want to see my package cache hit ratio above 95% in those environments. e-commerce and OLTP environments should be repetitively executing the same sets of queries over and over again with different values.
If the package cache is too large, it is using memory that would be more beneficial if allocated to other important memory areas like sort or buffer pools. An overly large package cache can also make analysis of SQL based on the data in the package cache unreasonably difficult.
Size of the Package Cache
The package cache is defined by the database configuration parameter PCKCACHESZ. I have to look up the spelling of this parameter EVERY TIME, because it just doesn’t seem logically consistent with other database configuration parameters to me.
It is easy to determine the size of the package cache:
db2 get db cfg for SAMPLE |grep PCKCACHESZ Package cache size (4KB) (PCKCACHESZ) = AUTOMATIC(18819)
select integer(value) as pckcachesz_4kPG , value_flags from sysibmadm.dbcfg where name='pckcachesz' with ur PCKCACHESZ_4KPG VALUE_FLAGS --------------- ----------- 18819 AUTOMATIC 1 record(s) selected.
The size of the package cache can be updated using this syntax:
db2 update db cfg for SAMPLE using pckcachesz <NUM>
Data Available in the Package Cache
In addition to serving a very useful function, there are also details about SQL that has been executed against the database. This information includes:
- Query statements
- Number of executions
- Execution time
- Lock wait time
- Sort time
- CPU time
- Rows read
- Rows returned
- And much more
This information is collected for each unique statement, but it is cumulatice across all executions of the statement since the statement was placed into the package cache. DB2 keeps about as much as it can in the package cache, but especially if the package cache is on the small side, statements may be moving in and out of the package cache all the time. This means that it is possible for the data for one statement to represent executions over the last 10 months, while the very next statement might be over the last 10 seconds.
Having this data be cumulative is very useful. It can be a potent tool for identifying problem SQL that is running against a database.
To query this data, see my favorite SQL statement for identifying problem SQL.
Data Not Available in the Package Cache
There are some pieces of data that we might sometimes want that are not available in the package cache. This includes:
- When a particular query was executed
- Some details on static SQL
- If using snapshot monitor
- Prior to DB2 9.7
- Specific values used in execution of a statement when that statement uses parameter markers
- Literal values used during execution, if the statement concentrator was used for that statement
Most of these details can be collected using a more invasive activity event monitor. Many of these pieces information are simply not possible to gather in a cumulative environment like the package cache.
Parameter markers are placeholders for values in SQL statements. They allow DB2 to compile one access plan and then use that access plan for the same statement with different supplied values. In the same statement, some values may make sense as parameter markers, while others make more sense as static values. It is up to the developer or vendor to chose the proper places to use or not use parameter markers. This is not something that the DBA often has much input on, though educating developers on such details falls to many DBAs.
One of the disadvantages of parameter markers is that using them means that distribution statistics will not be used for that particular portion of the query. The advantage of a generic access plan is that it doesn’t have to be re-compiled over and over. The disadvantage is that in some cases of data skew, the access plan will be slower than one that took into account the specific values.
What happens if you’re dealing with developers or a vendor who are unwilling or unable to make proper use of parameter markers? DB2 has a band-aid for that! The statement concentrator can be enabled for certain connections using the CLI configuration or for all connections using the database configuration parameter STMT_CONC. Setting STMT_CONC to LITERALS tells DB2 to treat all literal values as parameter markers. DB2 makes an exception for statements that already contain parameter markers. DB2 assumes that if you were smart enough to use a parameter marker at one place in a statement, you intended the other values to be static.
Package Cache Statement Eviction Monitor
To help us deal with the transient nature of data in the package cache, IBM introduced the package cache statement eviction monitor. This event monitor will capture the data about a statement when it is evicted from the package cache so that data is not lost.
Creating the Event Monitor
Before creating a package cache statement eviction monitor, you must first have a buffer pool, a table space, and a system temporary table space with a 32k page size. The simplest syntax for that is:
create bufferpool buff32k pagesize 32K create tablespace dba32k pagesize 32K bufferpool buff32k create system temporary tablespace tempsys32k pagesize 32k bufferpool buff32k
You can create this event monitor to capture all evicted statements:
CREATE EVENT MONITOR MY_PKGCACHE_EVMON FOR PACKAGE CACHE WRITE TO UNFORMATTED EVENT TABLE (IN DBA32K)
Or, the event monitor can capture only statements that exceed defined thresholds:
CREATE EVENT MONITOR MY_PKGCACHE_EVMON FOR PACKAGE CACHE WHERE NUM_EXECUTIONS > 10 WRITE TO UNFORMATTED EVENT TABLE (IN DBA32K)
The three thresholds available for this are:
Using these thresholds can be useful to reduce the statement data collected, since we’re often talking tens of thousands of statements a day.
Querying the Data
One of the things I really like about this event monitor is that the output of this event monitor is formatted in the same way as the tables I query for real-time data like MON_GET_PKG_CACHE_STMT. This makes it easy to tweak an existing statement to reference the event monitor table instead.
As with all event monitors, you need to make sure you are archiving and pruning output to prevent this from filling up a critical disk. I generally like to keep my monitoring data in a separate table space on a separate disk to keep a mistake in pruning data from having any possibility of causing a database outage.
Side note: “The Whole Package Cache” refers to my second favorite DB2 podcast by Ian Bjorhovde (whose name I can finally spell without looking it up) and Fred Sobotka (whose name I still cannot spell without looking it up).