Runstats – What and Why
On looking over my previous posts, I found no fewer than 25 of them that mentioned runstats. But I wanted to cover runstats in a different level of detail. With this post, I aim to answer 2 questions – What is runstats? and Why should you run runstats?
Runstats is a utility. That’s why we say “run runstats”. You may also hear “collect runstats” or “collect statistics”. Other RDBMS’s have a similar concept, but they may call it something slightly different. Runstats collects statistical information about the data in tables and indexes. You can view this information in views in the SYSSTAT schema. SYSCAT tables have some of the data as well.
There are a ton of options on the runstats command. Let’s look at my favorite syntax and what each part means.
db2 runstats on table schema.table with distribution and detailed indexes all
This tells DB2 to collect distribution statistics. Distribution statistics include two things:
- Frequent value statistics – DB2 notes the most frequent values. By default, the 10 most frequent values. Using the above syntax, this is collected for every column. You can change the number of most frequent values using the NUM_FREQVALUES database configuration parameter, or the NUM_FREQVALUES clause on the runstats command
- Quantile statistics – Divides the values into NUM_QUANTILES (default: 20) sections to describe the distribution of the data. The default means that the optimizer should be able to estimate the number of values that would meet any one-sided predicate within 2.5% of the actual value.
and detailed indexes all
Collecting index statistics helps DB2 decide which if any indexes to use to satisfy a particular query. Collecting detailed index statistics allows db2 to more accurately estimate the page fetches that will be required – allowing db2 to properly estimate the cost of accessing a table through an index. DB2 will use this data along with bufferpool information to determine how much (if any) synchronous page cleaning will have to occur.
I recommend always collecting detailed index statistics for e-commerce databases.
DB2 offers a facility for running runstats “as needed”. It looks for a certain percentage of data change and does runstats as needed. But you cannot easily tell it to use the syntax you prefer. That’s one reason that I don’t like to use it. The other is that I’m a control freak, and I want to know exactly when runstats are running (since they can cause mild performance degradation), and that runstats for all tables was done at a specific time.
Why do we gather all of this information? Especially when it can extend the time it takes to run runstats? DB2’s optimizer is very powerful, but its ability to choose the best access path is very heavily dependent on having current statistics. If you call DB2 support on a query performance issue one of the very first questions they will ask is when the last runstats was and if it covered all tables. The same goes for contacting WebSphere Commerce support on database performance issues, if you’re using WebSphere Commerce. I have seen old or missed runstats literally cause a site-down issue that was immediately resolved after runstats/rbind was complete. Give the optimizer the best chance of returning excellent query performance, and do runstats at least weekly for all non-volatile tables.
Info center entry on distribution statistics: http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005093.html
Info center entry on detailed index statistics: http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005095.html