Go watch the DB2′s Got Talent grand finale replay, and vote for Michael!

Runstats – What and Why

On looking over my previous posts, I found no fewer than 25 of them that mentioned runstats. But I wanted to cover runstats in a different level of detail. With this post, I aim to answer 2 questions – What is runstats? and Why should you run runstats?

What

Runstats is a utility. That’s why we say “run runstats”. You may also hear “collect runstats” or “collect statistics”. Other RDBMS’s have a similar concept, but they may call it something slightly different. Runstats collects statistical information about the data in tables and indexes. You can view this information in views in the SYSSTAT schema. SYSCAT tables have some of the data as well.

There are a ton of options on the runstats command. Let’s look at my favorite syntax and what each part means.

db2 runstats on table schema.table with distribution and detailed indexes all

with distribution

This tells DB2 to collect distribution statistics. Distribution statistics include two things:

  • Frequent value statistics – DB2 notes the most frequent values. By default, the 10 most frequent values. Using the above syntax, this is collected for every column. You can change the number of most frequent values using the NUM_FREQVALUES database configuration parameter, or the NUM_FREQVALUES clause on the runstats command
  • Quantile statistics – Divides the values into NUM_QUANTILES (default: 20) sections to describe the distribution of the data. The default means that the optimizer should be able to estimate the number of values that would meet any one-sided predicate within 2.5% of the actual value.
I recommend always collecting distribution statistics for e-commerce databases.

and detailed indexes all

Collecting index statistics helps DB2 decide which if any indexes to use to satisfy a particular query. Collecting detailed index statistics allows db2 to more accurately estimate the page fetches that will be required – allowing db2 to properly estimate the cost of accessing a table through an index. DB2 will use this data along with bufferpool information to determine how much (if any) synchronous page cleaning will have to occur.

I recommend always collecting detailed index statistics for e-commerce databases.

Automated Runstats

DB2 offers a facility for running runstats “as needed”. It looks for a certain percentage of data change and does runstats as needed. But you cannot easily tell it to use the syntax you prefer. That’s one reason that I don’t like to use it. The other is that I’m a control freak, and I want to know exactly when runstats are running (since they can cause mild performance degradation), and that runstats for all tables was done at a specific time.

Why

Why do we gather all of this information? Especially when it can extend the time it takes to run runstats? DB2′s optimizer is very powerful, but its ability to choose the best access path is very heavily dependent on having current statistics. If you call DB2 support on a query performance issue one of the very first questions they will ask is when the last runstats was and if it covered all tables. The same goes for contacting WebSphere Commerce support on database performance issues, if you’re using WebSphere Commerce. I have seen old or missed runstats literally cause a site-down issue that was immediately resolved after runstats/rbind was complete. Give the optimizer the best chance of returning excellent query performance, and do runstats at least weekly for all non-volatile tables.

References

Info center entry on distribution statistics: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0005093.html

Info center entry on detailed index statistics: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0005095.html

6 comments… add one

  • Great article.

    I wanted to say, that if you want to control how the statistics are executed, you must probably use a profile, and then, the automatic maintenance will respect that profile. This is a good reason to activate the automatic runstats and still have the control, by configuring the statistic profiles.

    http://www.ibm.com/developerworks/data/library/techarticle/dm-0706tang/
    “Automatic statistics collection respects the profile a user has specified by using the registered profile option in the SYSCAT.TABLES catalog table. If no profile is specified, automatic statistics collection uses the following options when collecting statistics:”

    Reply
    • But I do have to set a profile for each individual table, right? I can see that working, but as a control freak, I also want to know exactly when runstats happened across the board. When someone asks, I want to be able to say “Saturday Morning” or “Last Night” without even thinking about it.

      Reply
  • Hi,

    Do I need to execute runstats for each table? can’t we run by schema?

    Thanks

    Reply
  • What is recommendation for running runstats, say my data changes by x% then I should do runstats? What is the recommended x%, 10%, 20% 30 % etc

    Reply
    • I don’t have that number. I don’t like that method of running runstats, so I don’t know the thresholds associated with it.

      Reply

Leave a Comment