On December 12th, I presented on the DB2 Night Show on HADR and TSAMP Advanced Topics. See the replay here: http://www.dbisoftware.com/blog/db2nightshow.php?id=531

Runstats – What and Why

On looking over my previous posts, I found no fewer than 25 of them that mentioned runstats. But I wanted to cover runstats in a different level of detail. With this post, I aim to answer 2 questions – What is runstats? and Why should you run runstats?

What

Runstats is a utility. That’s why we say “run runstats”. You may also hear “collect runstats” or “collect statistics”. Other RDBMS’s have a similar concept, but they may call it something slightly different. Runstats collects statistical information about the data in tables and indexes. You can view this information in views in the SYSSTAT schema. SYSCAT tables have some of the data as well.

There are a ton of options on the runstats command. Let’s look at my favorite syntax and what each part means.

db2 runstats on table schema.table with distribution and detailed indexes all

with distribution

This tells DB2 to collect distribution statistics. Distribution statistics include two things:

  • Frequent value statistics – DB2 notes the most frequent values. By default, the 10 most frequent values. Using the above syntax, this is collected for every column. You can change the number of most frequent values using the NUM_FREQVALUES database configuration parameter, or the NUM_FREQVALUES clause on the runstats command
  • Quantile statistics – Divides the values into NUM_QUANTILES (default: 20) sections to describe the distribution of the data. The default means that the optimizer should be able to estimate the number of values that would meet any one-sided predicate within 2.5% of the actual value.
I recommend always collecting distribution statistics for e-commerce databases.

and detailed indexes all

Collecting index statistics helps DB2 decide which if any indexes to use to satisfy a particular query. Collecting detailed index statistics allows db2 to more accurately estimate the page fetches that will be required – allowing db2 to properly estimate the cost of accessing a table through an index. DB2 will use this data along with bufferpool information to determine how much (if any) synchronous page cleaning will have to occur.

I recommend always collecting detailed index statistics for e-commerce databases.

Automated Runstats

DB2 offers a facility for running runstats “as needed”. It looks for a certain percentage of data change and does runstats as needed. But you cannot easily tell it to use the syntax you prefer. That’s one reason that I don’t like to use it. The other is that I’m a control freak, and I want to know exactly when runstats are running (since they can cause mild performance degradation), and that runstats for all tables was done at a specific time.

Why

Why do we gather all of this information? Especially when it can extend the time it takes to run runstats? DB2’s optimizer is very powerful, but its ability to choose the best access path is very heavily dependent on having current statistics. If you call DB2 support on a query performance issue one of the very first questions they will ask is when the last runstats was and if it covered all tables. The same goes for contacting WebSphere Commerce support on database performance issues, if you’re using WebSphere Commerce. I have seen old or missed runstats literally cause a site-down issue that was immediately resolved after runstats/rbind was complete. Give the optimizer the best chance of returning excellent query performance, and do runstats at least weekly for all non-volatile tables.

References

Info center entry on distribution statistics: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0005093.html

Info center entry on detailed index statistics: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0005095.html

8 comments… add one

  • Great article.

    I wanted to say, that if you want to control how the statistics are executed, you must probably use a profile, and then, the automatic maintenance will respect that profile. This is a good reason to activate the automatic runstats and still have the control, by configuring the statistic profiles.

    http://www.ibm.com/developerworks/data/library/techarticle/dm-0706tang/
    “Automatic statistics collection respects the profile a user has specified by using the registered profile option in the SYSCAT.TABLES catalog table. If no profile is specified, automatic statistics collection uses the following options when collecting statistics:”

    Reply
    • But I do have to set a profile for each individual table, right? I can see that working, but as a control freak, I also want to know exactly when runstats happened across the board. When someone asks, I want to be able to say “Saturday Morning” or “Last Night” without even thinking about it.

      Reply
  • Hi,

    Do I need to execute runstats for each table? can’t we run by schema?

    Thanks

    Reply
  • What is recommendation for running runstats, say my data changes by x% then I should do runstats? What is the recommended x%, 10%, 20% 30 % etc

    Reply
    • I don’t have that number. I don’t like that method of running runstats, so I don’t know the thresholds associated with it.

      Reply
  • Is there any tool or SQL stmt to know when to run runstats as we do with REORGCHK.

    Reply
    • Usually if you’re doing runstats manually, then you have a fixed interval that you do runstats for all tables or perhaps categorize tables based on how often you need to do runstats on them based on data change rates (daily, weekly, monthly, etc).

      The driving factor is often data change rates, but I don’t have a specific query to track that. DB2’s automated runstats facilities do make decisions based on data change rates. But I don’t use them because I’m a control freak and I want to know when the last runstats was. Also because the data I’m querying in an e-commerce database tends to be the most recently added data.

      Personally I do runstats on all tables either daily or weekly depending on whether I’m in a build or post-go-live phase where data changes quickly or if I’m just in normal operation. This works well in e-commerce databases. I can see a point being made for different frequencies in different types of databases.

      Reply

Leave a Comment