Runstats – What and Why

Looking over my previous posts, I found no fewer than 25 that mention runstats. But I wanted to cover runstats at a different level of detail. With this post, I aim to answer two questions: What is runstats? And why should you run runstats?


Runstats is a utility, which is why we say “run runstats”. You may also hear “collect runstats” or “collect statistics”. Other RDBMSs have a similar concept, though they may call it something slightly different. Runstats collects statistical information about the data in tables and indexes. You can view this information in the views of the SYSSTAT schema; some of the same data also appears in the SYSCAT catalog views.
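For example, after runstats you can see per-column statistics with a query like this (a sketch – APP.ORDERS is a hypothetical table, and you need an active database connection):

```sql
-- Column statistics gathered by runstats (hypothetical table APP.ORDERS)
SELECT SUBSTR(COLNAME, 1, 20) AS COLNAME,
       COLCARD,   -- estimated number of distinct values in the column
       NUMNULLS   -- estimated number of NULLs in the column
FROM   SYSSTAT.COLUMNS
WHERE  TABSCHEMA = 'APP'
  AND  TABNAME   = 'ORDERS'
```

A COLCARD of -1 means statistics have never been collected for that column.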

There are a ton of options on the runstats command. Let’s look at my favorite syntax and what each part means.

db2 runstats on table schema.table with distribution and detailed indexes all

with distribution

This tells DB2 to collect distribution statistics. Distribution statistics include two things:

  • Frequent value statistics – DB2 records the most frequent values in each column – by default, the 10 most frequent. Using the above syntax, this is collected for every column. You can change the number of values collected using the NUM_FREQVALUES database configuration parameter, or the NUM_FREQVALUES clause on the runstats command.
  • Quantile statistics – Divides the values in a column into NUM_QUANTILES (default: 20) sections to describe the distribution of the data. With the default, the optimizer should be able to estimate the number of rows that satisfy any one-sided predicate to within 2.5% of the actual value.
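Both numbers can also be overridden for a single runstats invocation rather than database-wide. A sketch (schema.table is a placeholder):

```
db2 "runstats on table schema.table with distribution on all columns num_freqvalues 20 num_quantiles 30 and detailed indexes all"
```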
I recommend always collecting distribution statistics for e-commerce databases.

and detailed indexes all

Collecting index statistics helps DB2 decide which indexes, if any, to use to satisfy a particular query. Collecting detailed index statistics allows DB2 to more accurately estimate the number of page fetches an index access will require, so it can properly estimate the cost of accessing a table through an index. DB2 combines this data with buffer pool information when estimating how many of those page fetches will actually require physical I/O.

I recommend always collecting detailed index statistics for e-commerce databases.
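The extra statistics that “detailed” buys you are visible in SYSSTAT.INDEXES – most notably PAGE_FETCH_PAIRS and CLUSTERFACTOR. A sketch, again using a hypothetical APP.ORDERS table:

```sql
SELECT SUBSTR(INDNAME, 1, 20) AS INDNAME,
       NLEAF,             -- number of index leaf pages
       NLEVELS,           -- number of index levels
       CLUSTERFACTOR,     -- finer-grained clustering measure (detailed only)
       PAGE_FETCH_PAIRS   -- page-fetch estimates for various buffer sizes (detailed only)
FROM   SYSSTAT.INDEXES
WHERE  TABSCHEMA = 'APP'
  AND  TABNAME   = 'ORDERS'
```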

Automated Runstats

DB2 offers a facility for running runstats “as needed”: it watches for a certain percentage of data change and collects statistics when that threshold is crossed. But you cannot easily tell it to use the syntax you prefer, which is one reason I don’t like to use it. The other is that I’m a control freak – I want to know exactly when runstats is running (since it can cause mild performance degradation), and I want to be sure that runstats for all tables was done at a specific time.
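If you want to check whether this facility is on, or turn it off in favor of your own schedule, the relevant database configuration parameters are AUTO_MAINT, AUTO_TBL_MAINT, and AUTO_RUNSTATS. A sketch (MYDB is a placeholder database name):

```
db2 get db cfg for MYDB | grep -i auto
db2 update db cfg for MYDB using AUTO_RUNSTATS OFF
```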


Why do we gather all of this information, especially when it can extend the time runstats takes to run? DB2’s optimizer is very powerful, but its ability to choose the best access path depends heavily on having current statistics. If you call DB2 support with a query performance issue, one of the very first questions they will ask is when the last runstats was done and whether it covered all tables. The same goes for contacting WebSphere Commerce support about database performance issues, if you’re using WebSphere Commerce. I have seen old or missing runstats literally cause a site-down issue that was resolved immediately after runstats and rbind were complete. Give the optimizer the best chance of delivering excellent query performance: do runstats at least weekly for all non-volatile tables.
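A minimal sketch of the kind of weekly job I have in mind – loop over the regular tables in a schema, runstats each, then rbind (MYDB and APP are placeholders; error handling omitted):

```sh
#!/bin/sh
db2 connect to MYDB

# Runstats every regular table in the APP schema
for TB in $(db2 -x "select rtrim(tabschema) || '.' || rtrim(tabname)
                    from syscat.tables
                    where type = 'T' and tabschema = 'APP'"); do
  db2 "runstats on table $TB with distribution and detailed indexes all"
done

db2 connect reset
db2rbind MYDB -l rbind.log all
```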


Info center entry on distribution statistics:

Info center entry on detailed index statistics:


10 Responses

  1. Great article.

    I wanted to say that if you want to control how the statistics are collected, you should probably use a profile; then automatic maintenance will respect that profile. This is a good reason to activate automatic runstats and still keep control, by configuring statistics profiles.
    “Automatic statistics collection respects the profile a user has specified by using the registered profile option in the SYSCAT.TABLES catalog table. If no profile is specified, automatic statistics collection uses the following options when collecting statistics:”
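    A sketch of registering such a profile (schema.table is a placeholder) – the SET PROFILE clause stores the runstats options in the catalog so that automatic collection reuses them:

```
db2 "runstats on table schema.table with distribution and detailed indexes all set profile"
```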

    • Ember Crooks says:

      But I do have to set a profile for each individual table, right? I can see that working, but as a control freak, I also want to know exactly when runstats happened across the board. When someone asks, I want to be able to say “Saturday Morning” or “Last Night” without even thinking about it.

  2. xandra says:


    Do I need to execute runstats for each table? Can’t we run it by schema?


  3. Amit says:

    What is the recommendation for running runstats – say my data changes by x%, then I should do runstats? What is the recommended x% – 10%, 20%, 30%, etc.?

    • Ember Crooks says:

      I don’t have that number. I don’t like that method of running runstats, so I don’t know the thresholds associated with it.

  4. Murthy says:

    Is there any tool or SQL statement to know when to run runstats, as we do with REORGCHK?

    • Ember Crooks says:

      Usually if you’re doing runstats manually, then you have a fixed interval that you do runstats for all tables or perhaps categorize tables based on how often you need to do runstats on them based on data change rates (daily, weekly, monthly, etc).

      The driving factor is often data change rates, but I don’t have a specific query to track that. DB2’s automated runstats facilities do make decisions based on data change rates. But I don’t use them because I’m a control freak and I want to know when the last runstats was. Also because the data I’m querying in an e-commerce database tends to be the most recently added data.

      Personally I do runstats on all tables either daily or weekly depending on whether I’m in a build or post-go-live phase where data changes quickly or if I’m just in normal operation. This works well in e-commerce databases. I can see a point being made for different frequencies in different types of databases.
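      If you just want to see how stale the statistics are, STATS_TIME in SYSCAT.TABLES records when runstats last ran for each table. A sketch (APP is a placeholder schema):

```sql
SELECT SUBSTR(TABNAME, 1, 30) AS TABNAME,
       STATS_TIME          -- NULL if runstats has never been run
FROM   SYSCAT.TABLES
WHERE  TABSCHEMA = 'APP'
ORDER  BY STATS_TIME
```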

  5. Sanjay Trehan says:

    A few tables have range partitioning with 1 to 5 billion rows. That works out to approximately 450 million rows per partition (with one nonpartitioned index and 4 partitioned indexes). Can the optimizer use the statistics of the partition, if available, rather than the table statistics?
