DB2 LUW Indexes: B-Tree Details, Physical Storage and More
I had an earlier blog entry on the basics of Indexes, but there are so many details to cover.
Indexes are Physical Objects
An index is an object in an IBM DB2 LUW database that takes up space on disk. Much like a table, it contains actual data, though index data is a partial copy of the data in one table. The reason indexes exist is to speed access to the data in tables. In contrast, views may be in place either to improve performance or to manage security issues, but they are not manifested on disk. Sometimes queries can be fulfilled through only accessing an index without going to the table. This is called index-only access.
By definition, Indexes represent a duplication of data that already exists in the table.
Different relational database management systems (RDBMSes such as Oracle or MySQL) handle indexes differently. So the information in this blog entry is specific to IBM DB2 for Linux, Unix, and Windows.
There is only one type of index that IBM DB2 for LUW supports – B-Tree indexes. (Ok, block indexes for MDC tables, but that’s beyond the scope of this blog entry).
B-Tree indexes are typically represented conceptually by three levels, as in the diagram below, though there can be additional levels of intermediate nodes with the higher levels pointing to lower levels of intermediate nodes.
In this diagram, DB2 would start at the root page. “Page” and “Node” are interchangeable terms for IBM DB2 LUW. Let’s work through this as if DB2 were looking for the value of ‘E’ in this index on a single character. IBM DB2 would first look at the root node, and look at each value. Essentially, it would ask:
Is E before or equal to C?
Is E before or equal to J?
Yes – go to that intermediate page.
The process then happens again for values on the intermediate page.
Is E before or equal to D?
Is E before or equal to F?
Yes – go to that leaf page.
Once IBM DB2 has identified the right leaf page, it will find the right value on the page and get the RID or Record IDentifier. The RID tells db2 the exact location (page and slot number) in the table/tablespace where it can find that particular record. Many times once IBM DB2 has that RID, it will go get additional data from the data page in the table.
A RID consists of a page number and a slot number on that page. For tables in SMS tablespaces, the RID is relative to the table. For tables in DMS or AST tablespaces, the RID is relative to the tablespace.
In the scenario we have described above, given a unique value in an index, IBM DB2 will use exactly four page reads (which may be entirely in memory) to get a specific row. This is much more efficient than scanning through hundreds or thousands or millions of pages in the table to find the needed record.
The index can have more than one column. In that case, DB2 would follow this process for each column in order – first identifying the place where the values in the first column exist, and then identifying where the values in the second column exist and so forth. The image below represents a b-tree index with multiple columns.
B-Tree Index with Two Columns
For this one there are two columns – first one on a singe character, and then one with a single number that can have the value of either 1, 2, or 3. If we were looking for the values of F and 3, we would start at the root page, and ask:
Is F before or equal to C?
Is F before or equal to J?
Yes – go to that intermediate page.
Is F before or equal to D?
Is F before or equal to F?
Is 3 before or equal to 2?
No, move to the next listing on the intermediate page.
Is F before or equal to J?
Yes, go to that leaf page, and get the RID.
The above shows indexes with only one entry for each index key, or at least a very small number. There may also be indexes with a lower cardinality – with many entries for each index key. Index cardinality is the number of unique index key values. The cardinality of an index may be anywhere from 1 to the number of rows in the table. Unique indexes have a cardinality equal to the cardinality of the table.
If you are looking at an explain plan of access to table using an index, it would look something like this, given a table name of WSCOMUSR.MBRATTRVAL and an index name of WSCOMUSR.I0000327:
Total Cost: 298.134 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 7309.97 FETCH ( 2) 298.134 152.851 /---+----\ 7309.97 6.90035e+06 IXSCAN TABLE: WSCOMUSR ( 3) MBRATTRVAL 54.7399 Q1 6.86993 | 6.90035e+06 INDEX: WSCOMUSR I0000327 Q1
Indexes can also be accessed using an index leaf scan. An index leaf scan is where every leaf page in the index is scanned in order, much like a table scan. These are a bit hard to find because IBM DB2 refers to any index access as an “IXSCAN”, even if it is just direct access as in the example above. Index leaf scans occur when most or all of the needed data is in the index, but one or more of the columns earlier in the index are not specified or used in the query.
Column Order Matters
Above, I talked about how DB2 uses the index, looking for values for the first column first, and the second column second, and so forth. This means that when you are trying to find appropriate indexes, the order of the columns in the index matters very much. If a query specifies a value for the first column in the index and the third column in and index, but not the 2nd column, then that index will not be very useful for that query prior to DB2 10. In DB2 10, the concept of “Jump Scans” was introduced, so it is now possible in some cases for DB2 to still use the index, but only in certain scenarios.
Indexes may Help Select Performance, May Hurt Insert/Update/Delete Performance
Indexes can be great for some Select performance, but every index added negatively impacts Insert/Update/Delete performance. That is because for each record in an Insert/Update/Delete, DB2 must also go out and handle the record in every index. Above we saw that with just three page reads, we could locate an individual row. Those three reads and associated writes must also occur for each Insert/Update/Delete. That means that with standard three-level b-tree indexes and 10 indexes on a table, in addition to locating and altering the proper page in the table for a particular record, DB2 must also touch 30 pages of index. It doesn’t take long for the maintenance of the duplicate data that indexes are to become problematic.
For non-unique indexes, each index key is only stored on each leaf page once. This is saves space. The maximum rid for each leaf page is also stored on the intermediate pages, to make access to a specific RID as efficient as possible. The RID in these cases is treated much like an additional column in the index, or keypart. Below is what our b-tree diagram looks like including that information:
Non-Unique B-Tree Index
On index creation, LOAD, and REORG the parameter of PCTFREE is respected when the pages are filled with data. The default if nothing is specified is 10 percent. What PCTFREE does is leave open space on each leaf page (LEVEL2 PCTFREE is also available for intermediate nodes) of an index for additional entries to be added. Performance of inserts is likely to be better if there is enough space on the right page to add an index entry where DB2 needs to. If there is not enough space on the right page, then DB2 will have to perform a page split (defined below), which requires another page read and write.
I like to think about the geometry of my index leaf pages. For example, the maximum index key size for a 4K tablespace is 1024. With an index key size that large, only 3 rows will fit on an index leaf page (since there is a small amount of page overhead as well). In that scenario, a PCTFREE of 10% would be about 410 bytes, which would not be enough room for an additional row to fit on the page. It is therefore a useless setting in that case, and you might want to consider a larger page size for the indexes for that table. There is no requirement for the indexes and the table to have the same page size as the table.
Because many larger indexes may include VARCHAR columns, the actual size of index keys tends to be rather small in many of my databases. Here’s a query you can use to find indexes where the average index key size is more than 10% of the page size:
select substr(i.indname, 1, 25) as indname, substr(i.tabname,1,20) as tabname, avgleafkeysize, pagesize from syscat.indexes i, syscat.tablespaces t where i.TBSPACEID=t.TBSPACEID and avgleafkeysize > .1 * pagesize with ur;
I’d love to post the results of this query, but I don’t have any indexes that meet those parameters. Keep in mind that the average index key size may be very different from the maximum index key size, so the results here could change over time.
MINPCTUSED can help reduce the need for reorgs, however it can also slow down updates and deletes, and is only really used if there is an exclusive table lock along with the update or delete. That makes it something I don’t use in my e-commerce databases, but it may be useful in some situations. When MINPCTUSED is specified, DB2 will attempt to merge a leaf page with a neighboring leaf page when the page is less full than the percentage specified for MINPCTUSED. Logically, you would probably not want to set this to anything over 50%
When there is no space available on a leaf page for an insert or an update, then DB2 must add in another page. By default, DB2 will split the page into two equal groups of RIDs – this is called a symmetric page split. You can, however specify “HIGH” or “LOW” for this parameter. When you specify “HIGH”, better performance is seen for ever-increasing values of the index key (such as you may have for a generated key like order number or like many of the _ID fields in WebSphere Commerce databases). If you specify “LOW”, then you get better performance for ever-decreasing index keys.
Since indexes are actually represented on disk, they take up disk space. I’ve supported databases where the indexes were larger than the table data in total size. If you index everything, then your indexes will be larger than your table – so you have to be cautious and consider carefully the impact any new index will have on disk space.
After you add an index, you must perform runstats on the table/index before the index will be used. On the create index command, you can use the keywords “COLLECT DETAILED STATISTICS” to collect statistics as the index is built. You can also simply do runstats on the table after creating the index.
A Word on General Indexing Strategies
This is not a post on how to find appropriate indexes. That is incredibly complicated and an art that generally requires a fair amount of experience to do well. In general, I’m a fan of the strategy of finding appropriate composite (multi-column) indexes. Individual indexes on separate columns don’t tend to perform as well. In my e-commerce databases, I don’t get a lot of index-only access, either – that is only appropriate for specific scenarios.
With any added index, there is a small risk of performance degradation for selects. It is possible that adding an index will cause DB2 to change an access plan for some other query and that change will actually be worse. The risk is pretty low, though, and it is extremely easy to back out an index addition by simply dropping the index and re-doing runstats.
Other Index Topics
There are plenty of other Index details that I’m likely to cover in future posts. This is a very large topic, and there are plenty of things to talk about.