When Index Scans Attack!
We all know that table scans can be (but aren’t always) a negative thing. I have spent less time worrying about index scans, though. Index access = good, right? I thought I’d share a recent scenario where an index scan was very expensive. Maybe still better than a table scan, but with one index, I reduced the impact of a problem query by 80%.
I’ve recently gotten my hands on the DBI suite of tools http://www.dbisoftware.com/ and am starting to use them to analyze a new set of databases in support of a WebSphere Commerce site that will be going live in a few weeks. I’m using screen shots from those tools in this blog post. I’m not endorsing them explicitly, just showing how I used them.
First, I had a performance problem. SOLR reindexing was taking a long time. The issue was being addressed from multiple directions, but seeing as it spkied CPU on the database server to 80-90 % for 45 minutes, SQL analysis of the time period was in order.
Clearly, I have multiple dragons to slay here. But that first statement is using over 50% of the CPU used during this time period. The text of the statement turns out to be:
SELECT IDENTIFIER, CATGROUP_ID_CHILD, URLKEYWORD, SEQUENCE FROM CATGRPREL T1, SEOURL T2, SEOURLKEYWORD T3, CATGROUP T4 WHERE CATGROUP_ID_CHILD = T4.CATGROUP_ID AND T2.SEOURL_ID = T3.SEOURL_ID AND TOKENVALUE = CATGROUP_ID_CHILD AND LANGUAGE_ID = ? AND T3.STATUS = :ln AND CATGROUP_ID_CHILD IN (SELECT CATGROUP_ID FROM CATGROUP WHERE MARKFORDELETE = :ln) AND CATGROUP_ID_PARENT = ? AND TOKENNAME = :ls AND CATALOG_ID = ? ORDER BY SEQUENCE WITH UR
A side note here – Brother-Pather aggregates multiple statements together for me. Note the
:ls in there – those replace literal values. Multiple statements might show up in my package cache because my application is specifying different literal values for those, or the same literal values may have been used over and over again, too. Parameter markers still show up as question marks, like anywhere else.
Note that I have collapsed some parts of the explain above to focus in on where the bulk of the cost is coming from. Note the place where the expense really comes in is through an index scan. DB2 will tell us “index scan” in an explain plan when it uses the root page to find the intermediate page, and the intermediate page to find the leaf pages, and then fetches from the table by RID. But it will also show “index scan” for a scan where it reads every single leaf page in an index to get the RIDs it needs. I think that’s what was occurring here – page scans of every single leaf page in that index. Even with read-ahead sequential prefetch, the query was still using an awful lot of CPU cycles.
A design advisor on the query, came up with 4 index recommendations (for over 80% improvement), one of which was:
CREATE UNIQUE INDEX "DB2INST1"."IDX1408012049550" ON "WSCOMUSR"."SEOURL" ("TOKENNAME" ASC, "TOKENVALUE" ASC, "SEOURL_ID" ASC) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS;
This index has exactly the same columns as an existing index, just in a different order. This might make me shy away from it. But I could tell based on the the explain that this was the index I needed for this particular query – it would be the one with the ability to give me the most impact.
What a difference! Just from changing the order of two columns in a three column index.
WebSphere Commerce does not allow me to drop or in any way alter any existing unique index. I have tried before, even just changing a unique index in the include columns and keeping the same index name, but the application fails if I do so. That means that I now have two indexes on this table that cover the same data. Luckily insert/update performance on this particular table is not critical, so I can accept this for now. It is critical to understand which tables have critical insert/update performance in an OLTP database and which tables do not.