Example of A Clustering Low-Cardinality Index Helping Query Performance
The request from the developers was something along the lines of “Help, Ember, this query needs to perform better”. Sometimes the query I’m working on is not one that shows up as a problem from the database administrator’s perspective, but one that is especially important in some part of application functioning. In this case, this query is related to the performance of searches done on the website – a particularly problematic area on this client.
The query I was asked to help with is this one:
select distinct ADSC.SRCHFIELDNAME, AVD.value, AV.storeent_id, AVD.sequence, AVD.image1, AVD.image2 from attrdictsrchconf ADSC, attr A, attrval AV, attrvaldesc AVD where ADSC.ATTR_ID is not NULL and ADSC.attr_id = A.attr_id and ADSC.mastercatalog_id = ? and ADSC.srchfieldname in (?) and A.storeent_id in (?, ?, ?) and A.facetable = 1 and AV.attr_id = A.attr_id and AV.storeent_id in (?, ?, ?) and AV.attrval_id = AVD.attrval_id and AVD.language_id = ?
The data model in question here is a typical WebSphere Commerce database, with no customizations involved in this query.
A typical db2advis gives me nothing useful, but I know better than to stop there. The thing I tried first was to do an explain that recommends MQTs and MDC tables. I’ve talked to WebSphere Commerce support on this and they’ve told me that MDC Tables are considered a customization. They don’t see why it wouldn’t work, but they won’t say it’s supported. C’mon IBM – there are several areas I can see MDC’s making big performance gains for me. But even if I can’t use MDC tables, the db2 advisor for them can sometimes give me ideas for clustering indexes, which I can use.
If doing this at the command line, you would use the ‘-m C’ option on db2advis to get the same kinds of recommendations.
In this case, it recommended one clustering index and that one table be converted to an MDC table. But the MDC recommendation includes only one dimension. This makes it particularly likely that a clustering index could help me nearly as much. Both index recommendations are duplications of existing indexes, but just adding clustering. From what I’ve seen, the WebSphere Commerce data model seems to favor a lot of indexes on single columns. This leads to some low-cardinality indexes. If you want to understand why low-cardinality indexes can hurt performance, please see my DeveloperWorks article, Why low cardinality indexes negatively impact performance.
Looking at the index in question, it is clearly a low-cardinality index:
select substr(indschema,1,12) as indschema, substr(indname,1,12) as indname, fullkeycard from syscat.indexes where indname='I0001468' with ur INDSCHEMA INDNAME FULLKEYCARD ------------ ------------ -------------------- WSCOMUSR I0001468 2 1 record(s) selected.
Hard to get much more low-cardinality than 2. The cardinality of this full table is 1,255,954. The data is relatively evenly distributed across these two values. DB2 is (correctly) choosing not to use the index because if it’s low cardinality and the table is not clustered on it, it would make performance worse.
I changed the index to be a clustering index like this:
$ db2 "drop index wscomusr.I0001468" DB20000I The SQL command completed successfully. $ db2 "create index wscomusr.i_attrvaldesc_01 on wscomusr.attrvaldesc (language_id) cluster allow reverse scans" < DB20000I The SQL command completed successfully. $ db2 "REORG TABLE WSCOMUSR.ATTRVALDESC INDEX wscomusr.i_attrvaldesc_01 " DB20000I The REORG command completed successfully. $ db2 "RUNSTATS ON TABLE WSCOMUSR.ATTRVALDESC with distribution and detailed indexes all" DB20000I The RUNSTATS command completed successfully.
Note that it's now using my clustering index for that table, even though it can't make it into index-only access. And the cost of that query is reduced by a whopping 41%. I got the cost of the query down a bit more with another clustering index, and the developers were quite happy. They asked if I could reduce it further and my response was "Well, can you stop using DISTINCT and IN?"
- Information beyond the standard db2advis can be useful even if you cannot make use of all DB2 features based on vendor restrictions.
- C'mon IBM - you're the vendor of both the application and the database. Can we get the use of MDC tables for base WebSphere Commerce tables certified?
- I am a lifelong GUI-hater, but I'm loving DBI's Brother-Panther for looking at things.
I somewhat doubt that I would have added this index by itself if it wasn't replacing a base WebSphere Commerce index with the exact same columns. Maybe if I happened on this same performance improvement. I might have experimented more with a composite index in that case. But I like sticking as close to the WebSphere Commerce data model as possible unless I have very specific reasons to do otherwise.