Improving Performance of DBCLEAN Deletes
While this post is specific to WebSphere Commerce, it covers concepts that may apply to tuning other delete statements as well.
Using Functions in the Best Place
The CLEANCONF table stores the delete statements that are used by DBCLEAN. Most of them use syntax like this for the date component:
(days(CURRENT TIMESTAMP) - days(prevlastsession)) >= ?
Every time I see this, I cringe.
Applying a function (such as days) to a column/field eliminates the possible use of an index on that column. This essentially means that this SQL is forcing a table scan – sometimes of very large tables like USERS, MEMBER, or ORDER.
This SQL can be very easily re-written to not use a function on table data. The following is only marginally different from the above, and it can significantly improve performance:
prevlastsession < current timestamp - ? days
Now technically, this will also delete things to the middle of a day, so it is slightly different. To mitigate any differences, just keep one more day.
It is supported to make changes to the base Commerce DBCLEAN jobs, though it is a bit safer to create a custom DBCLEAN job with different SQL. Make sure customizations survive WebSphere Commerce fixpacks, feature packs, and upgrades.
Once the SQL is changed like this, one or more indexes may drastically improve performance on the parent table the delete is running against.
Recently, I had the chance to spend some time tuning a WebSphere Commerce environment that I do not regularly support, and one of the areas they wanted the most help with was tuning DBCLEAN statements. One of the statements we started with was a fairly standard one for deleting guest users:
delete from member where member_id in ( select users_id from users T1 where registertype='G' and (days(CURRENT TIMESTAMP) - days(prevlastsession)) >= ? and not Exists ( select 1 from orders where orders.member_id=T1.users_id and status != 'Q') and (users_id > 0))
Explaining this statement, I discovered that in this environment, this delete statement was estimated at 7,261,900,000 timerons. These deletes are very expensive because they cascade to hundreds of tables. The explain plan had over 4,000 operators. This is how it reads with the altered date syntax:
delete from member where member_id in ( select users_id from users T1 where registertype='G' and prevlastsession < current timestamp - ? days and not Exists ( select 1 from orders where orders.member_id=T1.users_id and status != 'Q') and (users_id > 0))
With just that simple change, the estimated timerons went down to 6,990,170,000. That still seems like a lot, but that's an improvement of over 250 MILLION timerons. Even better, the design advisor now tells me I can get very significant improvements by adding indexes.
Conventional wisdom is that indexes slow down deletes. That's true in the sense that the more indexes there are, the more physical places the delete has to happen. However when a delete has significant conditions on it like these do, indexes to support the conditions may make sense.
Using the Design Advisor on a Delete
Running the design advisor on the deletes on MEMBER or ORDERS will take a while to run. Running it on the guest users delete above comes up with over 300 recommended NEW indexes. THREE HUNDRED. There is no way you want to add 300 indexes based on the design advisor alone. So how do you navigate all those indexes?
- Look at the explain plan - maybe there will be something obvious, though the plan is very hard to look at as it is so large
- Make your best guesses based on table cardinality, custom tables, and other likely factors
- Use a modified version of Multiple Index Regression Analysis
Here is how I approached it when looking at the guest users query above. First, I made some guesses. I looked first at indexes on custom tables. When I did that, I found that the indexes on likely-looking custom tables, if added alone, actually made my delete performance significantly worse. I then found a branch of the explain plan that seemed particularly expensive, looked for indexes in the design advisor output on those tables. Also wrong - again, they actually made the delete performance worse.
I then decided to order the recommended indexes by name, and use the same techniques in the Multiple Index Regression Analysis to look at groups of indexes. My query started out at 6.9 billion timerons. If I added all 300 indexes that the advisor recommended, it suggested I could get it down to 334 million timerons (a 95% improvement). I used the EVALUATE INDEXES explain mode to look at the first 160 indexes and see if how much they would help. If all 160 were added, I could expect the delete to run at a cost of only 184 million timerons!! That's even less than my end goal. Somewhere in the first 160 indexes are some real gems. For my next run using EVALUATE INDEXES, I tried the first 40 indexes, and found again an estimated cost of 180 million timerons. Trying both the first 10 and the first 20 got me 482 million timerons. Starting to work through the first 10, I found that the very first index when they were sorted by name would get me down to 582 million timerons. That's a 12X performance improvement by adding just one index, and it was a very reasonable, high-cardinality, non-duplicate index on USERS (+REGISTERTYPE+PREVLASTSESSION+USERS_ID). This is an index that would not have helped if I hadn't changed the date syntax. Here's a partial look at what my spreadsheet looked like:
I decided to add this index and have the customer evaluate performance to see if it meets their performance goals. I may go back to find more good indexes, but given the effort of finding them, the question is if an additional bit more will be worth it.
One of the most common areas where indexing is needed in WebSphere Commerce databases is on custom tables. Some custom tables can be particularly large, and may have foreign keys defined to tables that are part of DBCLEAN. These foreign keys must be defined with cascading deletes (if not, they will cause DBCLEAN to fail), and there may not be indexes in support of the foreign keys.
One client recently had trouble with a delete from ORDERS. They were unable to delete 10,000 orders in 2 hours. We used the date improvement described above for an estimated 99% improvement in performance, but that did not solve the issue. What did solve the issue was to add an index on a single custom table that included an ORDERSID column. There was a foreign key on this column, referencing ORDERS_ID in the base Commerce table ORDERS, but there was no index to support the foreign key, and the table had millions of rows. Immediately after adding the index on this single column, deletes completed much faster, with more than 100,000 being able to be completed in the two hour window that previously would not accommodate 10,000.