This is a really complicated topic, largely because dbclean is such a varied tool, and because the data distribution in Commerce databases varries heavily based on how you use it. If you haven’t already, check out my blog entry on dbclean basics.
Also, if you’re having performance issues on a specific cleaning area, search my blog for a post on dbclean for that particualar area – for some I list indexes and other details that may help.
So IBM actually has a really awesome and detailed document on this. Seriously, it’s one of only two on Commerce/DB2 topics that I really recommend.
It’s actually a really detailed and excellent discussion on the topic, and I’ll be adding it to my list of links.
There’s nothing magical about how Commerce handles data. The same techniques that apply to other DB2 databases apply to Commerce databases. Commerce databases do halfway decent on indexing. But Commerce is such a versatile tool that often indexes can be added to help. Don’t be afraid to use Explain and the Index Advisor (design advisor, whatever they’ve decided to call it now) to find indexes that will help the deletes. Also, don’t be afraid to dig into the deletes and understand if certain conditions can be changed or eliminated for your environment - the best times I’ve seen come from such alterations if they fit the data. WebSphere Commerce does support adding your own indexes – just pick a different naming standard and use common sense when adding indexes.
DBClean allows you to change the syntax used for deletes or to add additional pruning areas. If you do find a syntax that works better for you than what’s in DBClean, you should make a whole new entry in CLEANCONF rather than altering a stock entry. The statements in CLEANCONF may change with Fix or Feature Packs, and that could over-write any changes that you had made.
Interim commits should be part of your pruning strategy (to keep from filling up the transaction log and crashing the whole database). But those interim commits do hurt performance some. So make sure you’re not using a crazy value like 5. I like 500 for member/user deletes and 1000 for others.
If I had to guess on why someone was reading a post like this, the guess would invariably be Users – either Guest or Registered. My off-the cuff guess for how fast I can delete users or members is about 150 per minute(varies significantly with db size, data distribution, and hardware). That’s right, just 150 rows per minute. But the main reason is the Referential Integrity – that one delete cascades to over 300 tables. So that’s really 45,000 rows in different tables per minute. I ran an explain of a single-row delete on MEMBER on a non-production Commerce 6 database that is less than 15 GB in size. I couldn’t paste it in because it was too big. It shows that it is considering 831 constraints on delete. An index advisor shows around 300 recommended indexes with a very negligible improvement.
So what can you really do? Please, don’t add 300 indexes blindly.
- If you’re working with Commerce 6, please apply FixPack 18 of DB2 8 – there are actually some fixes on how db2 looks at FK’s that really help performance
- For user deletions – If your site doesn’t take orders, please eliminate or change the subquery that looks for orders in cleanclonf. This makes a huge difference, I can say from personal experience.
- Consider an offline one-time purge. (stop commerce, export the data you want to keep, load with the replace option and script a bunch of set integrity statements) If you’ve gotten too far behind, this may be your only choice (other than running dbclean 24/7 for weeks). It is really risky, though – mainly in that you might delete something that should have been kept. This approach should only be considered if you have a test environment where you can do a full-scale test(likely multiple times).
- Add indexes selectively. This does not mean to do whatever the index advisor says. If you have a large number of indexes recommended, look at each one and the tables involved, including their cardinality to see what really makes sense.
- Don’t forget the basics – runstats (with distribution and detailed indexes all) on all non-volatile tables and reorgs as recommended by reorchk – at least within the last week.