DBClean – Junk OrderItems

So I’m starting out my series on DBClean and data pruning items with the one that I have seen cause the most immediate severe performance impact. I’ve seen this one take a site to its knees within two months of go-live.

Junk OrderItems

What is it?

When users delete items from their carts, most often the items go into a Junk status. They are moved to a single Junk Order within the database – so if you haven’t pruned this area, that order may have thousands of orderitems associated with it. There are two reasons that Commerce functions this way instead of just immediately deleting orderitems:

  1. The OrderTrashCan feature is enabled. This allows users to have a trash can and visit that trash can to move items from the trash can back into their cart.
  2. It is faster to just mark the orderitem as junk than to require the user to wait for the database to perform a delete. This is a common strategy for OLTP databases – to mark an item for delete and then do the actual delete asynchronously.

I actually find that #1 is pretty rare.

Potential for impact

Very High – if you only prune one thing, let this be it.

Where this Applies

Any Commerce database that has items that meet the criteria in the SQL below.

Table (other tables may be involved in cascading deletes)

ORDERITEMS

SQL

note: always test for yourself, don’t take my word for it

note also that you should be cautious running the SQL during peak loads, and always use “with ur”

Number of rows to be deleted:

select count(*) from orderitems where status ='J' and lastupdate < current timestamp -  days with ur

Pruning method

Ha, the first one I cover and it’s not even one you can prune with DBCLEAN. Please send me how in the comments if you disagree.

This can be pruned manually, just make sure you’re breaking your deletes into small batches to avoid filling the transaction log. You don’t want to run it as one statement, but it would be:

delete from orderitems where status='J' and lastupdate < current timestamp –  days

Determining Retention

Retention times on this tend to be short. If the OrderTrashCan feature is enabled, you’ll want to consider how long you want customers to be able to move items from their trash cans into their carts. If OrderTrashCan is not being used, a couple of days is probably sufficient to allow for any troubleshooting that may require this data. Assuming you’re not using the OrderTrashCan feature, you may not even have to discuss the retention period with the business or the developers – however you may want to discuss with your developers if retaining these orderitems is desired behavior.

Indexes to help deletion

You may want to add an index to the orderitems table on status, lastupdate, and (if you’re using it to split up deletes) orderitems id

A little deletion tip

To break up deletes, you can use this method, though it may not be the fastest. First you must have a unique numerical column. In this case, orderitems_id works well, and many of Commerce’s generated primary keys can serve well. Then you also have to decide how many pieces to break the delete up into. I aim for chunks of somewhere between 500 and 50,000 depending on the circumstances. For the case of this example, we’ll say we’re breaking it up into 5 chunks. Add the use of the mod function on to the end of your delete statement. So this:

delete from orderitems where status='J' and lastupdate < current timestamp –  days

becomes:

delete from orderitems where status='J' and lastupdate < current timestamp –  days and mod(orderitems_id,5) = 0
delete from orderitems where status='J' and lastupdate < current timestamp –  days and mod(orderitems_id,5) = 1
delete from orderitems where status='J' and lastupdate < current timestamp –  days and mod(orderitems_id,5) = 2
delete from orderitems where status='J' and lastupdate < current timestamp –  days and mod(orderitems_id,5) = 3 
delete from orderitems where status='J' and lastupdate < current timestamp –  days and mod(orderitems_id,5) = 4

If you end up with a large number of statements to write, the ‘concatenate’ function in Excel can be very useful to write them quickly. Of course you’ll want to execute the above with autocommit on so that commits happen after every statement.

A slower method but one that assures the highest level of concurrency is to generate a list of the primary keys of the rows you want to delete and then write a script (ksh, perl, whatever) that loops through them and issues a delete and a commit for each row. In this example, I could do something like:

db2 -x "select orderitems_id from orderitems where status ='J' and lastupdate < current timestamp -  days with ur" >oi.list
cat oi.list |while read o; do db2 connect to ; db2 -x "delete from ORDERITEMS where orderitems_id=$o"; db2 connect reset; done >deletes.out

So, good information? More detail needed in any areas? Let me know what you think.

You may also like...

8 Responses

  1. Mikael Hase says:

    Hi,

    Wouldn’t dbclean with object order and type markfordelete accomplish this? The SQL statement is
    delete from orderitems where orders_id in (select orders_id from orders where type=’BIN’ and status=’J’)

    Running “select (1) from orderitems where orders_id in (select orders_id from orders where type=’BIN’ and status=’J’)” yields the same result on my system as your count(*) statement.

    Best,

    Mikael

  2. Ankur says:

    Hi,
    I am thinking on scheduling the dbclean script on my ecommerce setup. Currently I have accomplished this using the Crontab and shell script but is there any way to schedule it using the scheduler in WebSphere.

  3. rak says:

    Hi,

    Are you aware of a default list of WCS v7 tables that are included in the dbclean.sh utility? I read its not included in the dev toolkit, but surely someone must have it online?

    I cant find it anywhere.

    PS: Great info you have here on your site!

    • Ember Crooks says:

      You have to invoke dbclean to specify the objects you want to delete. There is no default list of things that are cleaned for you. There are default delete statements for dozens of different areas that you can run dbclean for in the CLEANCONF table in your Commerce database. Many consultants have a list of default cleaning areas that they suggest starting with, and others simply depend on how the Commerce database is being used.

  1. April 7, 2011

    […] Pruning junk ORDERITEMS is also one of the more critical data pruning areas. […]

  2. September 15, 2011

    […] Junk Orderitems (see DBCLEAN – Junk Orderitems) […]

Leave a Reply

Your email address will not be published. Required fields are marked *