Stagingprop – SQL0803 – a rarer cause

I went through the most common cause of SQL0803 back in a previous post. There’s a rarer cause that can look very similar, and I thought I’d detail it.

What this problem looks like:

The following appears in the stagingprop log. The location and name of the stagingprop log can vary, but is most frequently on the application server where the stagingprop is run from,  in /opt/IBM/WebSphere/CommerceServer70/logs (for Commerce 7). The application server where stagingprop runs may be any server where commerce runs and has the databases involved cataloged. Stagingprop may run on a server that is not normally related to either of the database servers involved. Examples are from Commerce 7.

<Error> The SQL batch update failed.
<Error> The command failed to propagate the change related to the STAGLOG record 18310992.
18310992: UPDATE table collateral - collateral_id11676 <scope merchant>
<SQLException> [jcc][t4][102][10040][4.3.111] Non-atomic batch failure.  The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4228, SQLSTATE=null
SQLState: null
Vendor Error Code: -4228
Chained SQLException #1: Error for batch element #2: DB2 SQL Error: SQLCODE=-803, SQLSTATE=23505, SQLERRMC=2;WSCOMUSR.COLLATERAL, DRIVER=4.3.111
Stack trace:
com.ibm.db2.jcc.a.wn: [jcc][t4][102][10040][4.3.111] Non-atomic batch failure.  The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4228, SQLSTATE=null
        at com.ibm.db2.jcc.a.dd.a(dd.java:404)
        at com.ibm.db2.jcc.a.p.a(p.java:352)
        at com.ibm.db2.jcc.a.km.a(km.java:3762)
        at com.ibm.db2.jcc.a.km.b(km.java:3549)
        at com.ibm.db2.jcc.a.km.executeBatch(km.java:2009)
        at com.ibm.commerce.staging.StagingProp.executeUpdate(StagingProp.java:938)
        at com.ibm.commerce.staging.StagingProp.propagateRows(StagingProp.java:813)
        at com.ibm.commerce.staging.StagingProp.transactRows(StagingProp.java:568)
        at com.ibm.commerce.staging.StagingProp.propagate(StagingProp.java:510)
        at com.ibm.commerce.staging.StagingProp.main(StagingProp.java:288)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:600)
        at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:260)
(Wed Feb 16 13:38:44 CST 2011) <Error> The command failed to complete the propagation job.
  

Determining Root Cause:

Based on the error in the diag log, determine which index is being violated. In this case it is index “2” which turns out not to be the primary key, but another unique index on the table. Query the row in both Stage and Prod and determine if it is identical or not:

Staging-role server:

$ db2 "select * from wscomusr.collateral where collateral_id=11676"
COLLATERAL_ID STOREENT_ID COLLTYPE_ID NAME                           URL                                                                                                                                                                                                                                                            FIELD1                                                                                                                                                                                                                                                         FIELD2                                                                                                                                                                                                                                                         OPTCOUNTER UP_NAME
------------- ----------- ----------- ------------------------------ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ------------------------------
        11676       10651           2 Homepage Overlay Espot 4 UAT                                                                                                                                                                                                                                                                  -                                                                                                                                                                                                                                                              -                                                                                                                                                                                                                                                                       4 HOMEPAGE OVERLAY ESPOT 4 UAT
  1 record(s) selected.

Prod-role server:

$ db2 "select * from wscomusr.collateral where collateral_id=11676"
COLLATERAL_ID STOREENT_ID COLLTYPE_ID NAME                           URL                                                                                                                                                                                                                                                            FIELD1                                                                                                                                                                                                                                                         FIELD2                                                                                                                                                                                                                                                         OPTCOUNTER UP_NAME
------------- ----------- ----------- ------------------------------ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ------------------------------
        11676       10651           2 Homepage Overlay Espot 3 UAT                                                                                                                                                                                                                                                                  -                                                                                                                                                                                                                                                              -                                                                                                                                                                                                                                                                       4 HOMEPAGE OVERLAY ESPOT 3 UAT
  1 record(s) selected.

Unlike our previous case, the data here is NOT identical. A constraint other than the primary key is being violated. Determine what columns make up that other unique key, and query based on them on prod(in this case, it’s NAME and STOREENT_ID):

$ db2 "select * from wscomusr.collateral where collateral_id=11676"
COLLATERAL_ID STOREENT_ID COLLTYPE_ID NAME                           URL                                                                                                                                                                                                                                                            FIELD1                                                                                                                                                                                                                                                         FIELD2                                                                                                                                                                                                                                                         OPTCOUNTER UP_NAME
------------- ----------- ----------- ------------------------------ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ------------------------------
        11676       10651           2 Homepage Overlay Espot 3 UAT                                                                                                                                                                                                                                                                  -                                                                                                                                                                                                                                                              -                                                                                                                                                                                                                                                                       4 HOMEPAGE OVERLAY ESPOT 3 UAT
  1 record(s) selected.
 By this we can see that the same NAME exists for a row with a different primary key (collateral_id).
You must also look at the staglog entries related to this row:
$ db2 "select stgrfnbr, stgtable, stgstmp, stgop, stgprocessed from wscomusr.staglog where stgnkey1=11678 and stgtable='collateral' and stgstmp >current timestamp - 2 days with ur"
STGRFNBR             STGTABLE           STGSTMP                    STGOP STGPROCESSED
-------------------- ------------------ -------------------------- ----- ------------
            18310994 collateral         2011-02-16-12.05.56.276106 U                1
            18310996 collateral         2011-02-16-12.06.41.490837 U                1
            18310997 collateral         2011-02-16-12.07.07.344814 U                0
            18311003 collateral         2011-02-16-12.08.07.040047 U                0
            18311029 collateral         2011-02-16-13.28.46.577635 U                0
  5 record(s) selected.

You’ll have to alter the query to look at the time frame appropriate to your issue – in this case it was within the last 2 days. There are two things you’re looking for here. The first is that the row that stagingprop is failing on is an Update (stgop of ‘U’). The issue I’m describing in this particular post only applies to updates. The second is that there are multiple unprocessed updates for the same row – at least two.

What’s really going on: 

Stagingprop does not record the data of each and every update/insert/delete. Instead it records data (in staglog) of the timestamp, type of operation, and unique identifiers for the row effected. At the time stagingprop is run, it queries the database and in the case of updates, updates with the current data of the row – even if there were multiple intermediate sets of data. So picture a scenario like this:

  Row 11676 Row 11678
Original Data A B
Change 1 C D
Change2 B A

In this scenario, stagingprop only sees the final values after change 2, and so it tries to update the value of row 11676 to be B. This fails, because in production 11678 already has the value of B. Stagingprop is processing the changes in what should be the proper order, because the updates to 11676 happened first. Stagingprop does not look at or really understand the data in any way.

Resolution:

The resolution here is to manually update the rows in production to match staging. This is one case where staging is considered the master. You’ll likely have to update the rows using interim values, and then update them using the real values. Most of the time, this is going to require manual update statements. If you’re lucky and it’s a small table with no dependencies, you can export the table on staging and do an import/replace on prod. If you’re really lucky and pre-go-live, you may be able to just restore both databases to before the issue happened.

Why do I prefer the restore approach? Because stagingprop issues rarely travel alone and you can spend days working through some complicated problems one row at a time.

I would like to get IBM’s explanation of why this is desired/designed behavior. Technically I know exactly how and why it happens, but unlike some stagingprop problems, I haven’t seen any documentation directing people to avoid causing it.

You may also like...

4 Responses

  1. Rahul says:

    Hi,

    We are using IBM’s emergency data publish method(Workspace-Task Group), that is ‘Quick Publish’ feature, but the problem we are facing is: The STAGLOG table STGPROCESSED column is not getting marked to value ‘1’ but is ‘0’, and inspite the data gets published to Prod DB from Stage DB. Let me know if you have any idea on this. It is urgent.

    • Ember Crooks says:

      Open a ticket with IBM. I haven’t done a lot with Quick Publish.

      • Rahul says:

        Thanks for prompt reply. Can you provide the link or steps to do the same.

        • Ember Crooks says:

          That information should be provided by IBM. If you have access to your PAO (Passport Advantage Online) set up, then you can do it through there. Otherwise, you have to know your IBM customer number and give them a call. The number is different in different in different countries. In the US, I usually use 1-888-888-5492. If you do not know your customer number, you may first have to call 1-877-426-6006.

Leave a Reply

Your email address will not be published. Required fields are marked *