Troubleshooting Table with Failed Load (SQL0668N)

I recently had a client with a table in an inconsistent state. Apparently a load failed, which is common enough, but they then tried to do a reorg, thinking that would solve it, so I first had to identify and force off connections causing a lock chain before solving the root problem.

Environment and Complications

There were a couple of complications due to the client being on 9.5 FixPack 1. It was a small data warehouse (1.6 TB on 1 catalog and 4 data nodes). LOCKTIMEOUT is set to -1. The troubleshooting and resolution would have been  significantly easier on a newer version of DB2. There were two complicating factors with the old environment:
  1. The db2top utility had not yet been incorporated into DB2 on this version, so I had manually installed it, and occasionally it doesn’t function quite right. At the time of this issue, it refused to show me the sessions screen without crashing.
  2. A host of table functions I use to administer DB2 were introduced in db2 9.7, and are not available in 9.5, and there is no way to back-port them. I normally would have made use of either MONREPORT or MON_GET functions or both.

Clearing up Lock Chain

The first thing I did was to clear up the reorg and other connections. As the instance owner, I made my putty session as large on my screen as possible. Next, I looked at the connections in db2top by issuing:
db2top -d sample
(where sample is the database name)and then pressing U to get the locking screen. I don’t have a screen shot of how that screen looked, but it showed me multiple connections in a red “lock waiting” state. By pressing L, I got the lock chain. While not from this specific issue, the lock chain screen looks something like this:
db2top_16
The main thing I’m getting from that screen are the three applicaiton ids that were involved in lock chains. Next, I use one of two methods to determine which application was the reorg and force off the other two.
  • Method 1: From the locks screen in db2top, press A. It will prompt you for an agent id. Enter one of the agent ids and see if you can find where it lists an sql statement or former action. It may or may not give you this info in an easy to read format. This screen is the reason I make my putty window as large as possible – there is a lot of data to display
  • Method 2: Exit out of db2top by pressing q or ctrl+c. At the command line, issue:
db2 get snapshot for applications on sample > appsnap.out 

eral screens of text for each one to see if there is a reference to a reorg or to a script being run. Pay attention to the authorization IDs being used and the idle time. Alternately, search the file on “eorg” to find applications that have recently done reorgs. I copy about the first screen of this output when giving a client details about a connection I’d like to force, as it includes things like where the connection is coming from, the authorization id, and the idle time.

Using method 2 above, I identified which two of the three connections were not the reorg and forced them off (after verifying through the appsnap.out that they had been idle for a while – I hate to force an active or recently active connection).

Identifying and Clearing Table Problem

The reorg then completed or failed immediately. I attempted a simple select on the table and noted the exact error message and return code. This is what I got:

$ db2 "select * from cis_ods.cs_pgm fetch first 1 row only with ur"

CS_ID   PGM_TYP_CD EFF_BGN_DT APP_ID      EMR_REQ_SW PGM_REQ_DT PGM_STS_CD STS_RSN_CD STS_DT     RESC_RSN_CD RESC_DT    CSLD_ID     CSLD_XFR_DT CSLD_XFR_RSN_CD PTNTL_ICT_DISCN_EFF_DT ACTN_DT    EFF_END_DT CRT_USR_ID      CRT_DTM                    UPD_USR_ID      UPD_DTM                    HIST_IND LAST_RTRV_DT CNTY_SPFC_CS_TYP_IND BEN_SAFE_ARMS_NWBN_SW MCL_PROP_DSRG_EVA_SW TYP_OF_HH_IND AID_COD CSH_REACTVT_DT APP_CMP_SW APP_CMP_DT APP_CMP_OVR_SW EX_PARTE_CNTCT_DT EX_PARTE_CNTCT_CD FS_REACTVT_DT DISCV_RSN_CD DISCV_DT   FS_APP_TM_LIM_RSN_CD ACA_TRNST_DT ACA_TRNST_OVR_SW INGEST_DATE
------- ---------- ---------- ----------- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------- ----------- --------------- ---------------------- ---------- ---------- --------------- -------------------------- --------------- -------------------------- -------- ------------ -------------------- --------------------- -------------------- ------------- ------- -------------- ---------- ---------- -------------- ----------------- ----------------- ------------- ------------ ---------- -------------------- ------------ ---------------- --------------------------
SQL0668N  Operation not allowed for reason code "3" on table "CIS_ODS.CS_PGM".
SQLSTATE=57016
I then looked up that error message so I could understand what reason code 3 meant:
$ db2 "select * from cis_ods.cs_pgm fetch first 1 row only with ur"

CS_ID   PGM_TYP_CD EFF_BGN_DT APP_ID      EMR_REQ_SW PGM_REQ_DT PGM_STS_CD STS_RSN_CD STS_DT     RESC_RSN_CD RESC_DT    CSLD_ID     CSLD_XFR_DT CSLD_XFR_RSN_CD PTNTL_ICT_DISCN_EFF_DT ACTN_DT    EFF_END_DT CRT_USR_ID      CRT_DTM                    UPD_USR_ID      UPD_DTM                    HIST_IND LAST_RTRV_DT CNTY_SPFC_CS_TYP_IND BEN_SAFE_ARMS_NWBN_SW MCL_PROP_DSRG_EVA_SW TYP_OF_HH_IND AID_COD CSH_REACTVT_DT APP_CMP_SW APP_CMP_DT APP_CMP_OVR_SW EX_PARTE_CNTCT_DT EX_PARTE_CNTCT_CD FS_REACTVT_DT DISCV_RSN_CD DISCV_DT   FS_APP_TM_LIM_RSN_CD ACA_TRNST_DT ACA_TRNST_OVR_SW INGEST_DATE
------- ---------- ---------- ----------- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------- ----------- --------------- ---------------------- ---------- ---------- --------------- -------------------------- --------------- -------------------------- -------- ------------ -------------------- --------------------- -------------------- ------------- ------- -------------- ---------- ---------- -------------- ----------------- ----------------- ------------- ------------ ---------- -------------------- ------------ ---------------- --------------------------
SQL0668N  Operation not allowed for reason code "3" on table "CIS_ODS.CS_PGM".
SQLSTATE=57016

I then looked up that error message so I could understand what reason code 3 meant:

$  db2 ? SQL0668N


SQL0668N  Operation not allowed for reason code "reason-code" on table
      "table-name".

Explanation:

Access to table "table-name" is restricted. The cause is based on the
following reason codes "reason-code":

1        The table is in the Set Integrity Pending No Access state. The
         integrity of the table is not enforced and the content of the
         table may be invalid. An operation on a parent table or an
         underlying table that is not in the Set Integrity Pending No
         Access state may also receive this error if a dependent table
         is in the Set Integrity Pending No Access state.

2        The table is in the No Data Movement state. When in this state,
         operations that cause data movement are disallowed. Data
         movement operations include REDISTRIBUTE, update of database
         partitioning key, update of multi-dimensional clustering key,
         update of range clustering key, update of data partitioning key
         and REORG TABLE.

3        The table is in the Load Pending state. A previous LOAD attempt
         on this table resulted in failure. No access to the table is
         allowed until the LOAD operation is restarted or terminated.

4        The table is the Read Access state. This state can occur during
         on-line LOAD processing (LOAD INSERT with the READ ACCESS
         option), or after an on-line LOAD operation, but before all
         constraints have been validated in the newly appended portion
         of the table using the SET INTEGRITY statement. No update
         activity is allowed on this table.

5        The table is in the Load In Progress state. The LOAD utility is
         currently operating on this table, no access is allowed until
         the LOAD is finished.

6        Materialized query tables that reference a nickname cannot be
         refreshed in ESE.

7        The table is in the reorg pending state. This can occur after
         an ALTER TABLE statement containing a REORG-recommended
         operation.

8        The table is in the alter pending state. This can occur when
         using the table in the same unit of work as an ALTER TABLE
         statement containing a REORG-recommended operation.

9        The table is in Redistribute Pending state. The REDISTRIBUTE
         utility is not completed on this table, no access is allowed
         until the REDISTRIBUTE is finished.

User response:

1        Execute the SET INTEGRITY statement with the IMMEDIATE CHECKED
         option on table "table-name" to bring the table out of the
         Set Integrity Pending No Access state. For a user maintained
         materialized query table, execute the statement with the
         IMMEDIATE UNCHECKED option instead of the IMMEDIATE CHECKED
         option.

2        Execute REFRESH TABLE statement on the dependent immediate
         materialized query tables and staging tables of table
         "table-name". The contents of these dependent immediate
         materialized query tables and staging tables can be
         incrementally maintained from the appended data of
         "table-name" through previous LOAD INSERT operations and from
         the attached data of "table-name" through previous ALTER
         TABLE statements with the ATTACH clause.

3        Restart or terminate the previously failed LOAD operation on
         this table by issuing LOAD with the RESTART or TERMINATE option
         respectively.

4        Issue the LOAD QUERY command to check whether the table is in
         the process of being loaded. If yes, wait until the LOAD
         utility has completed, or if necessary, restart or terminate
         previously failed LOAD operation. If LOAD is currently not in
         progress, issue the SET INTEGRITY statement with the IMMEDIATE
         CHECKED option, to validate constraints in the newly loaded
         portion of the table.

5        Wait until the current LOAD operation has finished. You can use
         the LOAD QUERY command to monitor the progress of load.

6        Define a materialized query table using the MAINTAIN BY USER
         option. Then, use an INSERT statement with a subquery to
         populate the materialized query table.

7        Reorganize the table using the REORG TABLE command (note that
         INPLACE REORG TABLE is not allowed for a table that is in the
         reorg pending state).

8        Complete the unit of work, and re-issue the command.

9        If the REDISTRIBUTE utility is working, wait until it finishes
         working on the current table. You can use the LIST UTILITIES
         command to monitor the progress of the REDISTRIBUTE utility. If
         a previous REDISTRIBUTE operation failed and left the table in
         this state, issue the REDISTRIBUTE utility again with the
         CONTINUE or ABORT option and let it finish on this table.

 sqlcode: -668

 sqlstate: 57007
This was clearly telling me that this is not a reorg issue, but a table in a load pending state. I then tried a load terminate command that the client mentioned they had used in the past, and got this:
$ db2 load from /dev/null of del terminate into cis_ods.cs_pgm nonrecoverable

  Agent Type     Node     SQL Code     Result
______________________________________________________________________________
  LOAD           001      +00000000    RESTART required.
______________________________________________________________________________
  LOAD           002      -00002032    Error. RESTART required.
______________________________________________________________________________
  LOAD           003      -00002032    Error. RESTART required.
______________________________________________________________________________
  LOAD           004      -00002032    Error. RESTART required.
______________________________________________________________________________
  RESULTS:       1 of 4 LOADs completed successfully.
______________________________________________________________________________

Summary of LOAD Agents:
Number of rows read         = 0
Number of rows skipped      = 0
Number of rows loaded       = 0
Number of rows rejected     = 0
Number of rows deleted      = 0
Number of rows committed    = 0

SQL2032N  The "nonrecoverable" parameter is not valid.  SQLSTATE=22531
After searching around a bit on google and IBM sites, I found some suggestions that perhaps I needed to use copy yes instead of nonrecoverable. I’m not sure if BLOCKNONLOGGED might be set on these servers. I did this:
$ db2 load from /dev/null of del terminate into cis_ods.cs_pgm copy yes to /dev/null

  Agent Type     Node     SQL Code     Result
______________________________________________________________________________
  LOAD           001      +00000000    Success.
______________________________________________________________________________
  LOAD           002      +00000000    Success.
______________________________________________________________________________
  LOAD           003      +00000000    Success.
______________________________________________________________________________
  LOAD           004      +00000000    Success.
______________________________________________________________________________
  RESULTS:       4 of 4 LOADs completed successfully.
______________________________________________________________________________

Summary of LOAD Agents:
Number of rows read         = 0
Number of rows skipped      = 0
Number of rows loaded       = 0
Number of rows rejected     = 0
Number of rows deleted      = 0
Number of rows committed    = 0

Resolution

I then tried the simple select again, and this time it was successful! The state of the table was cleared and it was now empty. This is what the client wanted – an empty but accessible table.

I would not have used copy yes to /dev/null if HADR was in use for the database, because that would make the table unavailable on the HADR server. I would have had to do copy yes to a shared location between the two servers.

You may also like...

8 Responses

  1. Chris Aldrich says:

    I would also suggest setting LOCKTIMEOUT to 120 and DLCHKTIME to 60000 for warehouse/OLAP based loads. From what I have researched on the web (including DBI’s blogs) these are the recommended settings for warehouses to prevent lock timeouts and dead locks.

  2. Isaac Munoz says:

    Hi Ember,

    Thanks for sharing your problem. I would also suggest if the customer is happy with an empty accessible table then you could also just drop and recreate it.

    Regards

  3. Masheed says:

    Hi Ember,

    on db2 V10.5 load job was cancelled by the developer and let me know that table is not responding.
    When I check the load utility status with query (load query table table-name), it shows “table is in progress”. When I checked for lock it has locked as well.

    I had try multiple times to kill the session db2 force application all. But it is not kill the session till 4 hours.

    I had execute db2 load terminate command as well, it has still running from couple of hours. And its blocked by the load id user.

    Can you please help how I can terminate the load utility when table is in progress stat and table level lock on it

    • Ember Crooks says:

      Is this a load replace where the entire content of the table is intended to be replaced?

      • Masheed says:

        Its a partition table and load job first truncate the table and then insert the data. Every time load job failed or cancel, table goes to hang state and show table level locked and we are unable to query. And to put table back to normal state I have to restart the DB.

  4. Babasaheb Sawant says:

    Many thanks Ember for sharing your exp, even I faced same issue and resolved.

  5. Andra Radulescy says:

    Hi Ember,

    An article related to difference between warmns start replication, full refresh and cold start will be very useful!
    I had a problem on my project this week, it seems that I started replication warmns mode + enable full refresh and tables got into set integrity pending state. I solved the problem, but I forgot to disable full refresh and after a couple of days, we had a SEV1 again because tables were not accessible.

    Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *