Fun With 9.7 Locktimeout Analysis

I first addressed the new 9.7 methodology for deadlock and locktimeout analysis in my post Analyzing Deadlocks – The New Way. I’ve been playing with it a bit, and wanted to share what I’ve found. First of all, so far, I like it. I can’t speak to overhead yet, but it integrates lock timeout analysis more tightly than it previously was. I went out to take a look at one database I support that generally runs pretty smoothly, thanks in part to a ridiculously oversized database server.

I used the SQL from my own original post on the topic to see what was going on. First:

-bash-3.2$ db2 "select substr(event_type,1,18) as event_type, count(*) as count, sum(dl_conns) sum_involved_connections from DBA.LOCK_EVENT group by event_type with ur"

EVENT_TYPE         COUNT       SUM_INVOLVED_CONNECTIONS
------------------ ----------- ------------------------
LOCKTIMEOUT                600                        -

  1 record(s) selected.

The good news was that there are no deadlocks over a couple of weeks. The bad news? 600 lock timeouts. Eek, that seems a bit high. So I dug further to see what times we were seeing issues. Was it peak volume or during nighttime maintenance?

-bash-3.2$ db2 "select substr(event_type,1,18) as event_type, year(event_timestamp) as year, month(event_timestamp) as month, day(event_timestamp) as day, hour(event_timestamp) as hour, count(*) as count from DBA.LOCK_EVENT group by year(event_timestamp), month(event_timestamp), day(event_timestamp), hour(event_timestamp), event_type order by year(event_timestamp), month(event_timestamp), day(event_timestamp), hour(event_timestamp), event_type with ur"

EVENT_TYPE         YEAR        MONTH       DAY         HOUR        COUNT
------------------ ----------- ----------- ----------- ----------- -----------
LOCKTIMEOUT               2012           1          21           0         425
LOCKTIMEOUT               2012           1          21           3           1
LOCKTIMEOUT               2012           1          22           3           1
LOCKTIMEOUT               2012           1          23           3           1
LOCKTIMEOUT               2012           1          24           3           1
LOCKTIMEOUT               2012           1          25           3           1
LOCKTIMEOUT               2012           1          26           3           1
LOCKTIMEOUT               2012           1          27           3           1
LOCKTIMEOUT               2012           1          28           0         138
LOCKTIMEOUT               2012           1          28           3           1
LOCKTIMEOUT               2012           1          29           3           1
LOCKTIMEOUT               2012           1          30           3           1
LOCKTIMEOUT               2012           1          31           3           1
LOCKTIMEOUT               2012           2           1           3           1
LOCKTIMEOUT               2012           2           2           3           1
LOCKTIMEOUT               2012           2           3           3           1
LOCKTIMEOUT               2012           2           4           0           7
LOCKTIMEOUT               2012           2           4           3           1
LOCKTIMEOUT               2012           2           5           3           1
LOCKTIMEOUT               2012           2           6           3           1
LOCKTIMEOUT               2012           2           7           3           1
LOCKTIMEOUT               2012           2           8           3           1
LOCKTIMEOUT               2012           2           9           3           1
LOCKTIMEOUT               2012           2          10           3           1
LOCKTIMEOUT               2012           2          11           0           9

  25 record(s) selected.

That information immediately made me feel better. The  Spikes are Saturday mornings in the midnight hour. That’s when I do my runstats and reorgs, along with an rbind. To confirm my suspicion of the connection, I took a look at the tables involved.

-bash-3.2$ db2 "select substr(lp.table_schema,1,18) as table_schema, substr(lp.table_name,1,30) as table_name, substr(le.event_type,1,18) as lock_event, count(*)/2 as count from DBA.LOCK_PARTICIPANTS lp, DBA.LOCK_EVENT le where lp.xmlid=le.xmlid group by lp.table_schema, lp.table_name, le.event_type order by lp.table_schema, lp.table_name, le.event_type with ur"
TABLE_SCHEMA       TABLE_NAME                     LOCK_EVENT         COUNT
------------------ ------------------------------ ------------------ -----------
                                                  LOCKTIMEOUT                279
DBA                ROS_LOCKS                      LOCKTIMEOUT                 10
SYSIBM             SYSTABLES                      LOCKTIMEOUT                  2
WSCOMUSR           KEYS                           LOCKTIMEOUT                  1
WSCOMUSR           ORDERS                         LOCKTIMEOUT                  2
WSCOMUSR           USERS                          LOCKTIMEOUT                  4
-                  -                              LOCKTIMEOUT                300

  7 record(s) selected.

The row with dashes for table_name and table_schema, I was expecting. Half of the locktimeout is always reported this way. However, we should be getting values for the other side of those. So I decide to take a look at the statements involved, and sure enough I find this:

-bash-3.2$ db2 "with t1 as (select STMT_PKGCACHE_ID as STMT_PKGCACHE_ID, count(*) as stmt_count from dba.lock_participant_activities group by STMT_PKGCACHE_ID) select t1.stmt_count, (select substr(STMT_TEXT,1,100) as stmt_text from dba.lock_participant_activities a1 where a1.STMT_PKGCACHE_ID=t1.STMT_PKGCACHE_ID fetch first 1 row only) from t1 order by t1.stmt_count desc with ur"

STMT_COUNT  STMT_TEXT
----------- ----------------------------------------------------------------------------------------------------
        425 SELECT NAME,LENGTH(NAME), CREATOR, LENGTH(CREATOR), PKGVERSION, LENGTH(PKGVERSION),VALID,UNIQUE_ID F
        138 SELECT NAME,LENGTH(NAME), CREATOR, LENGTH(CREATOR), PKGVERSION, LENGTH(PKGVERSION),VALID,UNIQUE_ID F
          2 SELECT T1.KEYS_ID, T1.TABLENAME, T1.LOWERBOUND, T1.UPPERBOUND, T1.COUNTER, T1.PREFETCHSIZE, T1.COLUM
          1 UPDATE KEYS  SET TABLENAME = ?, LOWERBOUND = ?, UPPERBOUND = ?, COUNTER = ?, PREFETCHSIZE = ?, COLUM

  4 record(s) selected.

Clearly there are just two statements that account for just about everything. Expanding that, I see:

-bash-3.2$ db2 "with t1 as (select STMT_PKGCACHE_ID as STMT_PKGCACHE_ID, count(*) as stmt_count from dba.lock_participant_activities group by STMT_PKGCACHE_ID) select t1.stmt_count, (select substr(STMT_TEXT,1,256) as stmt_text from dba.lock_participant_activities a1 where a1.STMT_PKGCACHE_ID=t1.STMT_PKGCACHE_ID fetch first 1 row only) from t1 order by t1.stmt_count desc with ur"

STMT_COUNT  STMT_TEXT
----------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        425 SELECT NAME,LENGTH(NAME), CREATOR, LENGTH(CREATOR), PKGVERSION, LENGTH(PKGVERSION),VALID,UNIQUE_ID FROM SYSIBM.SYSPLAN WHERE NOT ((CREATOR = 'NULLID  ') AND (NAME = 'SQLE1E03') AND (PKGVERSION = '')) ORDER BY CREATOR ASC, NAME ASC, PKGVERSION ASC
        138 SELECT NAME,LENGTH(NAME), CREATOR, LENGTH(CREATOR), PKGVERSION, LENGTH(PKGVERSION),VALID,UNIQUE_ID FROM SYSIBM.SYSPLAN WHERE NOT ((CREATOR = 'NULLID  ') AND (NAME = 'SQLE1E03') AND (PKGVERSION = '')) ORDER BY CREATOR ASC, NAME ASC, PKGVERSION ASC
          2 SELECT T1.KEYS_ID, T1.TABLENAME, T1.LOWERBOUND, T1.UPPERBOUND, T1.COUNTER, T1.PREFETCHSIZE, T1.COLUMNNAME, T1.OPTCOUNTER FROM KEYS  T1 WHERE T1.tablename = ? FOR UPDATE WITH RS
          1 UPDATE KEYS  SET TABLENAME = ?, LOWERBOUND = ?, UPPERBOUND = ?, COUNTER = ?, PREFETCHSIZE = ?, COLUMNNAME = ?, OPTCOUNTER = ? WHERE KEYS_ID = ? AND OPTCOUNTER = ?

  4 record(s) selected.

Looking at the SQL, both of those statements are related to SYSIBM.SYSPLAN. My conclusion is that it’s probably the db2rbind that is done during the process, and that the locks are on internal plans, and that’s why the table wasn’t showing up in our list of tables.

Gathering this same information would not have been as easy with the old method of monitoring for locking issues. I would have had to parse through the files created for the purpose in the DIAGPATH and found the commonality that way.

 

You may also like...

2 Responses

  1. Saurabh says:

    This is really nice!.. I like the way you take a top-down approach in diagnosing problems.. Very very nice!..

    – Saurabh

  1. March 19, 2012

    […] Fun With 9.7 Locktimout Analysis Covers a real-world investigation into lock timeouts using the new method, and goes into greater detail than I had time for in the DB2′s Got talent presentation. It includes specific SQL  and my thought process while analyzing a situation. […]

Leave a Reply

Your email address will not be published. Required fields are marked *