Analyzing Deadocks – the old way

In 9.7, DB2 started offering a new monitoring method for deadlocking. Though this post describes the “old” way, this method also works in db2 9.7. Detailedeadlock event monitors have been deprecated, but not yet removed. This means that even in 9.7, you can still create them and work with them.

If you’re at all confused about the difference between deadlocks and lock timeouts, please first read my post on Deadlocks VS. Lock timeouts.

Creating the deadlock event monitor

One of the most critical things here is that you must have the detailedeadlock event monitor in place and working before you run into an issue. By default (even in 9.7), db2 has one called simply ‘db2detaildeadlock’. The only problem with it is that it may run out of space rather quickly. As a result, I re-create it on build, using this syntax (you’ll need a database connection, of course):

db2 "create event monitor my_detaildeadlock for deadlocks with details write to file 'my_detaildeadlock' maxfiles 2000 maxfilesize 10000 blocked append autostart"
DB20000I  The SQL command completed successfully.

In additon, you must actually manually create the directory for the event monitor. It goes in the ‘db2event’ subdirectory of the database path, so in my latest example, I used something like this statement to create it:

mkdir /db_data/db2inst1/NODE0000/SQL00002/db2event/my_detaildeadlock

And then there’s activating the new one and dropping the old one:

db2 "set event monitor ros_detaildeadlock state=1"
DB20000I  The SQL command completed successfully.

db2 "set event monitor db2detaildeadlock state=0"
DB20000I  The SQL command completed successfully.

db2 "drop event monitor db2detaildeadlock"
DB20000I  The SQL command completed successfully.

Finally, you’ll want to verify the event monitor state:

> db2 "select substr(evmonname,1,30) as evmonname, EVENT_MON_STATE(evmonname) as state from syscat.eventmonitors with ur"

EVMONNAME                      STATE
------------------------------ -----------
ROS_DETAILDEADLOCK                       1

  1 record(s) selected.

1 means active for the state, and 0 means not active, so this is the output we want.

Parsing and Analyzing output

Now that you’ve got the event monitor running, what do you do with it? Well, assuming you actually have some deadlocks (as you can tell through the snapshot event monitor, using db2top, db2pd, db2 admin views, or the get snapshot command), you’ll want to flush the event monitor and convert it’s output to human readable form:

> db2 flush event monitor MY_DETAILDEADLOCK
DB20000I  The SQL command completed successfully.
> db2evmon -path /db_data/db2inst1/NODE0000/SQL00002/db2event/my_detaildeadlock >deadlocks.out

Reading /db_data/db2inst1/NODE0000/SQL00002/db2event/ros_detaildeadlock/00000000.evt ...

Your path may be different, of course. I prefer the path option on db2evmon because I’ve had less problems with it. There is an option to specify the dbname and event monitor name – I just find that it’s not as reliable.

So now you’ve done the easy part. Yep, that’s right, that’s the easy part. Depending on the number of deadlocks, you may now have a giant file. I seem to remember parsing a 15 GB one at one time. Here are some snippets of the output to give an idea of what you’re looking at:

379) Deadlock Event ...
  Deadlock ID:   20
  Number of applications deadlocked: 2
  Deadlock detection time: 01/03/2012 14:06:13.425034
  Rolled back Appl participant no: 2
  Rolled back Appl Id: 172.19.10.61.37259.120103200006
  Rolled back Appl seq number: : 0009
...
381) Deadlocked Connection ...
  Deadlock ID:   20
  Participant no.: 2
  Participant no. holding the lock: 1
  Appl Id: 172.19.10.61.37259.120103200006
  Appl Seq number: 00009
  Tpmon Client Workstation: spp27comm02x
  Appl Id of connection holding the lock: 172.19.10.61.62895.120103194755
  Seq. no. of connection holding the lock: 00001
  Lock wait start time: 01/03/2012 14:06:03.651592
  Lock Name       : 0x02000C1A1500BCE31800000052
  Lock Attributes : 0x00000000
  Release Flags   : 0x00000001
  Lock Count      : 1
  Hold Count      : 0
  Current Mode    : none
  Deadlock detection time: 01/03/2012 14:06:13.425119
  Table of lock waited on      : USERS
  Schema of lock waited on     : WSCOMUSR
  Data partition id for table  : 0
  Tablespace of lock waited on : USERSPACE1
  Type of lock: Row
  Mode of lock: X   - Exclusive
  Mode application requested on lock: NS  - Share (and Next Key Share)
  Node lock occured on: 0
  Lock object name: 106899963925
  Application Handle: 47264
  Deadlocked Statement:
    Type     : Dynamic
    Operation: Fetch
    Section  : 2
    Creator  : NULLID
    Package  : SYSSH200
    Cursor   : SQL_CURSH200C2
    Cursor was blocking: FALSE
    Text     : SELECT T1.STATE, T1.MEMBER_ID, T1.OPTCOUNTER, T1.TYPE, T2.FIELD2, T2.REGISTRATIONUPDATE, T2.FIELD3, T2.LASTORDER, T2.LANGUAGE_ID, T2.PREVLASTSESSION, T2.SETCCURR, T2.DN, T2.REGISTRATIONCANCEL, T2.LASTSESSION, T2.REGISTRATION, T2.FIELD1, T2.REGISTERTYPE, T2.PROFILETYPE, T2.PERSONALIZATIONID FROM MEMBER  T1, USERS  T2 WHERE T1.TYPE = 'U' AND T1.MEMBER_ID = T2.USERS_ID AND T1.MEMBER_ID = ?
  List of Locks:
...
383) Deadlocked Connection ...
  Deadlock ID:   20
  Participant no.: 1
  Participant no. holding the lock: 2
  Appl Id: 172.19.10.61.62895.120103194755
  Appl Seq number: 00905
  Tpmon Client Workstation: spp27comm02x
  Appl Id of connection holding the lock: 172.19.10.61.37259.120103200006
  Seq. no. of connection holding the lock: 00001
  Lock wait start time: 01/03/2012 14:06:03.657097
  Lock Name       : 0x02000D0E2F00F8D61800000052
  Lock Attributes : 0x00000000
  Release Flags   : 0x40000000
  Lock Count      : 1
  Hold Count      : 0
  Current Mode    : U   - Update
  Deadlock detection time: 01/03/2012 14:06:13.425274
  Table of lock waited on      : MEMBER
  Schema of lock waited on     : WSCOMUSR
  Data partition id for table  : 0
  Tablespace of lock waited on : USERSPACE1
  Type of lock: Row
  Mode of lock: NS  - Share (and Next Key Share)
  Mode application requested on lock: X   - Exclusive
  Node lock occured on: 0
  Lock object name: 106685792303
  Application Handle: 47206
  Deadlocked Statement:
    Type     : Dynamic
    Operation: Execute
    Section  : 25
    Creator  : NULLID
    Package  : SYSSH200
    Cursor   : SQL_CURSH200C25
    Cursor was blocking: FALSE
    Text     : UPDATE MEMBER  SET STATE = ?, OPTCOUNTER = ? WHERE MEMBER_ID = ? AND OPTCOUNTER = ?
  List of Locks:
...

I’ve removed the list of locks due to length, and also entries on the connection events, but have not altered the actual output here.

The “Deadlock ID” here lets us identify which deadlock this was a participant in. Deadlocks most frequently involve 2 connections, but they can involve 3, 4, 5, or even more.

Looking at “Participant no” both in the “Deadlock Event” section and the “Deadlocked Connection” sections and “Rolled back Appl participant no” in the “Deadlock Event” section, you can understand which statement was rolled back and which was allowed to continue.

There’s a lot more useful information there to parse through – most of it is pretty obvious in its meaning.

It is nice to go through and determine if the same statements were involved in deadlocks over and over again – which statements were most frequently involved in a deadlock. It’s also nice to analyze the timing of the deadlocks – I find summarizing by hour very useful in helping to determine if they were limited to a specific time period. It can also be interesting to summarize by table to see if a particular table is frequently involved.

What to do with the analysis

The number one thing to do with what you find is to provide the SQL to your developers. They should be able to understand where that SQL is coming from in your application, and should be able to come up with ideas to reduce the deadlocking.

Remember that deadlocking is an application problem whose symptoms appear on the database. The sum total of everything you can do that might reduce deadlocking at the database level is:

  1. Keep runstats current
  2. Set the db2 registry variables, ONLY IF YOUR APPLICATION EXPLICITLY SUPPORTS THEM:
    • DB2_SKIPINSERTED
    • DB2_SKIPDELETED
    • DB2_EVALUNCOMMITTED

Increasing LOCKLIST will not help with deadlocking unless you’re also seeing lock escallations.

References:

Info Center entry on deprecation of detaildeadlock event monitor: http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.wn.doc/doc/i0054715.html

 

Info Center entry on db2evmon:

http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002046.html

 

Keep your eyes open for a new post on Analyzing Deadlocks the new way – using the event monitor for locks.

You may also like...

8 Responses

  1. Ray Houle says:

    Ember,

    Wounderful post ! I greatly appreciated the end to end “how to” approach, good cross-references.

    Thanks

  2. imran says:

    Its a good article explained here for DBA’s to get through deadlock issues.

  3. Lee Goddard says:

    What a great blog on locking. I learned a lot about the new get_mon functions. Ember is my new hero for the week! U ROCK Ember!

    Lee

  4. Andres Gomez says:

    Line ‘db2 “set event monitor ros_detaildeadlock state=1″‘ is incorrect, the event monitor name, it should be ‘db2 “set event monitor my_detaildeadlock state=1″‘

  1. February 13, 2012

    […] and if you want to avoid event monitors that write to tables, that’s still the only way. See Analyzing Deadlocks – the old way for more details on that […]

  2. March 2, 2012

    […] Analyzing Deadocks – the old way Gives a detailed look at deadlock analysis before 9.7 – a method that still works in 9.7. This was mentioned in the DB2′s Got Talent presentation, but was not covered in detail […]

  3. March 9, 2012

    […] Analyzing Deadocks – the old way […]

Leave a Reply

Your email address will not be published. Required fields are marked *