Analyzing Deadocks – the old way
In 9.7, DB2 started offering a new monitoring method for deadlocking. Though this post describes the “old” way, this method also works in db2 9.7. Detailedeadlock event monitors have been deprecated, but not yet removed. This means that even in 9.7, you can still create them and work with them.
If you’re at all confused about the difference between deadlocks and lock timeouts, please first read my post on Deadlocks VS. Lock timeouts.
Creating the deadlock event monitor
One of the most critical things here is that you must have the detailedeadlock event monitor in place and working before you run into an issue. By default (even in 9.7), db2 has one called simply ‘db2detaildeadlock’. The only problem with it is that it may run out of space rather quickly. As a result, I re-create it on build, using this syntax (you’ll need a database connection, of course):
db2 "create event monitor my_detaildeadlock for deadlocks with details write to file 'my_detaildeadlock' maxfiles 2000 maxfilesize 10000 blocked append autostart" DB20000I The SQL command completed successfully.
In additon, you must actually manually create the directory for the event monitor. It goes in the ‘db2event’ subdirectory of the database path, so in my latest example, I used something like this statement to create it:
And then there’s activating the new one and dropping the old one:
db2 "set event monitor ros_detaildeadlock state=1" DB20000I The SQL command completed successfully. db2 "set event monitor db2detaildeadlock state=0" DB20000I The SQL command completed successfully. db2 "drop event monitor db2detaildeadlock" DB20000I The SQL command completed successfully.
Finally, you’ll want to verify the event monitor state:
> db2 "select substr(evmonname,1,30) as evmonname, EVENT_MON_STATE(evmonname) as state from syscat.eventmonitors with ur" EVMONNAME STATE ------------------------------ ----------- ROS_DETAILDEADLOCK 1 1 record(s) selected.
1 means active for the state, and 0 means not active, so this is the output we want.
Parsing and Analyzing output
Now that you’ve got the event monitor running, what do you do with it? Well, assuming you actually have some deadlocks (as you can tell through the snapshot event monitor, using db2top, db2pd, db2 admin views, or the get snapshot command), you’ll want to flush the event monitor and convert it’s output to human readable form:
> db2 flush event monitor MY_DETAILDEADLOCK DB20000I The SQL command completed successfully.
> db2evmon -path /db_data/db2inst1/NODE0000/SQL00002/db2event/my_detaildeadlock >deadlocks.out Reading /db_data/db2inst1/NODE0000/SQL00002/db2event/ros_detaildeadlock/00000000.evt ...
Your path may be different, of course. I prefer the path option on db2evmon because I’ve had less problems with it. There is an option to specify the dbname and event monitor name – I just find that it’s not as reliable.
So now you’ve done the easy part. Yep, that’s right, that’s the easy part. Depending on the number of deadlocks, you may now have a giant file. I seem to remember parsing a 15 GB one at one time. Here are some snippets of the output to give an idea of what you’re looking at:
379) Deadlock Event ... Deadlock ID: 20 Number of applications deadlocked: 2 Deadlock detection time: 01/03/2012 14:06:13.425034 Rolled back Appl participant no: 2 Rolled back Appl Id: 172.19.10.61.37259.120103200006 Rolled back Appl seq number: : 0009
381) Deadlocked Connection ... Deadlock ID: 20 Participant no.: 2 Participant no. holding the lock: 1 Appl Id: 172.19.10.61.37259.120103200006 Appl Seq number: 00009 Tpmon Client Workstation: spp27comm02x Appl Id of connection holding the lock: 172.19.10.61.62895.120103194755 Seq. no. of connection holding the lock: 00001 Lock wait start time: 01/03/2012 14:06:03.651592 Lock Name : 0x02000C1A1500BCE31800000052 Lock Attributes : 0x00000000 Release Flags : 0x00000001 Lock Count : 1 Hold Count : 0 Current Mode : none Deadlock detection time: 01/03/2012 14:06:13.425119 Table of lock waited on : USERS Schema of lock waited on : WSCOMUSR Data partition id for table : 0 Tablespace of lock waited on : USERSPACE1 Type of lock: Row Mode of lock: X - Exclusive Mode application requested on lock: NS - Share (and Next Key Share) Node lock occured on: 0 Lock object name: 106899963925 Application Handle: 47264 Deadlocked Statement: Type : Dynamic Operation: Fetch Section : 2 Creator : NULLID Package : SYSSH200 Cursor : SQL_CURSH200C2 Cursor was blocking: FALSE Text : SELECT T1.STATE, T1.MEMBER_ID, T1.OPTCOUNTER, T1.TYPE, T2.FIELD2, T2.REGISTRATIONUPDATE, T2.FIELD3, T2.LASTORDER, T2.LANGUAGE_ID, T2.PREVLASTSESSION, T2.SETCCURR, T2.DN, T2.REGISTRATIONCANCEL, T2.LASTSESSION, T2.REGISTRATION, T2.FIELD1, T2.REGISTERTYPE, T2.PROFILETYPE, T2.PERSONALIZATIONID FROM MEMBER T1, USERS T2 WHERE T1.TYPE = 'U' AND T1.MEMBER_ID = T2.USERS_ID AND T1.MEMBER_ID = ? List of Locks:
383) Deadlocked Connection ... Deadlock ID: 20 Participant no.: 1 Participant no. holding the lock: 2 Appl Id: 172.19.10.61.62895.120103194755 Appl Seq number: 00905 Tpmon Client Workstation: spp27comm02x Appl Id of connection holding the lock: 172.19.10.61.37259.120103200006 Seq. no. of connection holding the lock: 00001 Lock wait start time: 01/03/2012 14:06:03.657097 Lock Name : 0x02000D0E2F00F8D61800000052 Lock Attributes : 0x00000000 Release Flags : 0x40000000 Lock Count : 1 Hold Count : 0 Current Mode : U - Update Deadlock detection time: 01/03/2012 14:06:13.425274 Table of lock waited on : MEMBER Schema of lock waited on : WSCOMUSR Data partition id for table : 0 Tablespace of lock waited on : USERSPACE1 Type of lock: Row Mode of lock: NS - Share (and Next Key Share) Mode application requested on lock: X - Exclusive Node lock occured on: 0 Lock object name: 106685792303 Application Handle: 47206 Deadlocked Statement: Type : Dynamic Operation: Execute Section : 25 Creator : NULLID Package : SYSSH200 Cursor : SQL_CURSH200C25 Cursor was blocking: FALSE Text : UPDATE MEMBER SET STATE = ?, OPTCOUNTER = ? WHERE MEMBER_ID = ? AND OPTCOUNTER = ? List of Locks:
I’ve removed the list of locks due to length, and also entries on the connection events, but have not altered the actual output here.
The “Deadlock ID” here lets us identify which deadlock this was a participant in. Deadlocks most frequently involve 2 connections, but they can involve 3, 4, 5, or even more.
Looking at “Participant no” both in the “Deadlock Event” section and the “Deadlocked Connection” sections and “Rolled back Appl participant no” in the “Deadlock Event” section, you can understand which statement was rolled back and which was allowed to continue.
There’s a lot more useful information there to parse through – most of it is pretty obvious in its meaning.
It is nice to go through and determine if the same statements were involved in deadlocks over and over again – which statements were most frequently involved in a deadlock. It’s also nice to analyze the timing of the deadlocks – I find summarizing by hour very useful in helping to determine if they were limited to a specific time period. It can also be interesting to summarize by table to see if a particular table is frequently involved.
What to do with the analysis
The number one thing to do with what you find is to provide the SQL to your developers. They should be able to understand where that SQL is coming from in your application, and should be able to come up with ideas to reduce the deadlocking.
Remember that deadlocking is an application problem whose symptoms appear on the database. The sum total of everything you can do that might reduce deadlocking at the database level is:
- Keep runstats current
- Set the db2 registry variables, ONLY IF YOUR APPLICATION EXPLICITLY SUPPORTS THEM:
Increasing LOCKLIST will not help with deadlocking unless you’re also seeing lock escallations.
Info Center entry on deprecation of detaildeadlock event monitor: http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.wn.doc/doc/i0054715.html
Info Center entry on db2evmon:
Keep your eyes open for a new post on Analyzing Deadlocks the new way – using the event monitor for locks.