Why won’t you just die?! (Cleaning DB2 Process in Memory)
Zombied (think the “Walking Dead” TV show).
Sometimes there is a memory leak, instance level crash, trap, or OS issue. DB2 just goes into a bad state and you have to bring down DB2 as gently as you can.
And if not, hit it with a bat. Whatever works.
Even when DB2 comes down properly, you need to be diligent to make sure EVERYTHING has stopped before some more serious work happens, like a DB2 upgrade. You need ZERO DB2 processes in memory.
[Side Note: Ember has a really good article on stopping DB2 in preparation for an upgrade. Check out the article here – Link]
There is a level of escalation when DB2 is snarled. You start with deactivating the database which allows all things to come to a natural end. However, if step two fails, move to step 3, if step 3 fails, move to the big guns in step four. If you end up hitting step five, you know things are not good.
- Deactivate Database
- DB2STOP force
- Server Reboot
There is always real risk with a DB2_KILL of instance or database corruption. It is always a last resort.
Let’s say DB2 is down one way or another. And we want to be sure all DB2 processes are out of memory and the server isn’t chewing on something that could be a zombie. How do we accomplish that in an AIX/UNIX/LINUX environment?
From the DB2 Info Center: “The ipcs command writes to the standard output information about active interprocess communication facilities. If you do not specify any flags, the ipcs command writes information in a short form about currently active message queues, shared memory segments, semaphores, remote queues, and local queue headers”.
To use this in our environment you can issue the following.
Command: ipcs | grep db2 This returns something similar to: q 154140691 0xffffffff -Rrw------- INSTID GROUPNM q 611319866 0xffffffff -Rrw------- INSTID GROUPNM m 708837404 0x7e074e61 --rw------- INSTID GROUPNM m 577765405 0xffffffff --rw------- INSTID GROUPNM s 16777236 0x7e074e74 --ra-ra-ra- FENCEID GROUPNM
(Where ‘q’ is Message Queue, ‘m’ is Shared Memory Segment, and ‘s’ is a semaphore)
In short, all items returned are DB2 processes still in memory. If you brought DB2 down cleanly there may be very few or no processes. If you hit DB2 with a bat (db2_kill), you may have many.
To clean your memory of DB2 processes you can issue two different commands.
Issuing “ipclean” by itself will act as a catch-all. Essentially any processes in memory for that ID are terminated, or a terminate is attempted. If you came down cleanly with a deactivate and DB2STOP, IPCLEAN should be sufficient. DB2_KILL issues an ipclean as a part of its processing.
From the DB2 Info Center: “Removes message queue, semaphore set, or shared memory identifiers”.
Command: Ipcrm -q ID Ipcrm -m ID Ipcrm -s ID
Where q,m,s matches what was returned in IPCS command. The ID used is listed in the first column of IPCS.
ipcrm -q 154140691
For more details on the IPCS and IPCRM commands, head over to the DB2 Info Center at http://tinyurl.com/dyra4m2 .
Let’s admit it; we are not System Administrators (SA). Use these commands with caution as they can cause more harm than good. Run them as the DB2 instance owner. Use common sense, and if you are over your head, reach out to your SA. They will be happy to help. Getting them involved could prevent a reboot.
I don’t know about you, but I bribe my SA for help with coffee. Don’t make a newbie mistake and try to bribe with that weak stuff in the break room. If the SA grumbles and snarls, you used cheap coffee. Proceed to Starbucks quickly.
Michael Krafick is an occasional contributor to db2commerce.com. He has been a production support DBA for over 12 years in data warehousing and highly transactional OLTP environments. He was acknowledged as a top ten session speaker for “10 Minute Triage” at the 2012 IDUG Technical Conference. Michael also has extensive experience in setting up monitoring configurations for DB2 Databases as well as preparing for high availability failover, backup, and recovery. He can be reached at “Michael.Krafick (at) icloud (dot) com”. Linked-in Profile: http://www.linkedin.com/in/michaelkrafick. Twitter: mkrafick
Mike’s blog posts include:
10 Minute Triage: Assessing Problems Quickly (Part I)
10 Minute Triage: Assessing Problems Quickly (Part II)
Now, now you two play nice … DB2 and HACMP failover
Technical Conference – It’s a skill builder, not a trip to Vegas.
Attack of the Blob: Blobs in a Transaction Processing Environment