SQL Access to the DB2 Diagnostic Log

In recent releases, DB2 has offered more and more opportunities for using SQL to get to data we need to maintain our databases. I’m a big fan of some of this new functionality, particularly the mon_ stuff (see My New Best Friend – mon_ Part 1: Table Functions and My New Best Friend – mon_ Part 2: Views). Many of the things I’ve been enjoying are 9.7 or later, with even more in some FixPacks.

But even when I go to the IDUG conference and I read blogs and really research more than many DBAs out there, I sometimes miss nifty features. This is where simple chatting with fellow DBAs from other companies can teach me so much. I happened to just be chatting with some friends on a recent Thursday, and they told me about PDLOGMSGS_LAST24HOURS. Mind=blown, and of course I go on to research more to share with my readers.

The table functions and administrative view covered in this post allow SQL access to your diagnostic logs.

PDLOGMSGS_LAST24HOURS Administrative View

Especially to newer DBAs, Administrative Views may be easier than table functions. I remember when the first official attempts at accessing data via table fuctions were introduced (with version 8 or 8.2), and a dba of only a couple of years, I thought they were just too difficult to use and to remember which inputs they needed. I’ve changed my tune, but it has largely been as my understanding of SQL has expanded (and my frustration with trying to script parsing all that text in snapshots/reorgchks/etc. has grown). But the fact remains that a view is easy to use.

Here’s an example:

SELECT TIMESTAMP, substr(APPL_ID,1,15) as APPL_ID_TRUNC, MSGSEVERITY as SEV, MSGNUM, substr(MSG,1,50) as MSG_trunc FROM SYSIBMADM.PDLOGMSGS_LAST24HOURS ORDER BY TIMESTAMP DESC

TIMESTAMP                  APPL_ID_TRUNC   SEV MSGNUM      MSG_TRUNC
-------------------------- --------------- --- ----------- --------------------------------------------------
2013-06-25-19.59.03.567782 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-18.59.03.066033 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-17.59.03.505498 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-16.59.02.649889 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-15.59.03.194671 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-14.59.03.022017 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-13.59.02.936827 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-12.59.03.392853 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-11.59.02.795087 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-10.59.02.777687 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-09.59.03.216297 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-08.59.03.219218 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-07.59.03.545914 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-06.59.02.759707 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-05.59.03.584408 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-04.59.03.304691 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-03.59.03.296858 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-02.59.03.030282 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-01.59.02.988167 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-25-00.59.03.147449 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-24-23.59.03.084096 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-24-22.59.03.551389 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-24-21.59.03.551260 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is
2013-06-24-20.59.03.773344 *LOCAL.db2inst1 E          6044 ADM6044E  The DMS table space "DBA32K" (ID "9") is

  24 record(s) selected.

Most of the time, I am interested in more than just 24 hours. To do that, take a look at the PD_GET_LOG_MSGS section, next. The field options for the view and this table function are identical, so queries from one will apply to the other with very minor tweaks to the table name/table function call.

PD_GET_LOG_MSGS Table Function

The real reason to use the table function here is to get the data for more than 24 hours. One frustrating part of looking at the db2diag.log can be parsing through screens and screens of warnings to find the more important messages. I like to truncate the actual message to be able to scroll through more quickly and see if there is anything concerning.

Here’s a query that is useful for that, with some sample output:

SELECT TIMESTAMP, substr(APPL_ID,1,15) as APPL_ID_TRUNC, MSGSEVERITY as SEV, MSGNUM, substr(MSG,1,50) as MSG_trunc FROM TABLE ( PD_GET_LOG_MSGS( CURRENT_TIMESTAMP - 7 DAYS)) AS T ORDER BY TIMESTAMP DESC

TIMESTAMP                  APPL_ID_TRUNC   SEV MSGNUM      MSG_TRUNC
-------------------------- --------------- --- ----------- --------------------------------------------------
2013-06-22-00.12.38.145088 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
2013-06-22-00.12.37.958731 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "6" (OBJECTID "12
2013-06-22-00.12.37.952392 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "5" (OBJECTID "12
2013-06-22-00.12.37.945722 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "4" (OBJECTID "12
2013-06-22-00.12.37.937003 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "3" (OBJECTID "12
2013-06-22-00.12.37.920251 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "2" (OBJECTID "12
2013-06-22-00.12.37.911164 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "1" (OBJECTID "12
2013-06-22-00.12.37.885301 *LOCAL.db2inst1 W          9501 ADM9501W  Index reorganization has started for tab
2013-06-22-00.12.35.336689 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
2013-06-22-00.12.35.213472 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "2" (OBJECTID "12
2013-06-22-00.12.35.212621 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "1" (OBJECTID "12
2013-06-22-00.12.35.166089 *LOCAL.db2inst1 W          9501 ADM9501W  Index reorganization has started for tab
2013-06-22-00.12.21.939731 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
2013-06-22-00.12.21.837597 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "4" (OBJECTID "12
2013-06-22-00.12.21.837013 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "3" (OBJECTID "12
2013-06-22-00.12.21.836187 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "2" (OBJECTID "12
2013-06-22-00.12.21.835290 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "1" (OBJECTID "12
2013-06-22-00.12.21.802720 *LOCAL.db2inst1 W          9501 ADM9501W  Index reorganization has started for tab
2013-06-22-00.12.17.915036 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
2013-06-22-00.12.17.710513 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "2" (OBJECTID "12
2013-06-22-00.12.17.705515 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "1" (OBJECTID "12
2013-06-22-00.12.17.679026 *LOCAL.db2inst1 W          9501 ADM9501W  Index reorganization has started for tab
2013-06-22-00.12.15.784374 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
2013-06-22-00.12.15.618504 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "3" (OBJECTID "12
2013-06-22-00.12.15.558027 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "2" (OBJECTID "12
2013-06-22-00.12.15.504726 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "1" (OBJECTID "12
2013-06-22-00.12.15.478467 *LOCAL.db2inst1 W          9501 ADM9501W  Index reorganization has started for tab
2013-06-22-00.07.39.843666 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
...
2013-06-22-00.02.18.847196 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "1" (OBJECTID "6"
2013-06-22-00.02.18.701810 *LOCAL.db2inst1 W          9501 ADM9501W  Index reorganization has started for tab

  200 record(s) selected.

Nice boring diag log there, just the way we like to see it.

Some interesting notes on what we’re looking at there – first, I truncated the appl_id a fair amount to hide anything that could be translated into an IP address – can’t give out IP addresses online. Kind of nice – I frequently end up manually masking or changing the IP address in things I look at for the blog. 30 characters is more reasonable if you really want to see the full data there.

Next, notice the “MSGSEVERITY as SEV” column. There’s one that could be useful for weeding out the warnings. That might work well in the where clause.

Also notice the MSGNUM column. That could become really useful if we’re looking for a specific issue. For example, on another database, I noticed a package cache overflow and wanted to see if there had been more than one and how far apart and so far, so I did something like this:

SELECT date(TIMESTAMP) as date, dayofweek(TIMESTAMP) as dayofweek, time(TIMESTAMP) as time, MSGSEVERITY as SEV, MSGNUM, substr(MSG,1,50) as MSG_trunc FROM TABLE ( PD_GET_LOG_MSGS( CURRENT_TIMESTAMP - 30 DAYS)) AS T where MSGNUM=4500 ORDER BY TIMESTAMP DESC

DATE       DAYOFWEEK   TIME     SEV MSGNUM      MSG_TRUNC
---------- ----------- -------- --- ----------- --------------------------------------------------
06/22/2013           7 00:06:11 W          4500 ADM4500W  A package cache overflow condition has o
06/15/2013           7 00:07:26 W          4500 ADM4500W  A package cache overflow condition has o
06/15/2013           7 00:06:34 W          4500 ADM4500W  A package cache overflow condition has o
06/08/2013           7 00:23:00 W          4500 ADM4500W  A package cache overflow condition has o
06/08/2013           7 00:21:47 W          4500 ADM4500W  A package cache overflow condition has o
06/01/2013           7 00:28:23 W          4500 ADM4500W  A package cache overflow condition has o

  6 record(s) selected.

Pretty nifty. There in one query, I can see that I have overflowed my package cache 6 times this month – generally cause for concern. BUT, I can also see they were all on Saturdays within 1/2 hour of midnight – which is when I run runstats/reorgs/rbinds on that server. Interesting, and possibly a package cache overflow I might not want to do anything about.

I would have spent a considerable time with db2diag or the db2diag.log file to figure that out.

It is also possible to use the database name and other interesting factors in your where clause. See the IBM DB2 Info Center for more details on the available fields.

One thing I notice is that the values returned by this table function all have values in MSGNUM – I don’t see things like the start and end of an online backup reported.

PD_GET_DIAG_HIST Table Function

After PD_GET_LOG_MSGS, why even look any further? Well one reason might be data volume. Notice that the previous table function only let us specify an “oldest_timestamp” as input, meaning the table function would parse everything since then. Well, sometimes that would be more data than you want to deal with. Or sometimes you might have a diag log with a lot of data and you want to really focus on just a one week period of time a month ago. Sure you could do the same restrictions with SQL but if you’re dealing with really large amounts of data, you may want to save the load on your system.

There are also additional details available to query here. We can look at things like the edu name, or the dumpfile or other details. Again, really worth a visit to the IBM DB2 Info Center.

First, let me show you how to get similar data as the queries above. A general overview gets us a lot more records, some appearing empty in these fields:

SELECT TIMESTAMP, substr(APPL_ID,1,15) as APPL_ID_TRUNC, LEVEL as SEV, MSGNUM, substr(MSG,1,50) as MSG_trunc FROM TABLE (PD_GET_DIAG_HIST( 'MAIN', '', '', Current timestamp - 7 days, current timestamp) ) AS T ORDER BY TIMESTAMP DESC

TIMESTAMP                  APPL_ID_TRUNC   SEV MSGNUM      MSG_TRUNC
-------------------------- --------------- --- ----------- --------------------------------------------------
2013-06-25-18.20.32.493224 -               I             - -
2013-06-25-18.20.30.760172 -               I             - -
2013-06-25-04.49.15.981251 -               I             - -
2013-06-25-04.49.14.405284 -               I             - -
2013-06-25-03.01.14.764813 *LOCAL.db2inst1 I             - Backup complete.
2013-06-25-03.01.14.758243 -               I             - -
2013-06-25-03.01.14.152681 -               I             - -
2013-06-25-03.00.28.428011 *LOCAL.db2inst1 I             - Starting an online db backup.
2013-06-25-03.00.28.345164 *LOCAL.db2inst1 I             - -
2013-06-25-03.00.28.344987 *LOCAL.db2inst1 I             - -
2013-06-25-03.00.28.226359 *LOCAL.db2inst1 I             - -
2013-06-25-00.48.06.853249 *LOCAL.DB2.1306 -             - -
2013-06-25-00.48.06.771161 *LOCAL.DB2.1306 -             - -
...
2013-06-22-00.12.38.145088 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
2013-06-22-00.12.38.144806 *LOCAL.db2inst1 W          9502 ADM9502W  Index reorganization is complete for tab
2013-06-22-00.12.37.958731 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "6" (OBJECTID "12
2013-06-22-00.12.37.958539 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "6" (OBJECTID "12
2013-06-22-00.12.37.952392 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "5" (OBJECTID "12
2013-06-22-00.12.37.952196 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "5" (OBJECTID "12
2013-06-22-00.12.37.945722 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "4" (OBJECTID "12
2013-06-22-00.12.37.945528 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "4" (OBJECTID "12
2013-06-22-00.12.37.937003 *LOCAL.db2inst1 W          9503 ADM9503W  Reorganizing index IID "3" (OBJECTID "12
...
2013-06-20-08.30.00.756338 -               I             - -
2013-06-19-16.05.52.731310 -               I             - -
2013-06-19-16.05.51.233289 -               I             - -
2013-06-19-12.06.22.910822 *LOCAL.DB2.1306 -             - -
2013-06-19-12.06.22.846150 *LOCAL.DB2.1306 -             - -

  500 record(s) selected.

That is over twice as many records reported here as in the previous method. It is similar data, but this method pulls from the following extended list of sources:

  • db2diag.log file(s)
  • admin notification log
  • rotating event logs

Therefore another possible reason to use this method is if you’re looking for more than just the db2diag.log. Personally, I believe I prefer the other method, but only time will really tell.

Replicating the output generated with the method above looking at the package cache overflow issue specifically, we get:

SELECT date(TIMESTAMP) as date, dayofweek(TIMESTAMP) as dayofweek, time(TIMESTAMP) as time, LEVEL as SEV, MSGNUM, substr(MSG,1,50) as MSG_trunc FROM TABLE (PD_GET_DIAG_HIST( 'MAIN', '', '', Current timestamp - 30 days, current timestamp) ) AS T where MSGNUM=4500 ORDER BY TIMESTAMP DESC

DATE       DAYOFWEEK   TIME     SEV MSGNUM      MSG_TRUNC
---------- ----------- -------- --- ----------- --------------------------------------------------
06/22/2013           7 00:06:11 W          4500 ADM4500W  A package cache overflow condition has o
06/22/2013           7 00:06:11 W          4500 ADM4500W  A package cache overflow condition has o
06/15/2013           7 00:07:26 W          4500 ADM4500W  A package cache overflow condition has o
06/15/2013           7 00:07:26 W          4500 ADM4500W  A package cache overflow condition has o
06/15/2013           7 00:06:34 W          4500 ADM4500W  A package cache overflow condition has o
06/15/2013           7 00:06:34 W          4500 ADM4500W  A package cache overflow condition has o
06/08/2013           7 00:23:00 W          4500 ADM4500W  A package cache overflow condition has o
06/08/2013           7 00:23:00 W          4500 ADM4500W  A package cache overflow condition has o
06/08/2013           7 00:21:47 W          4500 ADM4500W  A package cache overflow condition has o
06/08/2013           7 00:21:47 W          4500 ADM4500W  A package cache overflow condition has o
06/01/2013           7 00:28:23 W          4500 ADM4500W  A package cache overflow condition has o
06/01/2013           7 00:28:23 W          4500 ADM4500W  A package cache overflow condition has o

  12 record(s) selected.

This is very similar output, but exactly twice as many rows. I suspect it is pulling the error both from the db2diag.log and the admin notify log. It would be important to be aware of this double reporting.

Here’s something that it might be interesting to do with this one – if you’ve got HADR and want to just see what it has been up to:

SELECT TIMESTAMP, substr(EDUNAME,1,12) as EDU_trunc, substr(MSG,1,70) as MSG_trunc FROM TABLE (PD_GET_DIAG_HIST( 'MAIN', '', '', Current timestamp - 30 days, current timestamp) ) AS T where EDUNAME='db2hadrp' ORDER BY TIMESTAMP DESC

TIMESTAMP                  EDU_TRUNC    MSG_TRUNC
-------------------------- ------------ ----------------------------------------------------------------------
2013-06-06-01.11.33.026965 db2hadrp     -
2013-06-06-01.11.33.025066 db2hadrp     -
2013-06-06-01.11.32.927747 db2hadrp     near peer catchup starts at LSO 293782474
2013-06-06-01.11.32.927466 db2hadrp     remote catchup starts at LSO 293782474
2013-06-06-01.11.32.925035 db2hadrp     -
2013-06-06-01.11.29.820828 db2hadrp     Primary received a pair validation request from standby - waiting for
2013-06-06-01.11.29.723904 db2hadrp     Info: HADR Socket receive buffer size adjusted to, SO_RCVBUF: 87380
2013-06-06-01.11.29.723710 db2hadrp     Info: HADR Socket send buffer size adjusted to, SO_SNDBUF: 16384 byte
2013-06-06-01.11.29.716866 db2hadrp     -
2013-06-06-01.11.29.616835 db2hadrp     TCP socket connection accepted.  Remote Host: dummy_ip Port: 32939
2013-06-06-01.11.28.918451 db2hadrp     Info: HADR Socket receive buffer size, SO_RCVBUF: 87380 bytes
2013-06-06-01.11.28.918330 db2hadrp     Info: HADR Socket send buffer size, SO_SNDBUF: 16384 bytes
2013-06-06-01.11.28.918179 db2hadrp     Info: Primary Started.
2013-06-06-01.11.28.916627 db2hadrp     -
2013-06-06-01.11.28.913433 db2hadrp     HADR is using IPv4.
2013-06-06-01.11.28.913308 db2hadrp     HADR_REMOTE_HOST dummy_host mapped to dummy_ip
2013-06-06-01.11.28.913118 db2hadrp     HADR_LOCAL_HOST dummy_host mapped to dummy_ip
2013-06-06-01.11.28.911997 db2hadrp     -
2013-06-06-01.10.05.143127 db2hadrp     -
2013-06-06-01.10.05.140294 db2hadrp     -
2013-06-06-01.10.03.641375 db2hadrp     near peer catchup starts at LSO 7719944001
2013-06-06-01.10.03.293124 db2hadrp     remote catchup starts at LSO 7692106324
2013-06-06-01.10.03.291450 db2hadrp     -
2013-06-06-01.10.03.094248 db2hadrp     Info: HADR Socket receive buffer size adjusted to, SO_RCVBUF: 87380
...

Summary

I realize this must not be new information to everyone, but if I missed these somehow, it’s a sure bet that other DBAs out there did too. Thanks to Fitz and Ian for bringing these to my attention!

You may also like...

4 Responses

  1. Murthy says:

    The info you provide through this website is really helpfull for new DBAs. I always like to spend my time with this site. It’s really nice way to get the clear picture of diaglog error msgs.
    Thanks

  1. August 12, 2014

    […] tend to default to just taking a quick look at the db2diagnostic log for simple looks, and then accessing the db2 diagnostic log with sql for more complicated analysis. I only use the db2diag tool for archiving really – SQL access […]

Leave a Reply

Your email address will not be published. Required fields are marked *