Preventing Log Saturation

Ian Bjorhovde
A few days ago, I was involved in a conversation on Twitter with Ember and Erwin Hattingh, a DB2 DBA from the Netherlands whom I met at IDUG in Dublin last November.

Ember expressed her surprise at how well DB2 handled a nasty situation, while Erwin complained that DB2 does not deal with log file saturation particularly well.

If you’re not familiar with this situation, log file saturation occurs when an active transaction (one that has not issued a commit or rollback) keeps an old transaction log active. DB2 cannot archive active transaction logs (because they are required for crash recovery), so a single old, uncommitted transaction can lead to a transaction log full error (SQL0964C) as other transactions consume transaction log space.

[Illustration: log file saturation]

In this illustration, you can see that there is a very small amount of log space that is used by active transactions (shown in orange); however, because the current LSN has nearly reached the end of the last transaction log file, log saturation is about to occur.
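If you want to see how close a database is to this point before SQL0964C actually fires, one option is to compare the log space used to the log space available and to see which application handle is holding the oldest transaction. A minimal sketch, assuming DB2 10.1 or later so that the MON_GET_TRANSACTION_LOG table function is available (output omitted here):

db2inst1@dpx-ubuntu14:~$ db2 "SELECT TOTAL_LOG_USED, TOTAL_LOG_AVAILABLE, APPLID_HOLDING_OLDEST_XACT FROM TABLE(MON_GET_TRANSACTION_LOG(-1))"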

It is fairly easy to demonstrate log saturation with only two applications connected to a database. The first application will execute a single small statement, but will not commit, thereby keeping the transaction active:

idbjorh@dpx-ubuntu14:~$ db2 +c "delete from reports"
DB20000I  The SQL command completed successfully.

Next, we will start another “application” that will perform many individual transactions. For this example, I have started an infinite loop that inserts rows into a table (the goal of this is simply to fill the transaction logs):

db2inst1@dpx-ubuntu14:~$ while [ 1 ] ; do  
> db2 "insert into ch2 select * from ch"
> done
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
...
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.

DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.  During SQL processing it returned:
SQL0964C  The transaction log for the database is full.  SQLSTATE=57011

As you can see, even though auto-commit was enabled (each insert statement was a separate, small transaction), the insert statement eventually failed when DB2 returned the SQL0964C error. The error itself gives no indication that it was actually caused by log file saturation, so you must look at the db2diag.log for evidence. Here is an example of the two records you’ll see:

2016-01-20-21.09.15.944061-420 E1078331E629          LEVEL: Error
PID     : 12600                TID : 140106224625408 PROC : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : SAMPLE
APPHDL  : 0-149                APPID: *LOCAL.db2inst1.160121040537
AUTHID  : DB2INST1             HOSTNAME: dpx-ubuntu14
EDUID   : 19                   EDUNAME: db2agent (SAMPLE)
FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:2860
MESSAGE : ADM1823E  The active log is full and is held by application handle 
          "0-156".  Terminate this application by COMMIT, ROLLBACK or FORCE 
          APPLICATION.

2016-01-20-21.09.15.944336-420 E1078961E593          LEVEL: Error
PID     : 12600                TID : 140106224625408 PROC : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : SAMPLE
APPHDL  : 0-149                APPID: *LOCAL.db2inst1.160121040537
AUTHID  : DB2INST1             HOSTNAME: dpx-ubuntu14
EDUID   : 19                   EDUNAME: db2agent (SAMPLE)
FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:6666
MESSAGE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE
          "Log File has reached its saturation point"
          DIA8309C Log file was full.

The first message identifies the application that has the long-running transaction, and the second message highlights the fact that log saturation has occurred.
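Rather than paging through the whole db2diag.log by hand, you can also let the db2diag tool pull these records out for you. A minimal sketch (the one-hour window is just an example; adjust it to when the error occurred):

idbjorh@dpx-ubuntu14:~$ db2diag -H 1h -g message:=ADM1823E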

Tools, Please

DB2 provides tools to keep this situation from resulting in a call in the middle of the night. Ember mentioned that she uses the DB2 governor, setting up a rule that forces off any connections that have been idle for 4 hours. Although this works, you need to be careful when setting the time threshold to ensure that you account for transaction volume (how quickly DB2 is moving through the transaction logs). In my example above, the transaction rate was high enough that I was able to cause log file saturation in about 15 seconds!

It’s also fairly easy to imagine a script that would monitor transaction log utilization and look for old, idle transactions and kill them. This could probably resolve most cases, but I am a lazy DBA and do not want to write and test another script.
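For the record, such a script would not need to be very long. Here is a rough, untested sketch of the idea; it assumes DB2 10.1 or later for MON_GET_TRANSACTION_LOG, a finite LOGSECOND setting, and a threshold and script name that are made up purely for illustration:

#!/bin/bash
# Rough, untested sketch: force the application holding the oldest
# transaction once active log utilization crosses a threshold.
DBNAME=sample
THRESHOLD=80          # percent of active log space used (illustrative)

db2 connect to $DBNAME > /dev/null

# TOTAL_LOG_AVAILABLE is -1 when LOGSECOND=-1 (infinite logging);
# this sketch assumes finite logging.
ROW=$(db2 -x "SELECT TOTAL_LOG_USED, TOTAL_LOG_AVAILABLE, APPLID_HOLDING_OLDEST_XACT FROM TABLE(MON_GET_TRANSACTION_LOG(-1))")
read USED AVAIL HANDLE <<< "$ROW"

PCT=$(( 100 * USED / ( USED + AVAIL ) ))

if [ "$PCT" -ge "$THRESHOLD" ]; then
    echo "Active log space is ${PCT}% used; forcing application handle ${HANDLE}"
    db2 "force application ( $HANDLE )"
fi

db2 connect reset > /dev/null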

Fortunately, there is another option: IBM added a database configuration parameter called NUM_LOG_SPAN in DB2 9.7. This parameter sets a limit on how many transaction log files any single transaction can span; it does this by comparing the first LSN for a transaction with the current LSN of the database. If the difference between the two exceeds the number of log files specified in NUM_LOG_SPAN, the offending connection is automatically forced off the database. By setting this value to a number smaller than the sum of LOGPRIMARY and LOGSECOND, you can be sure that DB2 will not allow log file saturation to occur.

Let’s set NUM_LOG_SPAN (it can be updated online) and then re-run the same test from earlier.

db2inst1@dpx-ubuntu14:~$ db2 connect to sample

   Database Connection Information

 Database server        = DB2/LINUXX8664 10.5.5
 SQL authorization ID   = DB2INST1
 Local database alias   = SAMPLE

db2inst1@dpx-ubuntu14:~$ db2 get db cfg | egrep -i 'logprimary|logsecond'
 Number of primary log files                (LOGPRIMARY) = 3
 Number of secondary log files               (LOGSECOND) = 8

For this sample database, there is a maximum of 11 log files (3 primary + 8 secondary), so let’s set NUM_LOG_SPAN to 10:

db2inst1@dpx-ubuntu14:~$ db2 "update db cfg using num_log_span 10"
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.
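Since NUM_LOG_SPAN is configurable online, the new value should take effect without deactivating the database. If you want to confirm this, you can compare the delayed and current values while still connected (output not shown here):

db2inst1@dpx-ubuntu14:~$ db2 get db cfg for sample show detail | grep -i num_log_span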

Now, let’s retry the same two applications and see what happens. First, the long-running transaction:

idbjorh@dpx-ubuntu14:~$ db2 +c "delete from reports"
DB20000I  The SQL command completed successfully.

Next, I’ll start the other “application” that inserts records endlessly:

db2inst1@dpx-ubuntu14:~$ while [ 1 ] ; do  
> db2 "insert into ch2 select * from ch"
> done
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.

... wait for some time ...

DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.

Perfect! The application never fails. (Well, at least not until the tablespace runs out of space!)
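While the insert loop is running, you can also keep an eye on how many log files the oldest unit of work currently spans and compare that against NUM_LOG_SPAN. A minimal sketch; it assumes the FIRST_ACTIVE_LOG and CURRENT_ACTIVE_LOG columns of MON_GET_TRANSACTION_LOG, which report log file numbers on DB2 10.1 and later:

db2inst1@dpx-ubuntu14:~$ db2 "SELECT FIRST_ACTIVE_LOG, CURRENT_ACTIVE_LOG, CURRENT_ACTIVE_LOG - FIRST_ACTIVE_LOG + 1 AS LOGS_SPANNED FROM TABLE(MON_GET_TRANSACTION_LOG(-1))"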

However, if we go back to the first application and try to perform an action:

idbjorh@dpx-ubuntu14:~$ db2 commit
SQL1224N  The database manager is not able to accept new requests, has 
terminated all requests in progress, or has terminated the specified request 
because of an error or a forced interrupt.  SQLSTATE=55032

The COMMIT statement fails, because DB2 has already forced this application off of the database.

If we look at the db2diag.log, we can see that DB2 has logged this event:

idbjorh@dpx-ubuntu14:~$ db2diag -H 5m -g function:=sqlpScanTran
2016-01-20-18.54.30.621096-420 E506785E733           LEVEL: Error
PID     : 5470                 TID : 139829459281664 PROC : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : SAMPLE
HOSTNAME: dpx-ubuntu14
EDUID   : 22                   EDUNAME: db2loggr (SAMPLE)
FUNCTION: DB2 UDB, data protection services, sqlpScanTranTableForLowTran, probe:550
MESSAGE : ADM1541W  Application "db2bp" with application handle "0-24" and
          application id "*LOCAL.db2inst1.160121013207" executing under
          authentication id "IDBJORH " has been forced off of the database for
          violating database configuration parameter NUM_LOG_SPAN (current
          value "10"). The unit of work will be rolled back.

Conclusion

Log file saturation is often caused by poorly behaved applications that allow a transaction to sit open for long periods of time. It’s easy to blame a user who starts working on something, gets distracted, and walks away from their computer, but I tend to believe that application developers should work to prevent this.

If this is not feasible, NUM_LOG_SPAN provides a powerful way to prevent log file saturation. However, please take care before enabling it. The application receives no indication of why it was forced off of the database, so you will want to work with your application’s owners to make sure that this is the desired way to handle these kinds of problems.


 
Ian Bjorhovde is the principal consultant for DataProxy LLC in Phoenix, Arizona. He has worked in the IT industry for over 18 years and has spent the last 15 years as a consultant developing and supporting large DB2 LUW databases with clients in many different industries. He is an IBM Gold Consultant and IBM Champion, produces a DB2-focused podcast called The Whole Package Cache and has presented at many RUG, IDUG and IBM conferences.

 



6 Responses

  1. Raul Baron says:

    Wow !! Excellent article !! I will put this in my to-do list immediately !!! Thanks for sharing !!

  2. Ananth says:

    Ian, excellent blog. I love that parameter; it would be of great help. But I have a doubt about using db2gov, as it sometimes considers a backup to be an idle application and forces it as well. So I would use this parameter to keep things simple. Thanks again

  3. Wen says:

    Great idea! We had similar situations of “log full” caused by uncommitted transactions. This is an excellent way to handle the situation. Thanks!

  4. Luke Numrych says:

    This is a great tip!

  5. Robelis says:

    Great Tip…

    But, a little doubt.
    Suppose there are two apps holding the logs; following your example, will DB2 understand that each transaction can hold 10 logs, or will this number be shared between both?
    Thanks…

    • @Robelis, it does not matter how many applications are connected and active; it’s always the application with the oldest transaction (i.e. the application with the current LowTranLSN) that will be killed if the difference between the Current LSN and the LowTranLSN exceeds NUM_LOG_SPAN.

      Only 1 application can have LowTranLSN at a time, so the application with the oldest transaction will be forced when NUM_LOG_SPAN is exceeded for its transaction. At this point, a different transaction will have LowTranLSN. If the Current LSN moves forward enough to cause this transaction to exceed NUM_LOG_SPAN, then it will be forced, too.

      Make sense?
