Attack of the Blob! (Blobs in a Transaction Processing environment)
[This article is based off the “Attack of the Blob” DB2 Night Show webcast held on Friday April 26, 2013 at 11am EST.]
I would love to say that the biggest technical challenge of my career was some massive data warehouse or an OLTP database with an insane number of transactions per second. Heck, I’ve even had the distinction of working on the second largest DB2 database in the world (at that time).
No, the biggest technical challenge of my career was a three week ordeal on a small 35G database centered on a 5k row table that had data purged nightly because the data wasn’t important the next day. This was also a case where power didn’t mean speed—we were on an oversized server (4 CPU, 1.2 T Disk, 30G Memory) built for massive growth. I had enough juice to light up Atlanta, but really only needed to power a flashlight.
This table, with the help of other mistuned areas, would snarl the database so badly that the whole application would grind to a halt. This would cause 1600 locations and 10k+ teammates to light up our technical support lines like a Christmas tree.
As much as I would love to blame the SAN, or networking, or anyone other than me, the problem was in my back yard. How I discovered the issue is an article in itself. However, I should give a quick nod to Scott Hayes and DBI. My performance analysis method came from Scott’s “DB2 for LUW Performance Workshop” that I took as a pre-conference seminar at the IDUG Conference in 2012. He has a few blogs on the topic located on the DBI website under “Performance How To’s” . A good place to start are his “Elephants and Mosquitoes” entries.
It ultimately came down to a few areas:
- The database architecture was database agnostic. It wasn’t coded for DB2, or Oracle, or SQL Server. So DB2 features to improve speed were not implemented.
- When the DB was stood up years ago, we dumped everything into the same tablespaces and bufferpools. That’s right, the “everyone in the pool” mentality. We did not tune for hot tables or important areas. In other words, we shot ourselves in the foot.
- Blobs. These things are evil. I had never really tuned for this data type. I had to learn how to quickly tame this beast.
For this article, I am going to focus on handling BLOBS. There was some basic Database Tuning 101 applied—implementing DB2 Environment settings for OLTP best practice, identifying hot tables and breaking them out to their own tablespaces and bufferpools, and tweaking the LOGBUFSZ settings—but the largest lift came from taming the evil BLOB that was slowing down my database.
What is a Blob … besides big, ugly, and slow?
BLOB is short for Binary Large Object which can be many things—for example, Documents, Voice Data, Pictures, or mixed media.
What makes BLOBs unique is that BLOB data is not stored in a row. It’s stored externally to the table with a pointer giving the location of the object. See the figure below:
Blobs are not stored within a row because it can be impractical, sometimes impossible, to fit the BLOB in row without blowing out row length limits (32k). See below for an example of why BLOBS are not in row natively. [Note: Both graphics taken directly from IBM Info Center]
So, why do we care if we are not storing data in row? BLOB data types are a concern to DBA’s because they spin disk. If data is not stored within a row, it can’t be brought up into memory (bufferpool), which means I/O overhead. I/O overhead, spinning disk, physical reads—they all are slow compared to reading in memory. Is this a big deal? Not always. For a data warehouse, where a query can run minutes or hours, this isn’t as crippling as it sounds; but from a transaction processing point of view, where we measure things in milliseconds, time to spin disk can be an eternity.
Symptoms—How did I/O affect the Database?
To give you an example of how blobs affect I/O, let’s take a look at some metrics from my database in a 20 minute timeframe.
Direct reads = 233754 Direct writes = 174318 (8715 Minute) Direct write elapsed time (ms) = 24593 Update/Insert/Delete statements executed = 93176 Log pages written = 88708 Number write log IOs = 78827
Metrics are all about perspective. Is writing to disk 174k+ times in 20 minutes excessive? Without data to compare to, it’s hard to tell. But common sense tells us we can assume at the very least the application is a little chatty when it comes to direct writes.
The other piece of the puzzle is disk utilization. When the DB was humming, data disks would go to 30-40% utilized while DB2 Log disk ran at 50-60%. When the situation started to get a little dicey, was when data disks were up and holding anywhere between 60-80% and DB2 Log disks were at 90%+. Data was moving, and moving at a good clip, but the response time wasn’t fast enough for the application.
Why was the DB2 Log disk consistently higher? Blobs log heavily by nature. So not only are you off spinning disk to access the blob, but you are putting pressure on your logging disk as well.
It seems a lost cause, doesn’t it? Blobs by nature spin disk, the application accesses this table on almost every transaction, and we can’t get DB2 to lift this into memory. It seems as if you are between a rock and a hard place.
What was the vendor’s solution? Isolate the table into its own tablespace and its own LUN of disk. Better yet, if you are in real pain, make that a solid state disk. This is what other clients have had to do in the past, and to be honest, at the time, this seemed like the only solution. In the end, it turned out to be “Plan B”—Let’s try a Hail Mary Pass and see what turns up.
Domesticating your Blob
Stop the Chit Chat
If your DB went down, do you really need to recover the data from that BLOB? Well, in our case, that was a resounding no. Not only did we not need the data, the table was purged each night for performance reasons.
So let me introduce the NOT LOGGED clause:
For example: CREATE TABLE EMPLOYEE (EMPNO CHAR(6) NOT NULL PRIMARY KEY, FIRSTNME VARCHAR(12) NOT NULL, MIDINIT CHAR(1) NOT NULL WITH DEFAULT, LASTNAME VARCHAR(15) NOT NULL, WORKDEPT CHAR(3), PHONENO CHAR(4), PHOTO BLOB(10M) NOT NULL NOT LOGGED) IN RESOURCE
So what happens in a recovery scenario? Well, if you RESTORE and ROLLFORWARD, the blob data will be replaced by binary zeros.
That isn’t to say that there is no record of what is going on. DB2 uses “shadowing,” a recovery technique where current storage pages are never overwritten. Old unmodified pages are kept as “shadow copies” that are discarded when they are no longer needed for rollback. So you lose recovery, but not the ability to rollback a transaction stopped in mid-flight.
[As a side note: Rule of thumb is to use this feature on LOBS over 10MB. LOBS over 1G can’t be logged.]
This feature by itself dropped DB2LOG disk utilization by 10%. This bought us enough time to focus on implementing best practices and tackling the larger problem of how the database was using blobs.
Break Rule Number 1
Remember what I said about DB2 not being able to keep blobs in a row? It was impractical and sometimes impossible?
I lied. Well, not lied as much as didn’t tell all the truth.
Natively, lobs are not held in row. I mentioned this is because blobs are usually much bigger than 32k and would blow out maximum row size; but this isn’t always the case. If your blob can fit into the row, you can explicitly tell DB2 to treat it that way using DB2’s INLINE feature.
What is the benefit? Once the blob is held in row, it CAN be pulled into bufferpool and be accessed in memory. You remove the need for physical reads. Don’t forget, the row (including blob) is now eligible for row compression!
To do this you need three commands.
Use the ADMIN_EST_INLINE_LENGTH command to see if your blob can fit into a normal size row.
- (number): Size needed in row to be brought inline
- NULL: Inputs are NULL
- -1: Cannot be inlined due to length
- -2: Inserted before 9.7, can’t be evaluated
Alter the table using the INLINE command.
Command: ALTER TABLE (tablename) ALTER COLUMN (colname) SET INLINE LENGTH (Where value is max length from ADMIN_EST_INLINE_LENGTH evaluation)
Keep in mind, you will need to respect the max row length based off your tablespace settings. You will also need to add 4 for INLINE overhead.[Note: Graphic taken from IBM Info Center]
REORG the table using LONGLOB feature. For the INLINE feature to take effect, you must UPDATE the row or run a REORG using the LONGLOB parameter. You won’t need to do this every time as it is only used for the conversion process. There usually isn’t a real benefit unless there are XML columns in your table; in that case, some space may be reclaimed. In a normal REORG this would be time consuming and doesn’t improve clustering.
Finally, as you step back and admire your work, know that not all blobs will inline. In our specific case, close to 20-30% of our blobs could not be brought into rows. This is OK: operating 70%-80% of the time in memory is still a huge improvement. It is acceptable to spin disk 20-30% of the time, in our scenario.
[In our specific scenario, we had this table in its own tablespace and bufferpool after the redesign. Not only did we turn off FILESYSTEM CACHING (see below), the bufferpool was large enough to pull the bulk of the table in memory].
Put on Afterburners
Lifting your blobs into memory should give you an enormous lift. However, there is one more tuning tidbit you can implement: Know when and when not to use FILESYSTEM CACHING.
Using FILESYSTEM CACHING or NO FILESYSTEM CACHING commands allow you to enable/disable non-buffered I/O on specific tablespaces while avoiding dependency on physical layout. In an AIX environment, you can cause double buffering at a DB2 level when the O/S is already buffering for you.
When dealing with BLOBs natively, you WANT filesystem caching enabled. DB2 needs this buffering to improve performance.
When you do not have BLOBs, you DO NOT want filesystem caching enabled.
In our specific scenario, we had blob and non-blob tables mixed in the same tablespaces and bufferpools. This was a huge performance drag because only 15% of our 400+ tables needed filesystem caching. The other 85% were doing double work.
If you are thinking ahead and are able to separate your blobs into their own tablespaces and bufferpools you can turn on FILESYSTEM CACHING as needed for blob tables.
Usain Bolt got nothing on us.
After three weeks of research, analysis, testing, and redesign we were ready to go live and implement our new solution. Would the new approach address our needs or was I really going to need to speak to management and advocate for solid state drives?
Here is a peek of what we saw, comparing apples to apples:
Before After Direct reads = 233754 73471 68% Improvement Direct writes = 174318 14008 91% Improvement Direct write requests = 16213 262 98% Improvement Direct write elapsed time (ms) = 24593 1622 93% Improvement Log pages written = 88708 53642 Overall work drops Log write time (sec.ns) = 89 76 Overall work drops Number write log IOs = 78827 48354 Overall work drops
And what was the feedback from load testing?
- Total Transactions Processed increased by 9%
- 20% of Total Transactions improved by 15-16%
- Less than 1% (Quote “Very Very Few”) of transactions slowed down slightly
More detail from production:
- SQL exceeding 2 seconds on basic FETCH drops from thousands to less than 50
- 97.93% drop – Execution time for UPDATE statements per 100 executions against problem table (0.436 seconds to 0.009 seconds)
- 81.7% drop – Execution time for SELECT statements per 100 executions against problem table (0.656 seconds to 0.120 seconds)
- ZERO Lock Escalations per 1000 transactions
- Write Rate (how often we have to spin disk) – Drops 60 IOPS (per volume) to 18 (70% Improvement)
- Write Transfer Size – Increases from 8k to 18k per IO Operation (2.25x Improvement)
- Read Rate (how often we have to spin disk) – Drops from 15 IOPS (per volume) to 12.
- Read Transfer Size – Increases from 4.29 to 4.94 per IO Operation
- 10-20% drop in disk utilization
Every superhero needs a sidekick
I wasn’t kidding when I said that this was the biggest challenge of my career. Nor was I exaggerating in my DB2 Night Show webcast when I said I pulled in a small army to help. I may speak as an expert in this blog, but I am not an expert. An expert works almost independently—I solved this with a small army: Prashant Sogarwal, who hands down has to be one of the top DBA’s on the east coast, helped design the solution and acted as a mentor. Stephanie Knight-Baker spent hours in data analysis and solution coding. Scott Hayes and his performance and tuning techniques helped narrow down where the problem was; Scott’s expertise in tuning was also invaluable. Ember was always available to bounce ideas off of and encouraged me more than once.
This leads me to a point I want to make as a parting thought. Where would I have been without my friends, teammates, and contacts? All of these contacts were made through my volunteerism through the DB2 User Group—IDUG (International DB2 User Group). My analysis techniques came from conference sessions and my network of help came from other members.
You can’t save the day on your own. You can’t stand up in front of an executive, with an “S” on your chest and the wind at your back, without a team to support you. So I encourage you to not only apply the techniques and analysis in this article, but to go out and actively seek out that small team you go to in an emergency. Because in the end it will be your peers, friends, and mentors that pull you out of the fire. Or in this case, beat back the giant blob trying to eat your database one I/O at a time.
Further Reading from IBM Info Center:
- Large objects (LOBs)
- Large object (LOB) column considerations
- Inline LOBs improve performance
- ADMIN_EST_INLINE_LENGTH function – Estimate length required to inline data
- Storing LOBs inline in table rows – IBM DB2 9.7 for Linux, UNIX, and Windows
- Table spaces without file system caching
Michael Krafick is an occasional contributor to db2commerce.com. He has been a production support DBA for over 12 years in data warehousing and highly transactional OLTP environments. He was acknowledged as a top ten session speaker for “10 Minute Triage” at the 2012 IDUG Technical Conference. Michael also has extensive experience in setting up monitoring configurations for DB2 Databases as well as preparing for high availability failover, backup, and recovery. He can be reached at “Michael.Krafick (at) icloud (dot) com”. Linked-in Profile: http://www.linkedin.com/in/michaelkrafick. Twitter: mkrafick
Mike’s blog posts include:
10 Minute Triage: Assessing Problems Quickly (Part I)
10 Minute Triage: Assessing Problems Quickly (Part II)
Now, now you two play nice … DB2 and HACMP failover
Technical Conference – It’s a skill builder, not a trip to Vegas.
Why won’t you just die?! (Cleaning DB2 Process in Memory)