Backup Compression Consideration – DB2 COMPRESS vs. GZIP

Krafick_HeadshotWe interrupt your regular programming for a quick announcement.

If you have been reading DB2commerce.com for a while you know I am an occasional guest writer on this technical blog. I’ve worked this year on expanding my comfort zone with regular articles here, a developerWorks article to be posted next week, and I have entered as a competitor on the DB2 Night Show’s “DB2s Got Talent”.

Last Friday, I presented a four minute presentation n DB2 v10.1 Column Masking and produced a follow-up article. I am now asking for your votes. You can vote on me based off your knowledge of my work here, you can vote based off my presentation at 38:11 in the replay, or watch the whole episode. Regardless, I am asking you vote for me.  To do so:

  1. Go to the Db2 Night Show Replay Page
  2. Select Episode #128
  3. Watch the replay, or scroll to the bottom and fill out the survey and vote for me. (You could win an Amazon Gift Card)
    You will need a second vote – I suggest Ken, Rob, Prasad, or Ian.

Thank you for your time. We now return you to your regularly scheduled article.


 

I’ve been using the “BACKUP … COMPRESS” statement as long as I can remember.

However, I ran into an unusual situation at my shop. In our environment there is a fine line between backup file size and backup speed.  Matter of fact, at this location, speed is more important. The backup could be 3x larger if the total backup time is ½ as long.

My company had adopted taking a BACKUP without the compress option and gzip’ing the file after the fact. To me this seemed to be a lot of extra work so I set up a simple test to see which is more efficient – Gzip or DB2 Compress?

Results of testing a 538G database backup to disk:

Backup with Compression

Backup w/o Compression

Time of Backup

Time to Gzip

Total Time

Compr. Size

Reduction

DB2 Compress

84.8 G

N/A

2H 39M

0

2H 39M

84.8 G

84%

Gzip

N/A

538 G

42M

3HR 54M

4HR 36M

48 G

90%

 

Some items to note:

  • Machine:  P7, 4 Core, 112 Gig Memory on AIX 6.1 FP7; DB2 v9.7 FP7
    Your results may vary based on machine configuration.
  • If a compressed backup is needed, and time is the primary factor – DB2 COMPRESS
    DB2 Compression is almost 2x as fast as backing up and then compressing.
  • If a compressed backup is needed, and file size is the primary factor – GZIP
    GZIP has a narrow lead on DB2 compression at 6% improvement, but the file size is almost half that of DB2 compression.
  • Although not specifically tested in this scenario, your restore time could be affected as well. As an example, you would need to incorporate time to Gunzip a file for restore.

What is the conclusion? Both are efficient, but a decision depends on what your needs are – speed or backup size. Need a quick offline backup before a production change at 3am? I would use DB2 compress or don’t compress at all. Need to cram 3 backups onto one file system? You may consider using gzip to get as much space as possible.

I should mention that a spirited discussion came up at IDUG last year. Should you compress a backup on a compressed database that utilizes value compression and/or deep compression? Some argued it’s a waste of time and doesn’t gain anything while some argued it does. IBM specifically addresses this in v9.7 Info center …

“Whereas row compression works on a table-by-table basis, when you use compression for your backups, all of the data in the backup image is compressed, including catalog tables, index objects, LOB objects, auxiliary database files and database meta-data.”

My response would be, “For the most part – yes, there is benefit to a compressed backup of a database that has compression enabled”.  However, if you have a database using value compression, deep (row) compression, who has compressed indexes – the benefit may not be worth the effort.


Michael Krafick is an occasional contributor to db2commerce.com. He has been a production support DBA for over 12 years in data warehousing and highly transactional OLTP environments. He was acknowledged as a top ten session speaker for “10 Minute Triage” at the 2012 IDUG Technical Conference. Michael also has extensive experience in setting up monitoring configurations for DB2 Databases as well as preparing for high availability failover, backup, and recovery. He can be reached at “Michael.Krafick (at) icloud (dot) com”. Linked-in Profile: http://www.linkedin.com/in/michaelkrafick. Twitter: mkrafick

Mike’s blog posts include:
10 Minute Triage: Assessing Problems Quickly (Part I)
10 Minute Triage: Assessing Problems Quickly (Part II)
Now, now you two play nice … DB2 and HACMP failover
Technical Conference – It’s a skill builder, not a trip to Vegas.
Why won’t you just die?! (Cleaning DB2 Process in Memory)
Attack of the Blob: Blobs in a Transaction Processing Environment
Automatic Storage Tablespaces (AST): Compare and Contrast to DMS
DB2 v10.1 Column Masking

You may also like...

5 Responses

  1. Chris Aldrich says:

    Interesting. I wonder if they saw the GZIP as “faster” since the regular backup completed in 42M as opposed to 2H 39M for a DB2 compressed backup, even if the overall process was slower.

    And to add to the compressed backup over compressed tables discussion….I think the best answer is to try it yourself. The “WAREHOUS” database that IBM’s Tivoli Monitoring uses can get rather large. So can several of its tables. We have several tables compressed within the database, and I still compress the backup. In this particular case I tested it both ways and the compressed backup still won out on disk space.

    But that may not be true in all cases. As has often been my experience with compression in other areas, compressing compressed things tends to explode the size. But it may depend on what is compressed and with what algorithm, etc.
    So again, experimentation is the key to discovering what is right for you.

  2. Interesting. You must take into account where the ‘pain’ of time&cpu is suffered. Does the gzip process take place on the database server itself? Consuming CPU-cycles which should be used to handle all those nice SQL’s?

    Or, is the backup image stored on a file server and that file server is handling the gzip-workload? In that case DB2 has finished the backup and does not care about back-end processes taking place somewhere else.
    In case of a restore: just copy the zipped-image to the database server and unzip it there. All that CPU power is idle because do not have a database running at that time 🙂

  3. Hi Michael,
    Great post. I like the detail. I was looking into this stuff sometime back, but didn’t have enough resources (disk space and big enough DB). I went through several forums and realised that backup with compression is better if you plan to use it for restore in the future. The logic is, as in your case, backup with DB2 compression is 84.8 GB and without it is 538GB, when we try to restore these backup, in the first case DB2 will have to read only 84.8GB of data from the disk as compared to 538GB in the second case, thereby saving on some costly I/O cycles. Wondering if it intrigues you to go and test it yourself…:). I request you to please check it if you have the capacity. Would love to look at the results.

    • Yogesh says:

      Hi Saurabh

      Did You ever got Your answer ?

      I am also have been doing some reserch and wondering how to reduce the time to to offline backup of 600 GB of DB2 and again restore it back faster as and when needed to

      Yogesh

  4. Andrew McLauchlan says:

    We’ve started using pigz as opposed to gzip. It does consume massive CPU but the elapsed time is a fraction. Of course the unzip is still the same time.

Leave a Reply

Your email address will not be published. Required fields are marked *