Speeding up DB2 Backups
It’s a question I hear frequently – How can I speed up backups? I thought I’d share some details on doing so.
A backup can never be faster than the time it takes to back up the largest tablespace. Parallelism is essentially done at the tablespace level – meaning one thread per tablespace. That means that if the majority of the data in a database is in a single tablespace, your backups will be slower than if the data were spread evenly across a number of smaller tablespaces.
Obviously there are a lot of factors that go into tablespace design. But one of the big ones I now take into consideration is backup speed.
WebSphere Commerce is a vended database that by default places all data into one or two tablespaces. This makes things “easy”, but it also means that many non-altered WCS databases could have backups that run much faster. Some vendors will not let you change the tablespace layout, but WCS does, so if you’re looking at a significantly sized WCS implementation, you will likely want to customize your tablespace layout – for this and other reasons. For smaller WCS implementations it won’t matter as much.
Other vendors have different rules on whether you can customize your tablespace layout at all. If it’s a custom database, you should certainly not just plop everything in USERSPACE1, especially for databases of 50 GB or greater.
If you have a database that is currently concentrated in a single tablespace, you can move tables online from one tablespace to another using ADMIN_MOVE_TABLE. Due to RI (referential integrity) restrictions, you’ll want to be on 10.1, Fix Pack 2 or higher to make this reasonable. Particularly high-activity tables may be more problematic to get the final lock on for the move.
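As a sketch, a single-call online move might look like this – the database, schema, table, and tablespace names here are all illustrative, and you’ll want to read up on the procedure’s options before using it on a busy table:

```shell
# Illustrative: move one table online out of a crowded tablespace.
# All names below are assumptions for this example.
db2 connect to PRODDB

# Single-call form: performs the INIT, COPY, REPLAY, and SWAP phases
# of the move in one step.
db2 "CALL SYSPROC.ADMIN_MOVE_TABLE(
    'DB2INST1', 'ORDERS',
    'TS_ORDERS_DAT', 'TS_ORDERS_IDX', 'TS_ORDERS_LOB',
    '', '', '', '', '', 'MOVE')"

db2 connect reset
```

The multi-call form (separate INIT, COPY, REPLAY, and SWAP operations) gives you more control over when the final lock is taken, which can help with those high-activity tables.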
When a backup starts, DB2 allocates half of the remaining utility heap for the backup operation (this is where the buffers are allocated). This means that you not only must have at least twice the amount of memory that a backup needs allocated to the utility heap, you must also be aware of the usage of the utility heap by other utilities – LOAD, REDISTRIBUTE, etc. See my entry on the utility heap for more information.
But how do you determine the memory backup needs? See the section on tuning the number and size of backup buffers below for details both on seeing what values DB2 chooses and some thoughts on calculating these for yourself.
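As a rough sketch of that arithmetic – the buffer count and size here are illustrative, and both UTIL_HEAP_SZ and backup buffer sizes are measured in 4 KB pages:

```shell
# Rough sizing check: since a backup grabs half of the remaining
# utility heap, UTIL_HEAP_SZ should be at least twice what the backup
# buffers need. Values are illustrative; all units are 4 KB pages.
num_buffers=10
buffer_pages=4097

backup_pages=$((num_buffers * buffer_pages))
min_util_heap=$((backup_pages * 2))

echo "Backup buffers need ${backup_pages} pages; UTIL_HEAP_SZ should be >= ${min_util_heap}"
```

Remember that this is a floor, not a target – concurrent LOADs and other utilities draw from the same heap.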
Tuning Backup Parameters
The traditional wisdom says to let DB2 choose the values for parallelism and the number and size of backup buffers. While that’s good advice 95% of the time, it’s still good to have some ideas and tools to identify and deal with the edge cases where you may want to tune them manually.
Seeing What DB2 Chooses
You can parse the DB2 diagnostic log to see what DB2 has chosen for past backups. The value of this data depends on how much data you have in your diag log and how many backups have been taken with your current memory/disk/cpu configuration. A command like this will show you what DB2 has chosen in the past:
10.1 and earlier:
grep "Autonomic BAR - using parallelism .*" ~/sqllib/db2dump/db2diag.log
10.5 and later:
grep "Autonomic backup/restore - using parallelism =" ./db2diag.log
The above assumes a default location for the DB2 diagnostic log on a Linux/UNIX system. Here’s an example of the output you might see:
$ grep "Autonomic backup/restore - using parallelism =" ./db2diag.log
Autonomic backup/restore - using parallelism = 2.
Autonomic backup/restore - using parallelism = 2.
Autonomic backup/restore - using parallelism = 5.
Autonomic backup/restore - using parallelism = 5.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.
In this case, DB2 chooses different levels of parallelism at different times, but it settles at 5 or 10 most of the time, which is reasonable given the size and layout of this database.
Thoughts on Manual Tuning
If you’re going to try manual tuning here, the two main things to consider are the number of tablespaces of a reasonable size (since each tablespace can only be addressed by a single thread), and the number of CPUs on the server. As an example, I was recently dealing with a database that had about 50 tablespaces, with about 40 or so of them having significant amounts of data. The server the backup was being taken on had 64 CPUs, and this database was the only real load on the server. For purposes of this backup, I didn’t really care about leaving much overhead for other things (backup was offline). For that environment, I would choose a parallelism of 40. If it were an online backup, I would likely have chosen a lower number based on the other load I saw on the server.
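For that environment, the backup command would look something like this – the database name and backup path are illustrative:

```shell
# Illustrative: offline backup with manually chosen parallelism, for a
# server with 64 CPUs and ~40 tablespaces holding significant data.
# Database name and target path are assumptions for this example.
db2 backup db PRODDB to /backup parallelism 40
```

For an online backup, you’d add the ONLINE keyword and, as noted above, likely pick a lower parallelism to leave headroom for the other load on the server.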
Number and Size of Buffers
Again, DB2 often makes the best choice for you, and it’s rare you’ll have to try to do anything manually. You’re more likely to slow the backup down than speed it up by giving it manual values instead of letting it choose the optimum.
Seeing What DB2 Chooses
Again, you can parse the DB2 diagnostic log to see what DB2 is choosing. This grep command works well:
grep "Using buffer size = .*, number = .*" ./db2diag.log
It produces output like this:
$ grep "Using buffer size = .*, number = .*" ./db2diag.log
Using buffer size = 4097, number = 2.
Using buffer size = 4097, number = 2.
Using buffer size = 3297, number = 10.
Using buffer size = 3297, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.
Thoughts on Manual Tuning
If you are manually selecting values, the buffer size should be a multiple of the extent size of the largest tablespace. I like 4096 – nice round number if I don’t have anything else to start with. You generally want two buffers per thread (from the parallelism, above), and maybe an extra two for good measure. So using the system details from above – that database had a really small extent size of 2 for nearly every tablespace. I chose 82 buffers of size 4096.
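That rule of thumb can be sketched as follows – the database name and path are illustrative, and the command is built as a string so you can review it before running it:

```shell
# Two buffers per thread plus two spare, per the rule of thumb above.
parallelism=40
num_buffers=$((2 * parallelism + 2))

# Database name and backup path are assumptions for this example;
# review the command before running it.
cmd="db2 backup db PRODDB to /backup with ${num_buffers} buffers buffer 4096 parallelism ${parallelism}"
echo "$cmd"
```

Note that 4096 is a multiple of that extent size of 2, satisfying the buffer-size guideline above.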
One way to speed up a backup is to throw hardware at it, of course. If backup speed is important, SSD or an SSD cache may be useful. Though if you’re going for overall performance impact, limited SSD resources may better be spent on active transaction log space. Still, when you can get it, pure SSD sure is fun. I have a 220 GB Windows database that is on pure SSD, with separate SSD drive arrays for data and for backup. It has ample memory and CPU too, and I can back that sucker up in 20 minutes.
For recoverability reasons, your backups should be on separate disk from your data and transaction logs, and it’s even better if they’re externalized immediately using a tool like TSM or some other storage library with a high-speed cache. You have to be careful to ensure that your network to such a location is super fast so it doesn’t become a bottleneck. I’ve seen ridiculously slow backups caused by the fact that the TSM server was in a data center two (western) states away. If you’re backing up to a vendor device like that, talk to the admin for it to find out how many sessions you can create against the device at a time – you can specify that in the db2 backup command. The more, the faster, but some implementations may not be able to support many.
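Specifying multiple sessions in the backup command might look like this – the database name and session count are illustrative, and the right session count is whatever your TSM admin tells you the device can support:

```shell
# Illustrative: back up straight to TSM using multiple sessions.
# Database name and session count are assumptions for this example;
# confirm the supportable session count with your TSM admin first.
db2 backup db PRODDB use tsm open 4 sessions
```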
Many larger implementations have the luxury of this sort of thing, but my small and mid-range clients simply back up to disk and then have a process that comes along and externalizes anything on that disk to slower off-site storage.
Backup Memory Usage
For larger databases, and especially those with memory pressure, you can test the use of the DB2 registry parameter DB2_BACKUP_USE_DIO. I’ve heard of some good experiences with it, but the little testing I’ve done with it on a smaller database hasn’t shown much difference. Test it thoroughly before relying on it.
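Turning it on is straightforward – note that registry variable changes only take effect after an instance restart:

```shell
# Enable direct I/O for backup writes, then recycle the instance so
# the registry change takes effect.
db2set DB2_BACKUP_USE_DIO=ON
db2stop
db2start
```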
For the write portion of the backup, it disables OS-level filesystem caching. On one hand this makes sense – DB2 is never going to read the data written to the backup image, so why cache it? On the other hand, writes to cache may be much faster than writes directly to disk. If your SAN has a cache, your disk write speed might support use of this.
Another thought on memory usage and backups – a backup reads every page in your database into the bufferpool before writing it out to the backup image. If you have large, primed bufferpools, DB2 has less to read in from disk. If you have an over-abundance of memory, give DB2 larger bufferpools so there is less to read in during the backup.
The one OS-level parameter I’ve heard of making a difference is maxreqs on AIX. Check out this article on maxreqs and backups on AIX. Essentially, you want to make sure that maxreqs is at least 256 times the number of threads (as determined by the value of parallelism for the backup).
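A quick sketch of that check – the parallelism value is illustrative, and the ioo command shown in the comment assumes AIX 6.1 or later, where the AIO tunables live under ioo:

```shell
# Illustrative: compute the minimum maxreqs for a given backup
# parallelism (256 requests per thread).
parallelism=40
min_maxreqs=$((256 * parallelism))
echo "maxreqs should be >= ${min_maxreqs}"

# To see the current value on AIX 6.1+ (assumption: run as root on AIX):
#   ioo -a | grep aio_maxreqs
```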
Well, there’s my brain-dump on backup tuning. It mostly boils down to:
- Split data into multiple tablespaces
- Make sure util_heap_sz is large enough
- Use fast backup locations
- Trust the values DB2 picks for backup parameters