How to determine your backup/recovery strategy
Determining your backup strategy for a Commerce DB2 database is much like determining your backup strategy for any other db2 database. There really aren’t any special considerations for Commerce that you wouldn’t have for another database.
By Default, Commerce databases use Circular logging. Circular logging is really not appropriate for OLTP databases. Circular logging means that you cannot take online backups (all database backups must be offline), and it means that you can only restore to the point in time when the offline backup was taken – there’s no possibility of rolling forward through transaction logs after a restore in order to get to the point in time right before the failure. Both of these are very contrary to the requirements of all OLTP databases, including Commerce databases.
Therefore, one of my build steps includes switching to archive logging. Archive logging allows online backups and supports roll-forward to any point in time for which you have transaction logs. The one drawback of archive logging is the need to manage retention of the transaction log files. I have scripts to handle that, so it’s not a big deal, but you’ll want to make sure you have this covered.
No matter what your logging method, make sure that anyone with access to the transaction log files knows where the Active log files are and never ever to delete or compress them. You will very quickly crash a DB2 database by removing an active transaction log file, and the only real way to recover is to restore. For this reason, even if you’re archiving to disk, make sure that archive logs are stored in a different location than active logs.
When determining your backup strategy, you have to start with your Recovery requirements. The biggest question is how long do you have once a problem occurs to recover the database? Often for a production system, you will have HADR or another High-Availability solution to manage most failures. But the question is still, what do you do if that High-Availability solution fails or if a failure occurs which your High-Availability solution does not protect against. (I’ve seen it at least once on production in the last year – worst case scenarios do happen).
I’m not going to go through every detail of developing a backup strategy here – though let me know if anyone has interest in that and is unable to find resources elsewhere, let me know and I might do a more complete post on it. I am going to hit some of the highlights.
DB2 does not support file-level backups of the database taken while the database is up. An you will pretty much never take a file-level backup of a DB2 database (I put this in for the non-DB2 DBAs), unless you’re doing some kind of flash copy with a SAN(I believe there’s a write suspend command for that). All of your backups will be taken using the db2 “backup” command. For DB2 LUW, you also do not use export/import to copy data as a backup/recovery strategy.
Most frequently for OLTP databases, you cannot take an outage, even on a weekend in the middle of the night, to backup your databases, so you’re most likely looking at online backups only.
I almost always do full database backups at least weekly. Sometimes even daily depending on the client/site. Some clients also use incremental backups – depending on how bad space constraints are. Generally speaking, the more frequently you do a backup, the shorter your restore time. Rolling forward through transaction logs is slow – the more activity on the database, the slower it is. Testing your restore strategy should give you a good idea of how long rollforward takes for your database. I don’t like delta backups because of the number of files you have to manage to make them work. I do make sure I have two full backups on disk in case one is bad.
I most frequently backup to disk and then that disk is later backed up to tape. Backups directly to TSM or other vendor tape solutions work well too. The same should be true for transaction logs as well, and DB2 has a nice facility for sending archived transaction logs directly to TSM. If you’re archiving transaction logs to disk and backing up to disk, make doubly and triply sure that disk is backed up to tape.
Also if you’ve got databases before 9.5, make 100% sure that you’re using the “include logs” keyword on your online backups. If you backup without the “include logs” keyword, then your online backups could be 100% useless.
Testing Your Strategy
So here’s where some people get lazy. They don’t test their strategy, not even once during build. In my opinion, once or twice a year, a client should call me up and say “hey, pretend the production database server just died”, and I should have to rebuild it somewhere else. With no notice or access to the real production server. If I can’t do it once, we need to work through what the problem is and fix it. If I can’t do it twice in a row, fire me (or whoever’s fault it is) and hire someone who better safeguards the data.
Most of your effort in this area, you spend on Production. But what about Stage? Or Dev? or QA? or whatever other non-production environments you have?
I do way more restores in dev than I do in production. And while the restore time is not as much of an issue, I’m a lot more likely to get a request to restore to say “Last Thursday at 10 AM”. Production, it’s pretty rare to get a request to restore to anything older than 24 hours. So Archive Logging should be on there, too (keep all of your Commerce environments as much in sync on configuration as possible). I like to keep two weeks worth of backups/logs on disk when possible.
For the non-production environments there’s also an education issue with the development team. I can’t tell you how often someone says “When was the last backup?” – that’s the wrong question. It doesn’t much matter when the last backup was, just tell me what date/time you want me to restore to, and I’ll tell you if I can do it or not. 99% of the time if it’s in the last two weeks, the answer is yes.
Retaining Backup Images
So I have yet to jump on the bandwagon for the new setting(s?) that make it so that db2 manages backup retention. Like managing transaction log retention, I still do it manually with scripts. In any case, you need to be aware of how long you’re keeping backup images – often the easiest thing to do is to code the retention into the script you use to take backups. I code my scripts so that it is not only a date decision on deleting a backup file, but so that at least 1 (sometimes 2) images are always retained. That way if backups are failing and you’re on vacation or something, you don’t end up deleting ALL of your backup images.
So part of the backup/recovery strategy is also to know when it is appropriate to take ad-hoc, non-scheduled backups. If I am doing something that changes a bunch of data (as in loading the catalog or significant dbclean), then I’ll take a backup before and after. If there’s a FixPack at the DB2 or Commerce level or a Commerce feature pack, then I’ll also take a backup. When in doubt back it up. One of the worst feelings in the world is to need a backup image and not have one.
Overall, it is important to understand the nuances of each decision you make in designing a backup and recovery strategy. The strategy you have for one database or client may not apply to other databases or clients.