DB2 Databases – the Basics of Storage
So many times it appears that a database expert is not engaged with the Commerce Architecture. This generally means that your Commerce database is on the root filesystem (eek!) and using circular logging. There’s more than that you should change on build, don’t get me wrong. But storage is obviously a biggie whenever you’re dealing with a database. I’m going to talk through both what I would want for a large, high-performance database and also what a real-world decent starting point looks like.
So if cost is not a major issue (ha!) and you care most about making your database as fast as possible, I would want to see five separate I/O paths to a well-architected SAN with the biggest caches you can get. The purposes of those four completely separate paths would be:
- Data – tablespaces holding database data
- Indexes – tablespaces holding index data (some work needed for this)
- Temp – tablespaces holding temporary data
- Logs – database transaction logs
- Backups (or a alternately high-speed connection to take backups directly to a TSM server)
This perfect world keeps the different kinds of database data separate and ensures that read and write operations that are likely to happen at the same time get the maximum amount of disk parallelism. I include backups because this is a time when the entire database is read from disk and essentially written to disk, so you can significantly lower your backup and restore times by having those separate. If you are lucky enough to have a geographically close TSM server with high-speed links, then I do prefer that, though I’ve seen some bad TSM implementations in my career. Often backup to disk is enough faster that backups should be taken to disk and moved to TSM or other separate storage solution asynchronously.
Another point here is that you’re going to have to tweak some things to get everything in the proper location. This means altering the scripts that Commerce uses to create the database on build, or doing some fancy restoring (or worst case, export/import) after build to get everything where you want it to go. Dividing the data and indexes up after you have been running on a default database for a while is not usually feasible because it does require export/import (restores can’t change your tablespace type from SMS to DMS or your table definitions to specify location of indexes), which can be extremely time consuming and is rarely an option for databases over 25 GB in size, even with the help of db2move.
The Real World
Ok, so in the real world, we don’t always get the hardware we want, and we often have to consider that there may not always be a full-time db2 DBA who knows how to monitor and administer DMS tablespaces(though auto extend features in DB2 9 kinda make that last point mute, though you then have to make sure the autoextend features are set up properly). In the real world, we generally get either a single RAID Array or a single path into a SAN(please go with a minimally reliable SAN – I’ve fought with some poor ones, and it really is not fun and requires a high level of DB2 expertise).
So in that case, it still makes sense to break out some of the data for administrative purposes into different filesystems – even if you don’t have the ability to do whole separate I/O paths. Assuming a single path to whatever kind of highly-redundant and available I/O, I like to have filesystems for the following:
- database data – including data, indexes, temp – speed matters here
- active transaction logs – speed matters here
- archive transaction logs – speed is not as critical
- instance home – speed is not as critical
- diagnostic log path – speed is not as critical
- administrative data and scripts – speed is not as critical, and in a pinch, this can be combined with the instance home.
I’ve noted above where speed is the most critical in case you have some higher speed storage and some lower speed. Most are also per-database – if you have more than one database in your instance, then you need to keep things separate. The main driving factor in the separations above is to keep one filled-up filesystem from impacting other areas.
There are two reasons I separate active and archived transaction logs. The most important is for ease of support – if someone compresses (or deletes) an active transaction log, you will crash your database and it will not be fun. Believe me. So, the first-line support guys I work with know that it is safe to compress anything in the archive log path, but they are not under any circumstances to touch anything in the active log path in any way. The second reason is speed – the speed of the disks your active logs are on matters greatly. The speed of disks your archive logs are on really matters only on restore.
This scheme is also easy on build – you can go with Commerce’s default for database creation (AST for Commerce 7, and SMS tablespaces for Commerce 6) – changing the DFTDBPATH (to the data location) between DB2 instance creation and Commerce instance creation, and updating DIAGPATH and LOGARCHMETH1 parameters afterwards.
One thing to keep in mind is that it’s generally nice to have the same general filesystem layout (even if the underlying storage and sizes are different) for all your environments – prod, stage, and whatever test/dev/qa environments you may have. If you’re doing HADR, the storage details should be exactly identical in every way between your HADR servers.
Overall, you can put in a small amount of work at build time to prevent headaches and outages easier. If you’re rolling in cash or database speed is super-important, then there are better storage architectures. Whatever you do, going with the Commerce defaults is not best practice for the database tier.