Scripting/Automation for DBAs
So if you’re in a multiple-DBA shop, you’ve probably got one DBA who is the “scripting guru”. If you’re a lone DBA, you’ve either written the scripts that automate the simplest tasks yourself or acquired them from others. Everyone has their shortcuts and tricks. I thought I’d write a general post on things you should be aware of when automating or scripting things for DB2.
What language to use?
Some DBAs come from an application development background, and they are at an advantage in this area. They are much more familiar with the fundamentals of programming, and may have an easier time picking a language they already know. Overall, just about any language will do. If you’re starting out from almost nothing, I wouldn’t use C or Java or any language that detailed – it’s just a level of control you don’t really need.
Some sort of shell is an obvious choice for those on Linux or Unix. I’m a fan of KSH, but that’s probably because I started out on AIX systems. The advantage of one of the “SH”es is that it flows naturally from just knowing how to administer a DB2 database at the command line. Shell scripting does have its drawbacks, though. These include:
- Inability to handle numbers with precision to the right of the decimal point
- Text parsing must be handled via sed and awk
- Oddities in loops – sometimes you have to establish your database connection inside the loop itself
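To make the sed/awk point concrete, here’s a small sketch of the kind of parsing involved – pulling one value out of `db2 get db cfg` output. The sample text below stands in for real db2 output, since the exact lines vary by version:

```shell
#!/bin/sh
# Sample text standing in for real "db2 get db cfg for SAMPLE" output;
# in a real script this would be: cfg_output=$(db2 get db cfg for SAMPLE)
cfg_output=' Database configuration release level          = 0x1500
 Log file size (4KB)                  (LOGFILSIZ) = 1024
 Number of primary log files        (LOGPRIMARY) = 13'

# awk: find the LOGFILSIZ line and print what follows the "= "
logfilsiz=$(echo "$cfg_output" | awk -F'= ' '/LOGFILSIZ/ {print $2}')
echo "LOGFILSIZ is $logfilsiz"
```

Nothing fancy, but it is an extra tool to learn on top of the shell itself, which is the point of the bullet above.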
My language of choice these days, though, is Perl. Its advantages include:
- Cross-platform – I write for unix/linux, but with an awareness of windows, and when I’ve had to alter something to work on Windows it’s often fast and easy
- Already there – If I stick to core modules, it’s already on Linux and Unix systems – nothing to install
- Text parsing capabilities are awesome and easy to use. Support for RegEx is built in
- It’s easy to build re-usable modules
- It’s easy to strip configuration parameters off into a separate script
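That last point – stripping configuration parameters off into a separate file – works in shell, too: the main script simply sources a config file. A minimal sketch, with made-up file names and values:

```shell
#!/bin/sh
# Hypothetical config file – in practice this content lives in its own
# file alongside the main script (names and values here are illustrative).
cat > /tmp/db2.demo.cfg <<'EOF'
# Values that change from implementation to implementation
db_name=SAMPLE
snap_dir=/tmp/snaps
EOF

# The main script simply sources the config file to pick up the values:
. /tmp/db2.demo.cfg
echo "Taking snapshots for $db_name into $snap_dir"
```

The payoff is that moving a script to a new implementation means editing one small config file instead of hunting through the script body.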
DB2’s Built-in automation
So in short, DB2’s built-in automation sucks. Am I saying this because I’ve been using DB2 since version 5, when there sure as heck was no built-in automation, and I just got used to my own scripts? Maybe, but I have either given various pieces of automation a chance, or seen a client do so, or read about the limitations and decided it wasn’t even worth a chance.
The best personality trait for a DBA is to be a detail-oriented control freak. And along those lines, I want to KNOW without even looking when my last runstats and backups were. With DB2’s automated facilities for these, you specify a window where they are acceptable, and then DB2 decides if they need to be done or not. I’d hate to discover that a low rate of database change meant my most recent backup was two months old. And on runstats, I tend to believe that even a tiny proportional change can make a major difference if it doesn’t match the distribution of the existing data.
The only reorgs DB2 automates are offline ones, and since I can very rarely get a window for those, there’s no point in automating them – I’ll write my own to run on my schedule.
And don’t even get me started on the Health Monitor. I’m continually shocked at its overhead, both from my own practical experience and from anecdotal stories from the real world. One of the first things I do when setting up a new database is to turn the Health Monitor off. I have actually seen it cause an outage on databases I support within the last year. So out it goes, and in go my own scripts for monitoring.
In short, avoid the DB2 automation.
Scripting Standards
One of the important things in writing scripts is to have your own standards. This goes from documentation to variable naming and so on. There are a number of sets of standards out there that you can choose from – either in part or in whole. A few examples from my own scripts include:
- All Perl scripts are named db2.specific_name.pl where specific_name is lower-case underscore delimited and defines the purpose of the script, preferably in two words.
- All mature Perl scripts have a separate config file where variables that may change from implementation to implementation are stored. This file is named db2.specific_name.cfg.pl.
- All scripts use variable names in lower case underscore delimited format (for example: $variable_name)
- All scripts list global variables towards the top
- All scripts list subroutines towards the top
- All scripts print a usage summary with the syntax db2.specific_name.pl -h
- All scripts return a version number with the syntax db2.specific_name.pl -v
- All scripts start with a section of comments describing the purpose of the script – in this format for the base scripts (not configuration files):
#!/usr/bin/perl -w
#----------------------------------------------------------------------#
# Script Name   : db2.get_snaps.pl
# Author        : Ember Crooks
# Company       : Brulant
# Date Written  : July 2008
# Script Version: db2.get_snaps.pl -v
# DB2 Version   : 8.2
# Purpose       : To reset database monitors, sleep, then take snapshots
# Usage         : db2.get_snaps.sh --help
# Cfg file      : db2.get_snaps.cfg.pl
# Change Log    :
#
# Date     Chg by       Ver.    Description
# -------- ------------ ------- ---------------------------------------
# 07/17/09 Ember Crooks 1.0.1   Fixed file archiving and retention
# 07/2010  Ember Crooks 1.1.0   ****CFG FILE CHANGE****
#                               Added multiple database support
# 08/2010  Ember Crooks 1.1.1   Bug Fix: One db failing should not stop
#                               snapshots for other databases being
#                               written.
# 11/2010  Ember Crooks 1.1.2   Bug Fix: Archiving failing
#----------------------------------------------------------------------#
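The -h and -v conventions above are easy to implement. Here’s a rough shell sketch using getopts – the script name and version string are made up for illustration:

```shell
#!/bin/sh
# Illustrative -h / -v handling; the script name and version string
# here are made up for this sketch.
script_name="db2.get_snaps.pl"
version="1.1.2"

usage() {
    echo "Usage: $script_name [-h] [-v]"
}

parse_args() {
    OPTIND=1            # reset so the function can be called repeatedly
    while getopts hv opt; do
        case $opt in
            h) usage; return 0 ;;
            v) echo "$script_name version $version"; return 0 ;;
            *) usage; return 1 ;;
        esac
    done
}

parse_args -v
```

The nice thing about standardizing on this is that anyone picking up one of your scripts can always run it with -h first and know they’ll get help rather than an accidental execution.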
Error Checking
One thing that is sorely lacking in many scripts I’ve seen is checking for errors. When you run a DB2 command at the command line, the OS basically considers it successful if it can find “db2” to pass the command to. All the SQL error messages are considered successful execution by the OS. That means you MUST parse the output of each and every command to ensure it returns what you expect. I have a collection of quick subroutines that take the output and do things like ensure it starts with a specific string, or verify that it does NOT start with ‘SQL’. This is really one of the hallmarks of a successful scripter, and should be a requirement for every line in every script.
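Here’s a rough shell sketch of the kind of output checking I mean. The function names are my own invention, not from any particular toolkit, and the sample strings stand in for output you’d capture from a real db2 command:

```shell
#!/bin/sh
# Rough sketch of output checking: the db2 CLP's OS exit status doesn't
# reliably reflect SQL errors, so inspect what each command printed.
# Function names are made up for this sketch.

# Succeed only if the output does NOT start with "SQL"
check_no_sql_error() {
    case "$1" in
        SQL*) return 1 ;;
        *)    return 0 ;;
    esac
}

# Succeed only if the output starts with an expected string
check_starts_with() {
    case "$2" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Sample strings standing in for captured output, e.g. out=$(db2 terminate)
good_out="DB20000I  The TERMINATE command completed successfully."
bad_out="SQL1024N  A database connection does not exist."

check_no_sql_error "$good_out" && echo "good output passes"
check_no_sql_error "$bad_out"  || echo "bad output caught"
```

A real script would call one of these checks after every single db2 command and bail out (or alert) on failure, rather than blindly continuing.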
It is useful to code the things you’ll use over and over again – like checking for errors or emailing output – in such a way that you can at least copy and paste them from one script to another. Eventually you can work this stuff into modules, but just writing it for easy copy and paste is a good start.
I really wish I could just post every script I’ve written right here for my readers to share, but there are a couple of problems with that. One is liability – you use my script and my script destroys your database and you sue me. I’m just this person writing a technical blog in her spare time, and am not insured for that. The other is that technically, my scripts belong to the company I work for, and they sell them to their clients as a part of specific contracts.
The other thing to mention here is to be careful with the scripts you get from others. Every script has a set of assumptions built into it, and not knowing those assumptions can be dangerous. You also need to be wary of actual maliciousness in things you get from unknown sources – you should understand what a script is doing before you actually use it.