The Role of the DB2 DBA in Load/Performance Testing
I am consistently shocked and appalled at how few clients actually do load testing. Many pay it lip service, but few actually engage in real, meaningful performance and load testing. I understand that load testing of an application can be difficult and expensive. It is not something that every developer has as a part of their toolkit. But the most successful go-lives and holiday seasons I’ve seen high-volume e-commerce sites endure are the ones that went through at least some load testing.
What Kind of Load Testing
First of all, the most useful kind of load test is one that is done from the application. Yes, pure-database load testing can be useful, but there is nothing like a load test that simulates end-user activity. Load testing at only the database level may be useful as a first validation point on certain kinds of changes in a test environment, but in e-commerce databases, patterns of access often change so fast that actual application testing is the only way to accurately represent tomorrow’s load.
Before a go-live, full load testing should be done in the production environment. This is an excellent opportunity to establish baselines and to make sure that the application and database hold up under loads.
After go-live it gets more complicated, because that system is already in use. If you are lucky enough to have a full-size performance test environment, that is ideal. That makes it so you can test full loads against full data and really understand what’s going on. Partial size performance test environments I don’t find all that useful for load testing, though they may be good for regression testing and upgrade testing. I have also seen some clients that are geographically focused perform load testing on their production environment outside of normal hours – often in the middle of the night.
Types of Tests
There are often a vast array of types of load tests. Some load tests set a specific goal, and are repeated until that goal can be reached and maintained for a specific time. Others see how much a system can take before it breaks. Some load tests focus on a specific goal or workload, while others attempt to simulate the full volume a database may see. Examples of this in a WebSphere Commerce environment include testing stagingprop and cache invalidation while running a regular user workload. I would also argue that failing over the database during a load test can be very valuable. Often, a failover will take longer under load, and knowing that helps to prevent panic when an actual real-life failover occurs under high load.
In my opinion, load testing should always be done prior to a go-live of a new database, a new site, or a new application. It is shocking and appalling how many developers pay little heed to performance and just seem to go with the first thing that works. I find myself pleasantly shocked when I actually get to deal with a developer who seems to have thought about database performance.
It is also good to do load testing before any critical peak periods, whatever they may be for your business. I find that the pattern of SQLs run changes in deployments over the course of time, so doing load testing before these peak periods can catch anything that hasn’t been severe enough to get noticed under normal volume.
The most dismal flops of load tests that I have seen were actually not due to flawed applications or database design, but due to problems with the load testing methodology. When 2500 concurrent users are simulated using only 100 unique user ids, there are deadlocks as 250 sessions try to check out the same cart at the same time. It may sound obvious when I put it that way, but that is literally a scenario I have seen in the real world. As the DBA, you may have to ask serious questions about load test methodology, especially when the symptoms of a poorly designed load test may show up in the database.
Load Testing “Failures”
The point of a load test is to find a problem. It shows us where to look, what the bottlenecks are, and what problems to focus on. It does not represent a failure to have an application not perform as expected during a load test. In fact, this is a success of a load test! An application or database problem detected during load testing is one avoided on live production.
When Should the DBA be Involved?
In my opinion, a DBA should be involved in most load tests. Often the approach is that the DBAs will be engaged only after a problem with the database is suspected. The problem with that approach is that I can SEE what will be a problem under even greater load during a load test. I can head problems off at the pass and start addressing problem SQL and bottlenecks BEFORE they get to the point that anyone notices them. If you don’t engage a DBA in load tests, you will end up over-buying hardware for your database servers. Bring me in on every load test you’re willing to pay for me to watch, because often you will get your money’s worth in problem avoidance and smaller hardware and database licensing bills.
Actions for a DB2 DBA During a Load Test
I’m not sharing specific scripts, but describing what I do during a load test. First, I schedule a script to gather data every 5 minutes. I export data to CSVs from all of my favorite monitoring functions like MON_GET_DATABASE and MON_GET_TABLE. I add a database connection timestamp and a ‘snapshot’ timestamp to each row. Depending on what red flags I see, I can then pull this data into excel, and graph it and play with it to better understand and communicate what I see.
I flush the package cache before the load test, and look at problem SQL both at the 5-minute intervals, and also at the end of the test. I also capture monreport.dbsummary to better understand wait times and the other metrics reported there.
I also run a locking event monitor that writes data about lock timeouts and deadlocks to an unformatted event monitor table. Honestly I try to run one of these in every database I support. If there are locking issues, this gives me the data to analyze what is going on.
During the load test, I’m not really looking at any of that. During the load test, I’m using db2top or dsmtop to watch what’s going on. db2top is very low impact. I am often bouncing between the database, session, and dynamic sql screens of db2top, with a visit or two to the bufferpools screen and the locking screen. This allows me to keep an eye open for red flags and investigate specific issues as I see them. Sometimes that’s locking issues. Other times it is problem SQL that really stands out. I also make sure that I’m keeping an eye on CPU utilization – often through adding it to db2top by adding a cpu line in .db2toprc.
So far, I generally am choosing not to use full-on DSM during a load test because I still worry about the impact DSM’s monitoring has on the database being monitored.
What are your thoughts on load testing and the DBA’s role?