Top 10 tips for SQL performance on DB2
Ok, so everyone has their top 10, and some of them are significantly different – but here’s my top 10 on SQL performance for DB2. I mostly share this information with developers, but it’s important for a DBA to be aware of these things. Since it’s directed more at developers, it ignores some of the basics like “do runstats” and “do rerogs”. This top 10 is directed purely at WRITING sql, and not full database support.
Where is the cost?
Some actions can occur either in the SQL or in the Application. Some of these recommendations assume that the cost is reduced at the database level, but do not take into account the extra cost at the application level. I don’t expect that every query meet all of these suggestions – sometimes you simply have to use an inefficient method to get the data the way you need it.
These recommendations are also focused on OLTP databases which favor small fast queries.
What we aim for
We don’t want:
Predicate types from most to least efficient:
The 10 tips:
- Select only that which you need. Avoid SELECT *, opting instead for specifying the smallest number of columns that will meet your needs.
- Each returned column has an overhead that is multiplied by the number of rows
- If there are too many columns then index-only access will not be an option
- Even if you don’t see errors due to locking, you may still impact performance time not just of this query, but of other queries
- The most important place to do this is on the join predicates – and remember that join predicates include those not using the “join” keyword. Using functions or even expressions in join predicates also limits the join method to Nested Loop and may produce inaccurate results
- The second most important place to do this is in the where clause
- This may also be useful in the column list, depending on circumstances
- These often require sorts, and sorts are frequently time eaters
- A GROUP BY may be a better choice than DISTINCT
- Multiple smaller queries may be more efficient
- Strangely enough, DB2 does not handle this for you.
- If Parameter Markers are used, DB2 does not make use of distribution statistics
- If Parameter Markers are not used, then DB2 re-calculates the access plan every time you run the query
- Balance the reduced preparation time with the performance advantaged gained by the use of statistics – this is especially important when your data is not distributed in a normal fashion.