What is a table scan?
DB2 has a number of ways of accessing the data it needs, but the most basic one – the one that will always work to get what it needs – is the table scan. A Table scan means that DB2 will read through every single row in a table to get the data it needs. Table scans appear on explain plans for some queries. My favorite method of generating an explain plan is described in this post: SQL Analysis Overview
Assuming the query has a where clause, DB2 may use a table scan to handle each row individually and ask “does this meet my criteria?”. If the answer is yes, then the row is added to the result set DB2 is building. If the answer is no, then the row is discarded and not used.
When do table scans occur?
DB2 may choose a table scan can whether your query is returning one row or every single row in a table. While a ‘select *’ with no where clause is almost certain to do a table scan, they can also occur in a scenario where you are returning only one row out of a million row table. A table scan is like DB2′s last resort – if it cannot find a an index to help, it will fall back on the table scan to examine each row in the table individually.
Table scans can even occur on tables you didn’t know existed. They can occur on temporary tables created just for this query (implicitly or explicitly). Since there is no index on some types of temporary tables, DB2 uses table scans a lot in this context. Check out my post on Fun with IN-lists and GENROW to see one of the ways this can look.
Why to avoid table scans
It is easy to see why table scans are a bad idea for selecting only a few rows out of a million row table. It seems less important to avoid with a small table of 6 or 7 rows that is being scanned to return just one row. Depending on how frequent each of those scenarios is, either one can be a performance killer. Consider table scans on small tables if done frequently just as much of a problem as they are on large tables.
Ways that table scans cause problems:
- Locking – Depending on isolation level, a table scan of even a small table can end up locking the whole table, reducing concurrency and increasing the likelihood of lock timeouts or even deadlocks. If it is a large table, it can even fill up the locklist and trigger a lock escalation to lock the whole table – which can occur at a variety of isolation levels. In my experience, locking issues are one of the primary causes of severe performance problems.
- Hogging the bufferpool – Depending on the size of the table and the size of the bufferpool, reading the whole table into the bufferpool can reduce your hit ratio by taking up too much space for the one table
- I/O – I/O takes time, and once you’ve run through your bufferpool, you could be going straight to disk, even with prefetch. Direct I/O is slow, and this can be a problem
- CPU Time – Comparing a million rows takes time, even if it all takes place in memory. You can actually see higher CPU time when a lot of table scans are occurring. CPU time may not be your overall bottleneck, but it can still be a problem, particularly if the database server is undersized on CPU to begin with.
- Increased Execution Time – A query that takes longer not only is more likely to have locking and concurrency issues, but it clearly is causing reduced response times to the end user.
Key indicators that table scans are occuring
Have I mentioned lately that I just love the ‘MON_GET…’ table functions introduced in 9.7? If you want to know which tables table scans are occuring on, it is only a quick query away, assuming you don’t care about a specific time frame other than since the last database activation:
$ db2 "select substr(tabschema,1,18) as tabschema, substr(tabname,1,30) as tabname, table_scans FROM TABLE(MON_GET_table('','',-2)) AS T order by table_scans desc fetch first 10 rows only with ur"
TABSCHEMA TABNAME TABLE_SCANS
------------------ ------------------------------ --------------------
WSCOMUSR ATTRVAL 567504
WSCOMUSR STTPCUSG 67325
WSCOMUSR XJORNADA 25758
WSCOMUSR CTXMGMT 12671
WSCOMUSR ACACTION 11919
WSCOMUSR GRRGSTRNT 10618
WSCOMUSR STOREENT 9452
WSCOMUSR SHIPMODE 8427
WSCOMUSR CMFILE 7445
WSCOMUSR ATTRDICTSRCHCONF 6824
10 record(s) selected.
Calculating your Index Read Efficiency is key to figuring out if excessive table scans are occurring, and can be useful in knowing where to look to help resolve them. For e-commerce databases, Index Read Efficiency is excellent at 10 or lower, good at 100 or lower, and may be acceptable as high as 1000. The databases I support that I consider to be running well generally run between about 4 and 40.
Index Read Efficiency can be calculated at many different levels. Calculate it at the database (or workload) level like this:
$ db2 "select rows_read, rows_returned, rows_read/rows_returned as IREF FROM TABLE(MON_GET_WORKLOAD('',-2)) AS T where WORKLOAD_NAME='SYSDEFAULTUSERWORKLOAD' with ur"
ROWS_READ ROWS_RETURNED IREF
-------------------- -------------------- --------------------
96378957991 163061727 591
1 record(s) selected.
I like to collect the data above on an hourly basis, subtracting to get just the data from an hour, and I can then tell when the read efficiency is the biggest problem. I’m less worried about bad numbers during night time hours when batch processing or load jobs may be occurring than I am during peak transaction times.
Calculate Index Read Efficiency at the statement level like this:
WITH SUM_TAB (SUM_RR) AS (
SELECT FLOAT(SUM(ROWS_READ))
FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T)
SELECT
SUBSTR(STMT_TEXT,1,20) AS STATEMENT,
ROWS_READ,
DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) AS PCT_TOT_RR,
ROWS_RETURNED,
CASE
WHEN ROWS_RETURNED > 0 THEN
DECIMAL(FLOAT(ROWS_READ)/FLOAT(ROWS_RETURNED),10,2)
ELSE -1
END AS READ_EFFICIENCY,
NUM_EXECUTIONS
FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T, SUM_TAB
ORDER BY ROWS_READ DESC FETCH FIRST 5 ROWS ONLY WITH UR;
STATEMENT ROWS_READ PCT_TOT_RR ROWS_RETURNED READ_EFFICIENCY NUM_EXECUTIONS
-------------------- -------------------- ---------- -------------------- --------------- --------------------
select * from schsta 199364 47.95 23 8668.00 23
INSERT into DBAMON09 97400 23.42 0 -1.00 7
INSERT INTO DBAMON09 40656 9.77 0 -1.00 7
SELECT ATTR.ATTR_ID 27630 6.64 27630 1.00 45
INSERT into DBAMON09 14752 3.54 0 -1.00 7
5 record(s) selected.
Calculating it at the statement level is particularly useful because it provides the worst queries over all to focus indexing efforts on. It is possible to use where clauses to pull out queries related to specific tables, though some of the ones that immediately come to mind (where uc(statement) like ‘%tablename%’) may not be very efficient in themselves.
How you may be able to avoid table scans
Can I stop every table scan? Absolutely not. If someone runs a ‘select *’ with no where clause, DB2 is just going to choose a table scan. It’s the right way to satisfy that kind of query. If you’re returning 90% of the table, it’s probably also appropriate. The line where it becomes inappropriate lies some place in the middle. Sometimes there are just too many conditions in the where clause to be able to handle the query any other way.
If I have a query that is executed very frequently and it returns about 10% of a table, that’s probably one I would want to index for.
That’s how to stop table scans – create indexes to access the data more efficiently. The first thing to look for in an index that I’m using to try to eliminate a table scan is to make sure that all of the columns in joins and where clauses are included in indexes in an order that DB2 can use them. Getting the right columns in those categories in indexes, you should be able to change a table scan into an index scan with a fetch back to the table to get any remaining data. db2advis is a great tool for figuring this out, though it’s not perfect.
The next level beyond that is index-only access. For “hogs and dogs” of SQL (the ones that perform bad and are frequently executed), index-only access is awesome. But it’s inefficient as far as storage space on disk, and if you go overboard with it you can negatively impact your insert performance. To achieve index-only access, you have to make sure every single column a query returns or references in any way is in one or more indexes in an order that DB2 can use. db2advis frequently recommends indexes that would achieve index-only access, and that’s one of the reasons it is important to carefully analyze the indexes it suggests – index-only access is not right for every query.
