Adding a GENERATED ALWAYS Column to a Table
GENERATED ALWAYS can be a blessing or a curse. I’m not talking about identity columns here, but about creating a column that is actually a duplicate of some part of the data to boost performance. Sure, in 10.5 we can do indexes on expressions, but for some clients I have trouble just getting them to go to a supported level, much less the latest and greatest. There are still some cases where I use this trick, though I analyze the situation thoroughly before using it.
In this case, my client needs to take just one character out of an integer column and use it in multiple where clauses. When they do this in the where clause, it inevitably results in a table scan, since the use of functions eliminates the use of indexes. The table is kind of central in their application design, and not very large. Also, it is a custom application, so the are able to change what SQL the application uses. In this case, duplicating some of the data to improve performance is no worse than creating indexes – that is, it must be done cautiously, but the right data can improve performance and in this case also concurrency. The driving factor here is a locking problem that is still being investigated. Some application comes in and locks up a couple of rows in the table for 5 or 10 minutes. Meanwhile, the query that uses the functions comes in and tries to get a share lock on the whole table in order to do the table scan. My client happens to be a bit sharp in this area, and discovered that when this issue occurs, if they run the query without the substring, it completes just fine, but with the function it gets a lock timeout. I’m fairly sure that this is because it does not have to scan the whole table, but instead hits an index and only gets the share locks on a few individual rows that meet its other criteria.
To alleviate the problem and allow us to index the data being queried, I’m adding a column that contains only the character added by the substr function being applied. The query we’re tuning for looks something like this:
SELECT * FROM schema.table WHERE processed = 'FALSE' and (substr(cast(comm_id as char(9)), 9, 1) = '4') ORDER BY insert_dt;
If I run an explain on this, as expected, I get a full table scan:
Access Plan: ----------- Total Cost: 7642.65 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 289.002 TBSCAN ( 2) 7642.65 3542 | 289.002 SORT ( 3) 7642.65 3542 | 289.002 TBSCAN ( 4) 7642.58 3542 | 180626 TABLE: ECROOKS TABLE: Q1
By adding a column and having the apps altered to query it instead, I can not only add and make use of an index, but I can also eliminate a type conversion.
Note that I cannot only issue the alter table statement to add the column. If I do so, I get this:
DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL20054N The table "QUALCOMM.SEND_WORKFLOW_INFO" is in an invalid state for the operation. Reason code="22". SQLSTATE=55019
Looking up the error and specifically the sections related to RC 22, I see:
PS D:\xtivia> db2 ? SQL20054N SQL20054N The table "
" is in an invalid state for the operation. Reason code = " ". Explanation: The table is in a state that does not allow the operation. The reason code indicates the state of the table that prevents the operation.
22 The generated column expression cannot be added or altered because the table is not in check pending mode.
22 Use SET INTEGRITY FOR
OFF before altering the table. Then alter the table and use SET INTEGRITY FOR IMMEDIATE CHECKED FORCE GENERATED to generate the values for the new or altered column.
Therefore the syntax that I actually have to use to accomplish this is:
set integrity for ecrooks.table off; alter table ecrooks.table add column comm_id_substr char(1) generated always as (substr(cast(comm_id as char(9)),9,1)); set integrity for ecrooks.table immediate checked force generated;
After adding the column, I also have to add an index on it – the column itself still gets me a table scan. The index in this case is:
create index ecrooks.TEMP_IDX_2 on table ecrooks.table (comm_id_substr, processed) allow reverse scans collect detailed statistics;
Since I know I have to go back to the table for the data, I only include the columns needed in the where clause. Given the table structure in this case, I could achieve index-only access by putting every column in the table in the index, but that is overkill in this (and almost every) case.
The revised query to make use of this new column (and index) is:
SELECT * FROM schema.table WHERE processed = 'FALSE' and comm_id_substr = '4' ORDER BY insert_dt;
After adding the column and the index, the changed query uses it, and sees a vast performance improvement
Access Plan: ----------- Total Cost: 26.0669 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 0.999994 TBSCAN ( 2) 26.0669 2.03771 | 0.999994 SORT ( 3) 26.0667 2.03771 | 0.999994 FETCH ( 4) 26.0661 2.03771 /-----+------\ 0.999994 179918 IXSCAN TABLE: ECROOKS ( 5) TABLE 12.8014 Q1 1 | 179918 INDEX: ECROOKS TEMP_IDX_2 Q1