Adding a GENERATED ALWAYS Column to a Table

GENERATED ALWAYS can be a blessing or a curse. I'm not talking about identity columns here, but about creating a column that is actually a duplicate of some part of the data to boost performance. Sure, in DB2 10.5 we can create indexes on expressions, but for some clients I have trouble just getting them to a supported level, much less the latest and greatest. There are still some cases where I use this trick, though I analyze the situation thoroughly before using it.
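For reference, on 10.5 and later the same problem could be attacked directly with an expression-based index, with no duplicated column at all. A sketch, using the table and expression from this article (the index name is a placeholder):

    -- DB2 10.5+ only: index the expression itself, so the original
    -- predicate substr(cast(comm_id as char(9)), 9, 1) = '4'
    -- can use the index without any added column.
    CREATE INDEX ecrooks.temp_idx_expr
        ON ecrooks.table (SUBSTR(CAST(comm_id AS CHAR(9)), 9, 1), processed);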

Why

In this case, my client needs to take just one character out of an integer column and use it in multiple where clauses. Doing that in the where clause inevitably results in a table scan, since applying a function to a column prevents DB2 from using an index on that column. The table is fairly central to their application design, and not very large. It is also a custom application, so they are able to change what SQL the application uses. Duplicating some of the data to improve performance is no worse than creating indexes – that is, it must be done cautiously, but the right data can improve performance, and in this case concurrency as well.

The driving factor here is a locking problem that is still being investigated. Some application comes in and locks up a couple of rows in the table for 5 or 10 minutes. Meanwhile, the query that uses the functions comes in and tries to get a share lock on the whole table in order to do the table scan. My client happens to be a bit sharp in this area, and discovered that when this issue occurs, running the query without the substring completes just fine, while the version with the function gets a lock timeout. I'm fairly sure that is because the function-free version does not have to scan the whole table, but instead hits an index and only takes share locks on the few individual rows that meet its other criteria.
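While an issue like this is in progress, the lock waits themselves can be observed with db2pd. A quick sketch (the database name SAMPLE is a placeholder):

    # Show current lock waits, including which agent holds the
    # blocking lock and which is waiting on it
    db2pd -db SAMPLE -locks wait showlocks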

How

To alleviate the problem and allow us to index the data being queried, I'm adding a column that contains only the character extracted by the substr expression. The query we're tuning for looks something like this:

SELECT * 
FROM schema.table 
WHERE   processed = 'FALSE'
        and (substr(cast(comm_id as char(9)), 9, 1) = '4')
ORDER BY insert_dt;

If I run an explain on this, as expected, I get a full table scan:

Access Plan:
-----------
    Total Cost:         7642.65
    Query Degree:       1

          Rows 
         RETURN
         (   1)
          Cost 
           I/O 
           |
         289.002 
         TBSCAN
         (   2)
         7642.65 
          3542 
           |
         289.002 
         SORT  
         (   3)
         7642.65 
          3542 
           |
         289.002 
         TBSCAN
         (   4)
         7642.58 
          3542 
           |
         180626 
     TABLE: ECROOKS 
         TABLE
           Q1
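For anyone reproducing this, a plan like the one above can be generated with the explain tables and db2exfmt; a sketch (the database name SAMPLE is a placeholder, and EXPLAIN.DDL ships in sqllib/misc):

    db2 connect to SAMPLE
    db2 -tf EXPLAIN.DDL    -- create the explain tables, if not already present
    db2 "explain plan for SELECT * FROM schema.table WHERE processed = 'FALSE' and (substr(cast(comm_id as char(9)), 9, 1) = '4') ORDER BY insert_dt"
    db2exfmt -d SAMPLE -1 -o explain.out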

By adding a column and having the apps altered to query it instead, I can not only add and make use of an index, but I can also eliminate a type conversion.

Note that I cannot simply issue the alter table statement to add the column. If I do, I get this:

DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL20054N  The table "QUALCOMM.SEND_WORKFLOW_INFO" is in an invalid state for
the operation.  Reason code="22".  SQLSTATE=55019

Looking up the error, and specifically the sections related to reason code 22, I see:

PS D:\xtivia> db2 ? SQL20054N


SQL20054N The table "<table-name>" is in an invalid state for
          the operation.  Reason code = "<reason-code>".

Explanation:

 The table is in a state that does not allow the operation.  The
reason code indicates the state of the table that prevents the
operation.

 22 The generated column expression cannot be added or altered
    because the table is not in check pending mode.

User response:

 22 Use SET INTEGRITY FOR <table-name> OFF before altering the
    table. Then alter the table and use SET INTEGRITY FOR
    <table-name> IMMEDIATE CHECKED FORCE GENERATED to generate the
    values for the new or altered column.

Therefore the syntax that I actually have to use to accomplish this is:

set integrity for ecrooks.table off;
alter table ecrooks.table add column comm_id_substr char(1) generated always as (substr(cast(comm_id as char(9)),9,1));
set integrity for ecrooks.table immediate checked force generated;
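Before pointing the application at the new column, it is worth spot-checking that the generated values actually match the expression. A quick sanity check along these lines:

    -- Verify the generated column agrees with the original expression
    SELECT comm_id,
           comm_id_substr,
           SUBSTR(CAST(comm_id AS CHAR(9)), 9, 1) AS expected
    FROM ecrooks.table
    FETCH FIRST 10 ROWS ONLY;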

After adding the column, I also have to add an index on it – the column itself still gets me a table scan. The index in this case is:

create index ecrooks.TEMP_IDX_2 on ecrooks.table (comm_id_substr, processed) allow reverse scans collect detailed statistics;

Since I know I have to go back to the table for the data, I only include the columns needed in the where clause. Given the table structure in this case, I could achieve index-only access by putting every column in the table in the index, but that is overkill in this (and almost every) case.
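For contrast, the "overkill" index-only-access version would look something like this. The columns after the first two are hypothetical placeholders standing in for the rest of the table's columns:

    -- NOT recommended here: every table column in the index key,
    -- purely to get index-only access. insert_dt and comm_id are from
    -- the article's query; any further columns would follow the same way.
    create index ecrooks.TEMP_IDX_3
        on ecrooks.table (comm_id_substr, processed, insert_dt, comm_id)
        allow reverse scans collect detailed statistics;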

Results

The revised query to make use of this new column (and index) is:

SELECT * 
FROM schema.table 
WHERE   processed = 'FALSE'
        and comm_id_substr = '4'
ORDER BY insert_dt;

After adding the column and the index, the changed query uses them and sees a vast performance improvement:

Access Plan:
-----------
    Total Cost:         26.0669
    Query Degree:       1

              Rows 
             RETURN
             (   1)
              Cost 
               I/O 
               |
            0.999994 
             TBSCAN
             (   2)
             26.0669 
             2.03771 
               |
            0.999994 
             SORT  
             (   3)
             26.0667 
             2.03771 
               |
            0.999994 
             FETCH 
             (   4)
             26.0661 
             2.03771 
         /-----+------\
    0.999994          179918 
     IXSCAN       TABLE: ECROOKS 
     (   5)           TABLE
     12.8014            Q1
        1 
       |
     179918 
 INDEX: ECROOKS 
   TEMP_IDX_2
       Q1

7 Responses

  1. Isaac Munoz says:

    Excellent solution, Ember. Thanks for sharing it.

    It would be good if, once you find the root cause of the locking issue, you share that with us as well. I don't have the full picture, but is it possible that DB2_SKIPINSERTED, DB2_SKIPDELETED, and DB2_EVALUNCOMMITTED would make a difference?

    Regards

  2. Chang-ho Song says:

    Hi Ember!
    I'm a newbie DB2 engineer in South Korea.
    I visit db2commerce.com daily, and every article on this site is excellent for me.
    I'd like to post your articles on my blog to share with DB2 users in South Korea.
    Can I do this? : D

    • Ember Crooks says:

      Are you translating the article? If not, I’d prefer you just link to the article. If you are translating it, I’d be happy for you to re-post, and include a link to the original.

      • Chang-ho Song says:

        I will translate your article into Korean as much as possible and include a link to the original.
        Thank you for your kindness, Ember : )

  3. raju says:

    Hello Ember,

    How do I migrate a DB2 database between different operating systems?

    My task is this: I have a DB2 backup image from a Linux server, and I need to restore it onto a Windows server.

    Could you please provide the pre-migration and migration steps? The DB2 version is V9.7.

    Are there any tools for migrating a database across platforms?

    • Ember Crooks says:

      You cannot backup/restore across platforms. So the best option is generally to use db2look and db2move in a manner something like this:

      1. db2look on old database (use alternate delimiter if any triggers or stored procedures)
      2. db2move export on old database
      3. split db2look into 2 sections before foreign keys and everything after foreign keys
      4. create database on target and run first section of db2look
      5. db2move load on new database (or load data some other way – data conversion workbench may also be an option, or loads from cursor, manually scripted)
      6. run second section of db2look

      Keep a close eye on triggers and stored procedures to make sure they’re accurately re-created. Stored procedures written in anything other than SQL may be problematic, and require that the files for them be moved manually.

      All of this takes longer than backup/restore and is more risky – you have to be very detail oriented to make sure you get everything.
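      The steps above can be sketched as commands; database names, file names, and the split points are placeholders:

          # 1. DDL from the old database; -td @ sets an alternate statement
          #    delimiter for triggers and stored procedures
          db2look -d OLDDB -e -x -l -td @ -o db2look.sql

          # 2. Export all table data
          db2move OLDDB export

          # 3. Manually split db2look.sql into a section before foreign keys
          #    (db2look_part1.sql) and everything after (db2look_part2.sql)

          # 4. On the target server
          db2 create database NEWDB
          db2 -td@ -f db2look_part1.sql

          # 5. Load the exported data
          db2move NEWDB load

          # 6. Foreign keys, triggers, and everything else
          db2 -td@ -f db2look_part2.sql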
