Fun with IN-lists and GENROW

I’ve been having some fun with explains lately. I’ve been preparing a presentation for some developers on how to analyze queries – I’ll be presenting next week when I’m on-site. So I started digging through some explain plans I had sitting around from various times when I’ve need to do analyses, and I started going through them. One reccomendation on my list of “DB2 SQL Best Practices” is to avoid in-lists whenever possible. Just one of those tips that I’ve always known and never really thought about. So I start digging into this excerpt from explain plan:

                                                    0.716508                                         
                                                     FETCH                                          
                                                     (   6)                                         
                                                     133.716                                        
                                                       38                                     
                                                    /---+---                                 
                                                  1          55804                            
                                               RIDSCN   TABLE: WSCOMUSR                       
                                               (   7)      CATENTRY                                
                                               77.2279                                             
                                                  6                                                
       +----------------+----------------+--------+-------+----------------+----------------+    
        1                1                1                1                1                1   
     SORT             SORT             SORT             SORT             SORT             SORT   
     (   8)           (  10)           (  12)           (  14)           (  16)           (  18)
     12.8719          12.8719          12.8719          12.8719          12.8719          12.8719
        1                1                1                1                1                1
       |                |                |                |                |                |
        1                1                1                1                1                1
     IXSCAN           IXSCAN           IXSCAN           IXSCAN           IXSCAN           IXSCAN
     (   9)           (  11)           (  13)           (  15)           (  17)           (  19)
     12.8711          12.8711          12.8711          12.8711          12.8711          12.8711
        1                1                1                1                1                1
       |                |                |                |                |                |
      55804            55804            55804            55804            55804            55804
 INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR
    I0000518         I0000518         I0000518         I0000518         I0000518         I0000518

Having not done much in-depth analysis recently and most of my recent analysis being of the popcorn and bubble-gum variety using the index advisor, I wonder why it is scanning the same index 6 times (and that is taking up just less than 1/2 of the total timerons). Looking into the query itself, I find an IN list with exactly 6 items – whaddya know. So in this case, DB2 had to scan the SAME index 6 times -once for each item in the list, and then it did a RID scan of that list and fetched the rest of the data from the base table. I now have proof of how IN lists can work and a nifty example for the developers.

So digging into more queries, I find this access plan:

Access Plan:
-----------
  Total Cost:   102.842
  Query Degree:  1

              Rows
             RETURN
             (   1)
              Cost
               I/O
               |
                4
             NLJOIN
             (   2)
             102.842
                8
           /----+---
         4             1
      TBSCAN        IXSCAN
      (   3)        (   6)
    0.00286127      25.7188
         0             2
        |             |
         4           55804
      SORT      INDEX: DB2INST1
      (   4)        IBM_007
    0.00214227
         0
        |
         4
      TBSCAN
      (   5)
    5.95154e-05
         0
        |
         4
 TABFNC: SYSIBM  
      GENROW

Nothing too expensive or complicated here – single table access. But why the heck is it doing a JOIN when the query only accesses a single table with no subqueries? And furthermore, what’s this GENROW thing that it’s doing a table scan on and getting 4 rows. The documentation is real helpful on GENROW (from the Info Center):

GENROW
Generates a table of rows.

Digging a bit further, I find:

This operator is used by the optimizer to generate rows of data.

Operator name: GENROW

Represents: A built-in function that generates a table of rows, using no input from tables, indexes, or operators.

GENROW can be used by the optimizer to generate rows of data (for example, for an INSERT statement or for some IN-lists that are transformed into joins).

Ok, so burried in there is a reference to IN-lists and commerce tends to be a bit IN-list happy. Looking at the query, I find an IN-list with 4 values. So DB2 is essentially treating that 4 rows as a (non-indexed) table, and then doing an NLJOIN between it and the table itself. So on a single-table query, I get a sort, a table scan, and a join, even though I’m getting index-only access to the table directly.

Much of this I’ve learned before when digging into the details of explains, but I haven’t focused so much on how IN-lists are handled, and thought it was interesting.

=) yes, this is my idea of fun on a Thursday night.

You may also like...

3 Responses

  1. David Fuente says:

    Hello, Great Post !. I’m facing the same situation. I have 2 equal databases in 2 different servers, but when I execute the explain plan.. one is using the same index several times and the other is using genrow. What’s the difference? Could be statistics or anything else? Thanks. Regards.

    • Ember Crooks says:

      Do they have the same statistics and exactly the same data? My guess would be that they have different data or different statistics. I would look also at if one is much faster than the other.

  2. Anon says:

    Thanks, for the information on genrow. I tried using “or” clause instead of in-list for example “where (pred1=”value1″ or pred1=”value2″)” but it still produces genrow section.

Leave a Reply

Your email address will not be published. Required fields are marked *