Tips on Using db2advis with Multiple Queries

Many times, the SQL analysis I do is extremely focused – mostly on SQL that is a proven resource hog or a suspected problem. Analyzing all SQL used in a particular process can be a bit different, and may not be something that is frequently done in WebSphere Commerce databases.

Recently, I had the output of statement event monitors while specific tasks were run against a development database, and was asked to do the more general tuning instead of the focused tuning that I normally do.

This post covers some tips for using db2advis or the design advisor at the command line with multiple queries as input.

Input

In my case the input came from some processing that IBM did over the formatted output files from statement event monitors. You don’t really have to have that data – all that is really required is the SQL statements and the frequency at which they are executed within the workload. Using that input, I generated a file that looked in part like this:

--#SET FREQUENCY 4
SELECT ATTRIBUTE.ATTRIBUTE_ID, ATTRIBUTE.LANGUAGE_ID, ATTRIBUTE.ATTRTYPE_ID, ATTRIBUTE.NAME, ATTRIBUTE.SEQUENCE, ATTRIBUTE.DESCRIPTION, ATTRIBUTE.CATENTRY_ID, ATTRIBUTE.DESCRIPTION2, ATTRIBUTE.FIELD1, ATTRIBUTE.OID, ATTRIBUTE.USAGE, ATTRIBUTE.QTYUNIT_ID, ATTRIBUTE.GROUPNAME, ATTRIBUTE.NOTEINFO, ATTRIBUTE.MULTITYPE, ATTRIBUTE.OPTCOUNTER FROM ATTRIBUTE WHERE ATTRIBUTE.CATENTRY_ID IN (?) AND ATTRIBUTE.LANGUAGE_ID IN (?) ORDER BY ATTRIBUTE.SEQUENCE;
--#SET FREQUENCY 4
SELECT DISTINCT STOREDEFCAT.STOREDEFCAT_ID, STOREDEFCAT.STOREENT_ID, STOREDEFCAT.CATALOG_ID, STOREDEFCAT.FIELD1, STOREDEFCAT.FIELD2, STOREDEFCAT.FIELD3, STOREDEFCAT.OPTCOUNTER FROM STOREDEFCAT WHERE STOREDEFCAT.STOREENT_ID IN (?, ?);
--#SET FREQUENCY 3
SELECT T1.TOTALTAX, T1.LOCKED, T1.TOTALTAXSHIPPING, T1.STATUS, T1.FIELD2, T1.TIMEPLACED, T1.FIELD3, T1.CURRENCY, T1.SEQUENCE, T1.TOTALADJUSTMENT, T1.ORMORDER, T1.SHIPASCOMPLETE, T1.PROVIDERORDERNUM, T1.TOTALPRODUCT, T1.DESCRIPTION, T1.MEMBER_ID, T1.ORGENTITY_ID, T1.FIELD1, T1.STOREENT_ID, T1.ORDCHNLTYP_ID, T1.ADDRESS_ID, T1.LASTUPDATE, T1.ORDERS_ID, T1.COMMENTS, T1.NOTIFICATIONID, T1.TYPE, T1.EDITOR_ID, T1.OPTCOUNTER, T1.SOURCEID, T1.EXPIREDATE, T1.BUSCHN_ID, T1.BLOCKED, T1.TOTALSHIPPING, T1.TRANSFERSTATUS, T1.OPSYSTEM_ID, T1.BUYERPO_ID FROM ORDERS  T1 WHERE (T1.STATUS = ?) AND (T1.MEMBER_ID = ?) AND (T1.STOREENT_ID = ?) ORDER BY T1.ORDERS_ID;

...

In the example above, the SET FREQUENCY statements are in the format required by db2advis. The SQL is properly formed, includes parameter markers (represented by question marks), and does not usually qualify tables with the schema name.

Once I had this, I placed it in a file called checkout_logon_all.sql in a working directory I use for explains and db2advis. My file had 97 SQL statements in it. Once that file existed, I could use it as my input for db2advis.
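Generating that input file can be scripted. Here is a minimal sketch in Python; the two statements and frequencies below are made-up placeholders, and in practice the workload list would come from your event monitor data:

```python
# Sketch: build a db2advis input file from (statement, frequency) pairs.
# The statements here are shortened placeholders -- substitute the real
# SQL and execution counts extracted from your event monitor output.

workload = [
    ("SELECT NAME FROM ATTRIBUTE WHERE CATENTRY_ID IN (?)", 4),
    ("SELECT STOREENT_ID FROM STOREDEFCAT WHERE STOREENT_ID IN (?, ?)", 4),
]

with open("checkout_logon_all.sql", "w") as f:
    for sql, freq in workload:
        # frequency directive in the format db2advis expects
        f.write(f"--#SET FREQUENCY {freq}\n")
        # one statement per entry, terminated with a semicolon
        f.write(sql.rstrip().rstrip(";") + ";\n")
```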

Processing

The next step is to execute db2advis, providing the file as input. That looks like this:

> db2advis -d SAMPLE -q WSCOMUSR -i checkout_logon_all.sql |tee checkout_logon_all.sql.advis

Using user id as default schema name. Use -n option to specify schema
execution started at timestamp 2013-06-14-17.33.03.220139
found [97] SQL statements from the input file
Recommending indexes...
total disk space needed for initial set [   0.712] MB
total disk space constrained to         [ 156.057] MB
Trying variations of the solution set.
Optimization finished.
  11  indexes in current solution
 [4382.5830] timerons  (without recommendations)
 [3945.0443] timerons  (with current solution)
 [9.98%] improvement
...

The ‘WSCOMUSR’ above is the schema name to assume for any unqualified tables in the SQL. The output tells us that 11 indexes were found that could help, with an overall expected improvement for the workload of 9.98%. I’m pretty sure that figure does not account for any slowdown to inserts, updates, and deletes caused by maintaining the additional indexes.
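I believe the improvement figure is simply the relative reduction in the workload's total timeron cost, and the numbers above bear that out:

```python
# Sanity check (assumption): the reported improvement appears to be the
# relative reduction in total workload cost, measured in timerons.
without_ix = 4382.5830   # timerons without recommendations
with_ix = 3945.0443      # timerons with the recommended indexes

improvement = (without_ix - with_ix) / without_ix * 100
print(f"{improvement:.2f}%")  # prints 9.98%, matching what db2advis reports
```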

Being an enlightened DBA, you should know that blindly following all suggestions of db2advis is never a good idea. I have other workloads that run against these tables beyond this one process, and have to consider the costs of additional and duplicate indexes. So how do you determine which of the 11 indexes might make sense?

At the bottom of db2advis output is a section of XML, which I generally ignore when using db2advis to look at a single statement. But it is pure gold when you’re using db2advis to look at a full workload like this.

Output

After doing a bit of formatting (removing the leading -- on every line, and replacing any < and > within the statement text with &lt; and &gt;), I import the XML into Excel and get something that looks like this (click the image to see it full-size and clearly):
[Image: the db2advis XML output imported into Excel]
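That cleanup can be scripted as well. This sketch, run over an invented three-line fragment, strips the leading -- that db2advis puts on each line of the XML section; escaping any comparison operators inside the statement text is still a manual step:

```python
# Sketch: strip the leading "--" comment prefix from the XML section of
# db2advis output, leaving bare XML. The input lines below are an
# invented fragment for illustration -- in practice you would read the
# tail of the db2advis output file.

raw_lines = [
    "-- <index>",
    "--   <benefit>123.45</benefit>",
    "-- </index>",
]

xml_text = "\n".join(
    line[2:].lstrip() if line.startswith("--") else line
    for line in raw_lines
)
print(xml_text)
```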

This is a list of every index used by the queries in the workload – both the newly recommended ones and existing ones. Pulling the data into a spreadsheet lets you sort on the various columns. The “/index/benefit” column is in timerons, and shows how much benefit you can expect to see from adding a particular index. I find looking at them by table (“/index/table/identifier/name”) particularly interesting, as there are some tables where I will add indexes more freely, depending on how important insert/update performance is. The “/index/statementlist” column lists which statements in the workload a particular index applies to. There is also additional data in that XML – you’ll find it down and to the right once you’ve imported the data into Excel. It breaks things out by statement, so you can see the impact that each index has on each statement. I’ll be looking at that in more detail in some of my single-SQL analysis to see if it looks correct and useful.

Decisions

Obviously this kind of analysis has a very specific usage. You’re not going to be running this every day. As always, indexing decisions take careful analysis. There’s nothing here that will do that analysis for you, but it may provide some additional tools.

Comments

  1. Hi,
    I have some SQL I would like to explain/advise, but it has multiple steps, which confounds db2expln and db2advis:

    DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMPTABLE1
    AS (SELECT DISTINCT ACTION_ID AS COL1
    FROM ACTION
    WHERE LTRIM(RTRIM(ACTION.ACTION_DESC)) LIKE '%Pend%') DEFINITION ONLY
    ON COMMIT DELETE ROWS
    NOT LOGGED;

    INSERT INTO SESSION.TEMPTABLE1
    SELECT DISTINCT ACTION_ID
    FROM ACTION
    WHERE LTRIM(RTRIM(ACTION.ACTION_DESC)) LIKE '%Pend%';
    --#SET FREQUENCY 4
    SELECT P.WORK_ITEM_FACT_ID,
    P.WORK_ITEM_ID,
    P.WORK_ITEM_ACTION_END_TS AS PEND_START_TIME,
    MIN(NP.WORK_ITEM_ACTION_BEGIN_TS) AS PEND_END_TIME,
    (SELECT COUNT(1)
    FROM CAL_DATE
    WHERE (CO_HOLIDAY_FL = 'Y'
    OR WEEKDAY_FL = 'N')
    AND CAL_DT BETWEEN DATE(P.WORK_ITEM_ACTION_END_TS) AND DATE(MIN(NP.WORK_ITEM_ACTION_BEGIN_TS))) AS HOLIDAYS_COUNT
    FROM (SELECT WORK_ITEM_FACT1.WORK_ITEM_FACT_ID,
    WORK_ITEM_FACT1.WORK_ITEM_ID,
    WORK_ITEM_ACTION_FACT1.WORK_ITEM_ACTION_END_TS
    FROM SESSION.TEMPTABLE1,
    WORK_ITEM_FACT WORK_ITEM_FACT1,
    WORK_ITEM_ACTION_FACT WORK_ITEM_ACTION_FACT1
    WHERE WORK_ITEM_FACT1.LAST_MODIFIED_DATE = ANY (SELECT POPULATION_TS
    FROM POPULATION_INFO
    WHERE LOAD_STATUS_CD = 'C'
    AND UNIQUE_ETL_PROCESS_ID = 'SIW RETAIL FACT LOAD')
    AND WORK_ITEM_FACT1.WORK_ITEM_ID = WORK_ITEM_ACTION_FACT1.WORK_ITEM_ID
    AND WORK_ITEM_ACTION_FACT1.ACTION_ID = SESSION.TEMPTABLE1.COL1) P
    left outer join (SELECT WORK_ITEM_FACT2.WORK_ITEM_FACT_ID,
    WORK_ITEM_FACT2.WORK_ITEM_ID,
    WORK_ITEM_ACTION_FACT2.WORK_ITEM_ACTION_BEGIN_TS
    FROM WORK_ITEM_FACT WORK_ITEM_FACT2,
    WORK_ITEM_ACTION_FACT WORK_ITEM_ACTION_FACT2
    WHERE WORK_ITEM_FACT2.LAST_MODIFIED_DATE = (SELECT POPULATION_TS
    FROM POPULATION_INFO
    WHERE LOAD_STATUS_CD = 'C'
    DROP TABLE SESSION.TEMPTABLE1;

    Any ideas on how to get this to run through db2advis or db2expln ?

    thanks

    • You really have to run each statement through individually. You can use db2advis to retain explain information, but generating explain plans from that information afterward is not simple. It is easier to run each statement with your explain mode set to “yes” (which will both execute the statement and store the explain info), and then between each one, run db2exfmt (with the -1 option) to get the formatted explain data.