DB2 Workload Manager (WLM) as a Monitoring Solution– Analyzing WLM Information (Part 3 of 3)

Abhik_Headshot
This is part three of a three part series. The full series includes:
DB2 Workload Manager (WLM) as a Monitoring Solution – Understanding WLM (Part1 of 3)
DB2 Workload Manager (WLM) as a Monitoring Solution– How to Set up WLM (Part 2 of 3)
DB2 Workload Manager (WLM) as a Monitoring Solution– Analyzing WLM Information (Part 3 of 3)

The previous articles explored what DB2 Workload Management (WLM) is and the various objects that make up the WLM ecosystem. The second article in this series defined a targeted monitoring strategy and created the proper service super classes, service classes, and workloads to implement it. The second article also enabled the WLM event monitors to start collecting data.
This final article looks at some of the reports that can be created to look at performance trends and briefly covers histograms, which are really cool technology.

Querying the WLM Table Functions

As mentioned earlier, DB2 will store the monitoring data in memory until reset and externalize to WLM event monitor tables via the WLM_COLLECT_STATS() procedure or by automatic data capturing set by the WLM_COLLECT_INT database parameter.
Table functions can be used to look at this monitoring data still in DB2 memory:
• MON_GET_SERVICE_SUBCLASS
• MON_GET_SERVICE_SUBCLASS_DETAILS
• MON_GET_SERVICE_SUBCLASS_STATS
• MON_GET_SERVICE_SUPERCLASS_STATS
• MON_GET_WORKLOAD
• MON_GET_WORKLOAD_DETAILS
• MON_GET_WORKLOAD_STATS
As an example this SQL queries the WLM_GET_SERVICE_SUBCLASS_STATS table function:

SELECT VARCHAR(SERVICE_SUPERCLASS_NAME, 30) AS SUPERCLASS,
       VARCHAR(SERVICE_SUBCLASS_NAME, 30) AS SUBCLASS,
       LAST_RESET,
       NUM_REQUESTS_TOTAL,
       REQUEST_EXEC_TIME_AVG,
       REQUEST_EXEC_TIME_TOTAL,
       UOW_THROUGHPUT,
        UOW_COMPLETED_TOTAL,
       TOTAL_CPU_TIME,
        
FROM TABLE(SYSPROC.WLM_GET_SERVICE_SUBCLASS_STATS('REPORTING',
       'SLARPTSC', -1)) AS T;
SUPERCLASS          SUBCLASS           LAST_RESET                        NUM_REQUESTS_TOTAL      REQUEST_EXEC_TIME_AVG    
------------------------------ ------------------------------ -------------------------- -------------------- ------------------------ ----------------------- ------------------
REPORTING               SLARPTSC         2014-07-07-12.00.10.977721                      272          +5.40000000000000E-002               

Output continued.... 

REQUEST_EXEC_TIME_TOTAL    UOW_THROUGHPUT              UOW_COMPLETED_TOTAL       TOTAL_CPU_TIME
     136                  +8.26245471835136E-002              68                     94587

As you can see, the service subclass ‘SLARPTSC’ under service superclass ‘REPORTING’ was last reset from memory on July 7th at noon. Since last reset, the average SQL request execution time in this service class was +5.40000000000000E-002 milli seconds and 68 UOW activities have been completed

Querying the WLM Event Monitor Tables to get Persistent and Historical Monitoring Data

The SQL below can be used to get similar service class statistics for service class ‘SLARPTSC’ using the event monitor tables that store historical data.

SELECT VARCHAR(SERVICE_SUPERCLASS_NAME, 30) AS SUPERCLASS,
       VARCHAR(SERVICE_SUBCLASS_NAME, 30) AS SUBCLASS,
       LAST_WLM_RESET,
        REQUEST_EXEC_TIME_AVG,
        UOW_COMPLETED_TOTAL
           
FROM SCSTATS_DB2STATISTICS where SERVICE_SUBCLASS_NAME = 'SLARPTSC';
SUPERCLASS  SUBCLASS         LAST_WLM_RESET                REQUEST_EXEC_TIME_AVG      UOW_COMPLETED_TOTAL
------------------------------ ------------------------------ -------------------------- --------------------------------------------------------------
REPORTING    SLARPTSC        2014-05-29-09.45.08.261369              0                       10                                                 
REPORTING    SLARPTSC        2014-05-29-09.56.51.716245              0                       4
REPORTING    SLARPTSC        2014-05-29-10.01.44.728405              0                       10
REPORTING    SLARPTSC        2014-05-29-10.14.42.899526              0                       4
REPORTING    SLARPTSC        2014-05-29-10.19.03.255451              0                       142
REPORTING    SLARPTSC        2014-05-29-13.58.57.710431              1                      7640824

This data tells us that since the last reset timestamp of 2014-05-29-13.58.57.710431 the average request execution time was 1 milli seconds and 7640824 units of work completed.

Similar SQL can be used to find statistics for a workload in question from the historical WLSTATS_DB2STATISTICS table.

SELECT VARCHAR(WORKLOAD_NAME, 30) AS WORKLOAD_NAME,
       LAST_WLM_RESET,
       LOCK_WAIT_TIME_TOP,
       ROWS_RETURNED_TOP,
       UOW_LIFETIME_AVG
       


FROM WLSTATS_DB2STATISTICS where WORKLOAD_NAME ='SLARPTWL';
WORKLOAD_NAME      LAST_WLM_RESET                 LOCK_WAIT_TIME_TOP ROWS_RETURNED_TOP  UOW_LIFETIME_AVG
------------------------------ -------------------------- -------------------- ------------------------------
SLARPTWL                 2014-05-29-11.00.37.167154        264041               246              3
SLARPTWL                 2014-05-29-14.13.08.173211        2458262              246              3

The above output gives us the lock wait times, rows returned and average unit of work lifetime for workload SLARPTWL for each WLM reset time.

SQLs that show some aggregate performance reports:

Example 1

SELECT DATE(statistics_timestamp) AS stat_date, TIME(statistics_timestamp) AS stat_time,
 SUBSTR(service_subclass_name,1,10) AS subclass_name, 
CASE WHEN 0 > INT(SUM(concurrent_act_top)) THEN 0 ELSE INT(SUM(concurrent_act_top)) END AS con_act_top,
 CASE WHEN 0 > INT(SUM(concurrent_connection_top)) THEN 0 ELSE INT(SUM(concurrent_connection_top)) END AS CON_CONN_TOP, 
CASE WHEN 0 > INT(SUM(coord_act_completed_total)) THEN 0 ELSE INT(SUM(coord_act_completed_total)) END AS coord_act_comp,
 CASE WHEN 0 > INT(SUM(coord_act_exec_time_avg)) THEN 0 ELSE INT(SUM(coord_act_exec_time_avg)) END AS avg_c_exe_tm,
 CASE WHEN 0 > INT(SUM(request_exec_time_avg)) THEN 0 ELSE INT(SUM(request_exec_time_avg)) END AS avg_r_exe_tm 
FROM db2inst01.scstats_db2statistics WHERE CHAR(DATE(statistics_timestamp)) = CURRENT DATE and SERVICE_SUBCLASS_NAME = 'SLARPTSC' GROUP BY DATE(statistics_timestamp), TIME(statistics_timestamp), SUBSTR(service_subclass_name,1,10) ORDER BY 1,2,3
STAT_DATE  STAT_TIME  SUBCLASS_NAME  CON_ACT_TOP  CON_CONN_TOP AVG_C_EXE_TM    AVG_R_EXE_TM
---------- --------- ------------- ----------- ------------ -------------- ------------ -----------------------------------------------------------------------------------------------------
07/07/2014 00:00:21       SLARPTSC          1            2             0          2                           
07/07/2014 04:00:22       SLARPTSC          1            2             0          3                               
07/07/2014 08:00:24       SLARPTSC          1            2             0          4                              
07/07/2014 12:00:10       SLARPTSC           1           2             0          2

The above SQL shows top concurrent connections and average execution times for service class SLARPTSC grouped by statistics date, time and subclass name.

Example2

SELECT DATE(statistics_timestamp) AS stat_date, TIME(statistics_timestamp) AS stat_time,
 SUBSTR(service_subclass_name,1,10) AS subclass_name, 
CASE WHEN 0 > INT(SUM(act_cpu_time_top)) THEN 0 ELSE INT(SUM(act_cpu_time_top)) END AS act_cpu_time_top,
 CASE WHEN 0 > INT(SUM(act_rows_read_top)) THEN 0 ELSE INT(SUM(act_rows_read_top)) END AS act_rows_read_top,
 CASE WHEN 0 > INT(SUM(coord_act_completed_total)) THEN 0 ELSE INT(SUM(coord_act_completed_total)) END AS coord_act_completed_total, 
CASE WHEN 0 > INT(SUM(concurrent_wlo_top)) THEN 0 ELSE INT(SUM(concurrent_wlo_top)) END AS concurrent_wlo_top, 
CASE WHEN 0 > INT(SUM(coord_act_aborted_total)) THEN 0 ELSE INT(SUM(coord_act_aborted_total)) END AS coord_act_aborted_total
 FROM db2inst01.scstats_db2statistics WHERE service_subclass_name = 'SLARPTSC' AND CHAR(DATE(statistics_timestamp)) = CURRENT DATE GROUP BY DATE(statistics_timestamp), TIME(statistics_timestamp), SUBSTR(service_subclass_name,1,10) ORDER BY 1,2,3
STAT_DATE  STAT_TIME     SUBCLASS_NAME  ACT_CPU_TIME_TOP   ACT_ROWS_READ_TOP 
---------- --------- ------------- ---------------- ----------------- ------------------------- -
07/07/2014 00:00:21       SLARPTSC          147             142                         
07/07/2014 04:00:22       SLARPTSC          266             348                         
07/07/2014 08:00:24       SLARPTSC          266             348                 
07/07/2014 12:00:10       SLARPTSC          266             348                    
Output continued …..
COORD_ACT_COMPLETED_TOTAL           CONCURRENT_WLO_TOP             COORD_ACT_ABORTED_TOTAL
------------------------------------------------------------------------------------------------
                    0                      2                                0
                    0                      2                                0
                    0                      2                                0
                    0                      2                                0

The above SQL provides statistics like top cpu time, top rows read etc grouped by stats date, stats time and subsclass name.

Example 3

SELECT DATE(statistics_timestamp) AS stat_date, TIME(statistics_timestamp) AS stat_time,
 SUBSTR(service_subclass_name,1,10) AS subclass_name, 
CASE WHEN 0 > INT(SUM(coord_act_est_cost_avg)) THEN 0 ELSE INT(SUM(coord_act_est_cost_avg)) END AS coord_act_est_cost_avg,
 CASE WHEN 0 > INT(SUM(coord_act_exec_time_avg)) THEN 0 ELSE INT(SUM(coord_act_exec_time_avg)) END AS coord_act_exec_time_avg, CASE WHEN 0 > INT(SUM(coord_act_interarrival_time_avg)) THEN 0 ELSE INT(SUM(coord_act_interarrival_time_avg)) END AS coord_act_interarrival_time_avg, 
CASE WHEN 0 > INT(SUM(coord_act_lifetime_avg)) THEN 0 ELSE INT(SUM(coord_act_lifetime_avg)) END AS coord_act_lifetime_avg,
 CASE WHEN 0 > INT(SUM(coord_act_lifetime_top)) THEN 0 ELSE INT(SUM(coord_act_lifetime_avg)) END AS coord_act_lifetime_avg, 
CASE WHEN 0 > INT(SUM(coord_act_queue_time_avg)) THEN 0 ELSE INT(SUM(coord_act_queue_time_avg)) END AS coord_act_queue_time_avg 
FROM db2inst01.scstats_db2statistics WHERE service_subclass_name = 'SLAREPORTS' AND CHAR(DATE(statistics_timestamp)) = CURRENT DATE GROUP BY DATE(statistics_timestamp), TIME(statistics_timestamp), SUBSTR(service_subclass_name,1,10) ORDER BY 1,2,3
STAT_DATE  STAT_TIME   SUBCLASS_NAME   COORD_ACT_EST_COST_AVG   COORD_ACT_EXEC_TIME_AVG
 ---------------- ---------------------- ----------------------- ------------------------------- 
07/07/2014 00:00:21     SLARPTSC             6                       0                                                
07/07/2014 04:00:22     SLARPTSC             5                       2                                              
07/07/2014 08:00:24     SLARPTSC             6                       0                                   
07/07/2014 12:00:10     SLARPTSC             5                       3                                    
Output continued……

COORD_ACT_INTERARRIVAL_TIME_AVG COORD_ACT_LIFETIME_AVG  COORD_ACT_QUEUE_TIME_AVG
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         49562                               2                     1                                                            
         28392                               3                     1                                                            
         44872                               3                     1                                                            
         32700                               3                     1

The above SQL provides statistics like average cost estimates, average execution time, average queue time etc for service class SLARPTSC grouped by statistics date, time and service class SLARPTSC.

Example 4

SELECT DATE(statistics_timestamp) AS stat_date, TIME(statistics_timestamp)
 AS stat_time, SUBSTR(service_subclass_name,1,10) AS subclass_name, 
CASE WHEN 0 > INT(SUM(coord_act_rejected_total)) THEN 0 ELSE INT(SUM(coord_act_rejected_total)) END AS coord_act_rejected_total,
 CASE WHEN 0 > INT(SUM(cost_estimate_top)) THEN 0 ELSE INT(SUM(cost_estimate_top)) END AS cost_estimate_top, 
CASE WHEN 0 > INT(SUM(rows_returned_top)) THEN 0 ELSE INT(SUM(rows_returned_top)) END AS rows_returned_top, 
CASE WHEN 0 > INT(SUM(temp_tablespace_top)) THEN 0 ELSE INT(SUM(temp_tablespace_top)) END AS temp_tablespace_top, 
CASE WHEN 0 > INT(SUM(uow_total_time_top)) THEN 0 ELSE INT(SUM(uow_total_time_top)) END AS uow_total_time_top,
 CASE WHEN 0 > INT(SUM(agg_temp_tablespace_top)) THEN 0 ELSE INT(SUM(agg_temp_tablespace_top)) END AS agg_temp_tablespace_top FROM db2inst01.scstats_db2statistics WHERE service_subclass_name = 'SLARPTSC' AND CHAR(DATE(statistics_timestamp)) = CURRENT DATE GROUP BY DATE(statistics_timestamp), TIME(statistics_timestamp), SUBSTR(service_subclass_name,1,10) ORDER BY 1,2,3
STAT_DATE  STAT_TIME     SUBCLASS_NAME   COORD_ACT_REJECTED_TOTAL   COST_ESTIMATE_TOP 
 ---------- --------- ------------- ------------------------ ----------------- ----------------- 
07/07/2014 00:00:21       SLARPTSC               0                        8                
07/07/2014 04:00:22       SLARPTSC               0                        23                
07/07/2014 08:00:24       SLARPTSC               0                        23             
07/07/2014 12:00:10       SLARPTSC               0                        23                
Output continued …..
ROWS_RETURNED_TOP   TEMP_TABLESPACE_TOP   UOW_TOTAL_TIME_TOP   AGG_TEMP_TABLESPACE_TOP
-------------------------------------------------------------------------------------------------
    1                       0                  6                        0
    1                       0                  24                       0
    1                       0                  8                        0
    1                       0                  38                       0

The above SQL provides statistics like temporary tablespace usage, rows returned, unit of work total time, etc for service class SLARPTSC grouped by statistics date, time and service class SLARPTSC.

You are only limited by your imagination when querying these tables for useful data.

Histograms

Histograms are extremely useful. A histogram is a collection of bins collecting discrete ranges of data.
Let us create a histogram template, assign it to the ‘SLARPTSC’ service subclass, and create some histogram reports.
There are the seven kinds of histograms that can be plotted through DB2’s workload management system.

• CoordActQueue Time – Coordinator activity queue time
• CoordActLifetime – Coordinator activity Lifetime
• CoordActInterArrivalTime – Coordinator activity inter arrival time
• CoordActEstCost – Coordinator activity estimated cost
• ReqExecTime – Request execution time
• UOWLifetime – Unit of work lifetime

Create a Histogram for ReqExectime Metrics, Defined in Milliseconds.

First create a template and defining scale using the following SQL:

CREATE HISTOGRAM TEMPLATE REQEXEC HIGH BIN VALUE 90000

Next we associate this scale with a service class using the following command:

ALTER SERVICE CLASS SLARPTSC UNDER REPORTING REQUEST EXECUTETIME HISTOGRAM TEMPLATE REQEXEC

Once the histogram is defined, data will start populating in the HISTOGRAMBIN_DB2STATISTICS table. It is easier to create some views on the HISTOGRAMBIN_DB2STATISTICS table and then query the views.
Suggested views:

CREATE VIEW HISTOGRAMTYPES AS
  SELECT DISTINCT SUBSTR(HISTOGRAM_TYPE,1,24) AS HISTOGRAM_TYPE
  FROM HISTOGRAMBIN_DB2STATISTICS;

CREATE VIEW HISTOGRAMSERVICECLASSES AS
  SELECT DISTINCT SUBSTR(HISTOGRAM_TYPE,1,24) AS HISTOGRAM_TYPE,
         SUBSTR(PARENTSERVICECLASSNAME,1,24) AS SERVICE_SUPERCLASS,
         SUBSTR(SERVICECLASSNAME,1,24) AS SERVICE_SUBCLASS
  FROM HISTOGRAMBIN_DB2STATISTICS AS H,
       SYSCAT.SERVICECLASSES AS S
  WHERE H.SERVICE_CLASS_ID = S.SERVICECLASSID;



CREATE VIEW HISTOGRAMS(HISTOGRAM_TYPE,
                       SERVICE_SUPERCLASS,
                       SERVICE_SUBCLASS,
                       BIN_TOP,
                       NUMBER_IN_BIN) AS
  SELECT DISTINCT SUBSTR(HISTOGRAM_TYPE,1,24) AS HISTOGRAM_TYPE,
         SUBSTR(PARENTSERVICECLASSNAME,1,24) AS SERVICE_SUPERCLASS,
         SUBSTR(SERVICECLASSNAME,1,24) AS SERVICE_SUBCLASS,
         TOP AS BIN_TOP,
         SUM(NUMBER_IN_BIN) AS NUMBER_IN_BIN
  FROM HISTOGRAMBIN_DB2STATISTICS AS H,
       SYSCAT.SERVICECLASSES AS S
  WHERE H.SERVICE_CLASS_ID = S.SERVICECLASSID
  GROUP BY HISTOGRAM_TYPE, PARENTSERVICECLASSNAME, SERVICECLASSNAME, TOP;

SQL to analyze the Request Execution Time histogram for the ‘SLAREPORTS’ service class.

SELECT BIN_TOP, NUMBER_IN_BIN FROM HISTOGRAMS
  WHERE HISTOGRAM_TYPE = 'ReqExecTime'
    AND SERVICE_SUPERCLASS = 'REPORTING'
    AND SERVICE_SUBCLASS = 'SLARPTSC'
  ORDER BY BIN_TOP;

Output of the query:

BIN_TOP              NUMBER_IN_BIN
-------------------- --------------------
                  -1                    0
                   1                  597
                   2                    1
                   3                    0
                   4                    0
                   6                    0
                   9                    0
                  12                    0
                  17                    1
                  ..
                           ..
                  300000               0

This tells us that for the service class ‘SLARPTSC’, there were 597 requests which completed in 1 millisecond, 1 request which completed in 1-2 milliseconds and 1 request in 12-17 milliseconds.

Histograms are a very useful tool to profile the runtime of SQL activities and see if they are really meeting the SLA of the application.

Other uses of WLM objects

    WLM can be used for more than allocation of service resources or monitoring SQL. There are other uses as well.

  1. SQL statements generated from a particular workload or service class can be explained by the db2 designer advisor.
    db2advis -d TESTDB1 -wlm DB2ACTIVITIES sc ‘SLARPTSC’

    The above db2advis statement will take all SQLs that ran under the SLARPTSC service class and recommend indexes, MQTs, etc to tune for the SQL statements, treating them as a workload.

  2. You can create Locking Event monitors targeting specific workloads.

Conclusion

DB2’s Workload Manager gives us an open ended solution to design and build a monitoring and workload management strategy based on specific needs. This design could also be used to implement a chargeback model.
Being a Lego fan, I always like classic kits more than the themed kits of today. Today’s kits have fewer square blocks and more specific blocks that will let you create only one model. I like to compare DB2’s workload manager to the classic legos, which came with more square blocks and were more open ended kits that could test your imagination to make anything you could imagine.
Take the WLM for a test drive; I am sure you will find multiple uses for it.

Leave a Reply

Your email address will not be published. Required fields are marked *

  1. Pingback: DB2 Workload Manager (WLM) as a Monitoring Solution – Understanding WLM (Part1 of 3) - db2commerce.com