Looking at Sort Memory Performance on BLU
Performance tuning of BLU databases is an evolving topic at this time. For BLU databases, sort memory cannot be tuned by STMM, and so must be manually tuned.
BLU also makes extensive use of sort memory. It is a bit of a misnomer at this point. It is more of a working memory area that is used for hashes, grouping, aggregations and more. Each query may easily allocate more than one sortheap. Each operator in the explain plan can have its own sort heap if needed.
Sort Memory Parameters
With BLU at this time, sort memory areas cannot be automatically tuned. However, some parameters, like
SHEAPTHRES_SHR, specify a maximum. It is a good idea to see how close you are getting to that maximum.
To see your setting for sort heap parameters, you can use this syntax:
$ db2 get db cfg for SAMPLE |grep -i sort Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR) = 3400000 Sort list heap (4KB) (SORTHEAP) = 200000
Yes, that’s over 3 million pages.
Shared Sort Memory
To see the top amount of memory used for all
SORTHEAPs at one time, I can use:
$ db2 "select sort_shrheap_top from table(mon_get_database(-2))" SORT_SHRHEAP_TOP -------------------- 2711847 1 record(s) selected.
This tells me that given my current settings, I have quite a bit of overhead in
SHEAPTHRES_SHR, especially since this database has been continually up for over 70 days.
The general guideline for
SHEAPTHRES_SHR in BLU databases is that we want to aim for 1/2 to 1/3 of the memory available to DB2. The BLU implementation I am working with here is a small one – with just 8 CPUs and 64 GB of memory. I also have two instances on this server, meaning that after memory for the OS (figure 20% for small systems), and memory for the other smaller instance, I’m allocating about 42.5 GB or 11,161,836 4K pages for this instance/database. With those numbers, my SHEAPTHRES_SHR is right about 30% of the available memory. Which is particularly interesting, because I did not intentionally arrive at that amount, but have rather increased it several times as errors occurred during pre-go-live testing on this database.
While my overall maximum for sort memory is probably about right, I also need to see if sorts are overflowing and needing more than the 200,000 pages in a single sortheap. I can start by looking at the high water mark for sortheap. I can do that with this query:
$ db2 "select SORT_CONSUMER_SHRHEAP_TOP from table(mon_get_database(-2))" SORT_CONSUMER_SHRHEAP_TOP ------------------------- 200000 1 record(s) selected.
In this case, we can see that the high water mark matches my
SORTHEAP value, indicating that in at least one case, something is using the entire amount of memory available to it. With that in mind, I need to understand how often my
SORTHEAP is being overflowed. This SQL works for that:
WITH SORT_CONSUMERS(TOTAL_SORT_CONSUMERS, TOTAL_SORT_CONSUMER_OVERFLOWS) AS (SELECT (TOTAL_SORTS + TOTAL_HASH_JOINS + TOTAL_HASH_GRPBYS + TOTAL_OLAP_FUNCS + TOTAL_COL_VECTOR_CONSUMERS) AS TOTAL_SORT_CONSUMERS, (SORT_OVERFLOWS + HASH_JOIN_OVERFLOWS + HASH_GRPBY_OVERFLOWS + OLAP_FUNC_OVERFLOWS) AS TOTAL_SORT_CONSUMER_OVERFLOWS FROM TABLE(MON_GET_DATABASE(-2)) AS T) SELECT TOTAL_SORT_CONSUMER_OVERFLOWS, TOTAL_SORT_CONSUMERS, CASE WHEN TOTAL_SORT_CONSUMERS > 0 THEN DEC((FLOAT(TOTAL_SORT_CONSUMER_OVERFLOWS)/ FLOAT(TOTAL_SORT_CONSUMERS)) * 100, 5, 2) ELSE NULL END AS PCT_SORT_CONSUMER_OVERFLOWS FROM SORT_CONSUMERS WITH UR; TOTAL_SORT_CONSUMER_OVERFLOWS TOTAL_SORT_CONSUMERS PCT_SORT_CONSUMER_OVERFLOWS ----------------------------- -------------------- --------------------------- 3945 3897812 0.10 1 record(s) selected.
Only 0.10% of my sorts are overflowing – generally not bad. However, since I know that this system has extremely low concurrency, and it’s not coming close to using my shared sort memory, and I’m not yet experiencing memory pressure, I’m going to increase my SORTHEAP.
These same analyses can be done for non-BLU databases, but if STMM is managing sort for you in those situations, you may not have to tune these areas. In BLU databases it is critical to do these kinds of analyses to properly tune sort memory.
Note: I increased my
SORTHEAP to 300000 pages, and still saw the same kinds of numbers a week later – overflows about 0.10% of the time, and the same exact HWM for shared sort. I’ll be increasing
SORTHEAP again and seeing how things go.
A big thanks to Scott Maberry – much of the SQL in this blog entry is derived from SQL he shared with me.