SQL Tip: COALESCE
I haven’t generally been known for my SQL tips. I can find my way around SQL decently enough, but for years, I didn’t do much of it. I’ve used it more and more as the years have gone by as I have moved out of a stictly physical/system DBA role and into a more mixed role where I interact and collaborate with our development team. I’ve also developed some of my own tools, and that has taught me worlds about more complicated SQL.
Some of my SQL tips are going to seem pretty darn basic, but the truth is that the most basic topics are my most popular articles, and I might have a tidbit here and there that some of my readers aren’t familiar with.
This first tip is centered on the COALESCE. I actually first became aware of COALESCE when I took an Oracle administration class. We had to use a COALESCE function in class, and I thought there must be something like it in DB2.
COALESCE vs. CASE
The whole point of COALESCE is to help you deal with a situation where you may want to return something different if a value is NULL. This is pretty common, and while you can do the same functionality with a CASE statement, COALESCE is shorter and simpler.
Basically, the whole point of COALESCE is that it is a function that returns the first value that is not NULL. You can pass in as many (compatible) arguments as you like – it can handle any data types. Many times, the last argument passed in is a string literal or constant (number).
How can this help you? Let’s take a look at an example. In my case, I needed to return a value from a table, but if that value happened to be NULL, I needed instead to return a default value of -20. In my case, the CASE statement looked like this:
case when stgfilter is null then -20 else stgfilter end
Not complicated, but still several lines to get what I need. When using COALESCE to do the same logic, I can use:
Logically, it feels like COALESCE has less work to do, so should perform faster. But I don’t know if the DB2 optimizer really knows the difference. When I did a little experiment with my two methods on tiny little <200 row tables, I got the following two explain plans:
Access Plan: ----------- Total Cost: 15.1685 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 1 >^NLJOIN ( 2) 15.1683 2 /--------+---------\ 1 1 >NLJOIN IXSCAN ( 3) ( 7) 15.1579 0.0103311 2 0 /-------+-------\ | 1 1 18 IXSCAN FETCH INDEX: WSCOMUSR ( 4) ( 5) X_STR_MAP_IX01 0.0145029 15.1434 Q5 0 2 | /---+----\ 110 1 26359 INDEX: SYSIBM IXSCAN TABLE: WSCOMUSR SQL120207213008270 ( 6) STORECENT Q1 7.57994 Q3 1 | 26359 INDEX: WSCOMUSR I0000345 Q3
Access Plan: ----------- Total Cost: 15.1685 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 1 >^NLJOIN ( 2) 15.1683 2 /--------+---------\ 1 1 >NLJOIN IXSCAN ( 3) ( 7) 15.1579 0.0103311 2 0 /-------+-------\ | 1 1 18 IXSCAN FETCH INDEX: WSCOMUSR ( 4) ( 5) X_STR_MAP_IX01 0.0145029 15.1434 Q5 0 2 | /---+----\ 110 1 26359 INDEX: SYSIBM IXSCAN TABLE: WSCOMUSR SQL120207213008270 ( 6) STORECENT Q1 7.57994 Q3 1 | 26359 INDEX: WSCOMUSR I0000345
No, I did not copy and paste wrong – those are two exactly identical explain plans. And when I checked, the re-written SQL did not show one method had been converted to the other.
I can’t exactly exrapolate these results to more complex situations, of course, but I still thought it was interesting.
I would love it if any readers have any knowledge about the performance differences of CASE vs. COALESCE that they’d be willing to share in the comments.