DB2 Internals: Simple Query Processing
Edited 11/13/2014 to be more specific about coordinator agents.
This is the first in a series of blog entries talking about DB2 internals. I have no intimate knowledge from IBM of how DB2 works internally, just years of watching presentations about the internals and doing my best to understand what DB2 is doing to inform my actions and decisions as a DBA. Much of this information is in the depths of the Knowledge Center or is generally unpublished outside of an IBM class or IBM conference presentation.
How a Simple Query is Processed
The goal here is to show how processing works for a simple query using threads, memory area, and disk access. There are things at many levels that are not shown here, and lesser details left out, but this is basically how it happens. This covers a DB2 version of 9.5 or later, using row organized tables – a single partition database with INTRA_PARALLEL turned off.
In the images used here, circles represent threads or processes, while grey rectangles represent memory areas. Disks are represented by cylinders. Red arrows indicate the action described by each step. Click on any of the images for a larger version.
Here is how DB2 would handle a simple query, such as
SELECT * FROM T1
Query is Sent to DB2 Server
First, the client sends the query to the server. A coordinating agent is either allocated from a pool or created to serve each application when it connects. There is only one coordinator agent for each query. Different coordinator agents exist to serve other queries on other connections. Each connection has only one coordinator agent.
Access Plan is Compiled
The coordinator agent checks to see if there is an access plan for this query in the package cache. If there is a plan, the coordinator agent can use it. If there is not a plan, an access plan is compiled and placed into the package cache.
Data is Retrieved
There are three main ways that DB2 can get the extents it needs with the data to fulfill the query. All data that is not LOB data moves through the bufferpools.
The first and fastest way DB2 can get the data is if the extents it needs are already in the bufferpools. If the extents needed are there, the coordinator agent can read the data directly from memory.
If the extents needed are not in the bufferpools, then the coordinator agent must go directly to disk, and read the extents it needs into the bufferpool from disk. This is much slower than simply reading data from the bufferpools, as the coordinator agent must wait for the data to be read from disk. Disk caching may make this access faster.
To avoid the slower synchronous disk access method, the prefetchers may be triggered to read data that is predicted to be needed into the bufferpools before the coordinator agent needs it. This is called Asynchronous I/O.
Depending on the isolation level, locks may be acquired during the processing of any query, including selects. If locks are required, then data about those locks is written to the lock list. When locks are released depends on the isolation level. All locks are released on a COMMIT, which is outside of the scope of this article.
Trying to cover every single detail would be overwhelming, but this provides a good idea of what is going on. I would love to hear from readers in the comments below about this type of blog entry, as I have many more planned.