DB2 Basics: Isolation Levels and Concurrency Phenomena
Isolation levels are such a critical concept to maximizing concurrency for transaction processing databases where ACID is so critical.
What is ACID?
ACID stands for Atomicity, Consistency, Isolation, Durability. This is a set of concepts that applies to any transaction processing system, whether it is a Relational Database Management System (like DB2) or not. ACID is one of the big things that sets relational databases apart from NoSQL and alternate data organization software.
Atomicity means that a transaction either succeeds or fails as a whole. Nearly every transaction involves more than one table. This means that each transaction as defined by the application involves more than one change. Atomicity means that either all changes related to a single transaction occur or all changes related to a single transaction fail.
The classic example used for this is a transfer from one bank account to another. If you consider the transfer as a single transaction, we need to make sure that both the debit from the source account and the credit to the target account succeed or fail as a whole. A bank wouldn’t want the debit to fail due to insufficient funds, and then the credit to the target account to still go through. We also wouldn’t want the source acount to be debited and then due to some error the target account never receive the credit. Database management systems use the concept of a transaction to make sure that if there is a failure in the credit the debit would also fail.
Consistency means that a transaction cannot succeed if it violates any rules defined at the database level – these could be unique constraints, check constraints, triggers or any other logic that is defined at the database level. If a violation to one of these rules is detected, the transaction as a whole must fail. Not all application logic is defined at the database level, so this applies only to defined logic at the database level
Isolation refers to the concept that the action on the data must be the same if two transactions happen concurrently as it would be if they happened serially. That is, that the data is changed in the same way if two transactions happen at the same time as if they happen one after another.
For an example here, let’s pretend we have 10 widgets in inventory. Transaction A removes 2 widgets from inventory, leaving 8. Transaction B then removes 3 widgets leaving 5 in inventory. Then Transaction A fails and updates inventory back to the value it was before the update – 10 – but this is not correct because of the subsequent update of transaction B. This would violate the Isolation rule if it were allowed to happen.
Durability means that when a user is told that a transaction is successful, the results of that transaction are externalized such that even if the server were to be rebooted, the data would reflect that. DB2 uses transaction logs to ensure this is true – not returning a successful commit until a transaction has been externalized to disk for the transaction logs. This doesn’t mean that data on disk cannot be corrupted – simply that if the disk subsystem is working, transactions are not lost by system availability issues.
To understand DB2’s isolation levels, we have to first understand several concurrency phenomena that can occur and then which isolation levels protect us from them. To describe them, an analogy of reserving seats at a concert works well. These examples are simplified because most modern reservation systems solve these issues at the application level.
Access to Uncommitted Data
Accessing uncommitted data means that transaction A reads data that transaction B has updated, but has not committed.
Let’s say that Mike is booking a ticket to see his favorite band, Nickelback, in concert. He selects the last two seats available. Meanwhile, Ian tries to buy a ticket, but the concert is sold out. Mike realizes that he cannot get a babysitter, and the transaction is cancelled. On the night of the concert, there are two empty seats that could have been full if Ian hadn’t incorrectly seen that the concert was sold out.
Accessing uncommitted data is rarely what is wanted in any transaction. It may be acceptable for reporting, especially when reporting is on largely unchanging historical data.
A non-repeatable read occurs when the same query is executed twice in the same transaction, but receives different values.
Let’s say this time that Mike is looking for tickets to a Taylor Swift concert to enjoy with his two daughters. He looks online and sees that there are three seats left. Meanwhile, Ian completes a transaction to buy those two seats to share the concert with his daughter. When Mike tries to check out, he gets an error that there is only one seat remaining, when he was clearly just told that they were available.
Let’s say Mike is buying tickets to Macklemore and Ryan Lewis this time. He goes to see if there are any seats left for less than $40. There aren’t any. Meanwhile, the concert venue decides to reduce the price of a number of tickets. Mike increases his limit to $80. This time, he finds a seat for $38.
A phantom read is similar to a non-repeatable read, but in this case instead of different values, you see additional rows that did not match the criteria before.
DB2’s Isolation Levels
There are also standards across relational database management systems. DB2’s map to the standards. I’m including here what I think are the equivalents on other systems:
I would love to hear comments on this.
Repeatable Read (RR) is the isolation level that allows for the least concurrency, but also eliminates all of the described concurrency phenomena. This is the safest in terms of data integrity, but the least safe in terms of performance. With RR, if a table scan occurs, then the entire table is locked. This is rarely used in my experience. The standard and many RDBMSs call this serializable.
Read Stability (RS) allows phantom reads, but this is a concurrency phenomenon that many applications are built to handle properly.
Cursor Stability (CS) is the default if nothing else is specified. Again, many applications are built to handle this properly. Cursor Stability will not encounter dirty reads, but may encounter phantom reads or non-repeatable reads.
Uncommitted Read (UR) is what I specify every time I run a query manually. It acquires essentially no locks, and is the least likely to impact performance. Most of my queries are administrative in nature, and do not access data that could be dirty or care about it if any of the concurrency phenomena are encountered.
CUR_COMMIT (Currently Committed)
In DB2 9.5/9.7, DB2 added and made the default a database level parameter called CUR_COMMIT. This can also be set by the application at the connection level for most applications. This changes behavior a bit, but the behavior still meets the ANSI standard definitions. The details of this change are in a great developerWorks article by Roger Sanders, and are beyond the scope of this blog article. Nearly every DB2 LUW database these days uses CS isolation level with CUR_COMMIT enabled – mostly because this is the default.
But What Do the Isolation Levels Do?
The isolation levels define which objects (usually rows) are locked and how long that lock is held for. Locking is a topic for another article. Understanding these concepts is crucial for passing the Family Fundamentals test, and should be basic knowledge for every developer. It is a very dry and detailed topic, and I hear it is one of the most frequently failed sections of the certification tests. There is an excellent tutorial available on this topic.