Relational databases are great for transactional workloads. But things can get messy when multiple transactions start trying to access the same data at the same time. Luckily, in many SQL databases there’s a solution for that: SELECT FOR UPDATE
.
What is SELECT FOR UPDATE?
SELECT FOR UPDATE
is a SQL command that’s useful in the context of transactional workloads. It allows you to “lock” the rows returned by a SELECT
query until the entire transaction that query is part of has been committed. Other transactions attempting to access those rows are placed into a time-based queue to wait, and are executed chronologically after the first transaction is completed.
This is useful because it prevents the thrashing and unnecessary transaction retries that would otherwise occur when multiple transactions are attempting to read those same rows. Any time multiple transactions are likely to be working with the same rows at roughly the same time, SELECT FOR UPDATE
can be used to increase throughput and decrease tail latency (compared to what you would see without using it).
In other words: SELECT FOR UPDATE
makes contended transactions process more smoothly (which generally also means they process more quickly and efficiently).
When to use SELECT FOR UPDATE
In general, SELECT FOR UPDATE
is useful for any transactional workload where multiple transactions might attempt to access the same rows at the same time.
However, different ‘flavors’ of SQL handle SELECT FOR UPDATE
somewhat differently, and some do not support it at all. For example, MySQL supports SELECT FOR UPDATE, but SQLite does not.
Often, the reason for this is related to the way different systems handle transactions more broadly. For example, SQLite does not need SELECT FOR UPDATE
because initiating a transaction locks the entire database. This allows SQLite databases to offer serializable transactional isolation, the highest level of isolation, for ironclad consistency. However, locking the entire database during a transaction that might only affect a few rows has obvious negative implications for performance, particularly at scale.
However, choosing between performance and consistency is not necessary! CockroachDB, for example, offers serializable isolation, but it does not have to lock the entire database at the beginning of a transaction to make that happen. SELECT FOR UPDATE
can be used to maximize database performance in the event of concurrent transactions working on the same rows, and the end result (in the case of CockroachDB) is still a database with serializable isolation.
Different SQL databases handle transaction isolation, and thus SELECT FOR UPDATE
, differently, so it’s important to be familiar with the options and defaults for the system you’re using.
Example: SELECT FOR UPDATE in action
Let’s take a look at how SELECT FOR UPDATE
works. We’ll use CockroachDB syntax and parameters here, but the syntax for other SQL databases that support SELECT FOR UPDATE
will be similar.
Imagine we’re working with a database that includes the following table kv
:
A complete transaction that uses SELECT FOR UPDATE
on that table could look like this:
BEGIN;
SELECT * FROM kv WHERE k = 1 FOR UPDATE;
UPDATE kv SET v = v + 5 WHERE k = 1;
COMMIT;
Working line by line through the statement above:
The first line,
BEGIN
, initiates the transaction.The second line, the
SELECT
statement, identifies the rows that will be impacted and locks them. In this case, it’s a single row: the first row of the table.The third line makes an update to the row in question. In this case, adding 5 to the value in the
v
column of the row.The fourth line,
COMMIT;
commits the transaction.
If we were to run this transaction on our example table, the result would be this:
What’s important here, though, is that the SELECT … FOR UPDATE
line (line 2) locked the row that the transaction updated. If another transaction (let’s call it Tx 2) hit the database attempting to operate on the same row while this transaction (Tx 1) was processing, Tx 2 would be added to the queue for processing after Tx 1 commits, rather than beginning to execute, failing, and having to retry because Tx 1 changed one of the values Tx 2 was accessing while it was processing.
SELECT FOR UPDATE parameters
Again, different database systems allow for different parameters and modifications of SELECT
statements.
A common one, for example, is SELECT … FOR SHARE
, which provides a weaker form of row-locking in some database systems. In PostgreSQL, SELECT … FOR UPDATE
completely locks the relevant rows, whereas SELECT … FOR SHARE
locks the relevant rows only for updates and deletes.
In contrast, because CockroachDB always guarantees serializable isolation and there are no “weaker” locking levels, SELECT … FOR SHARE
functions identically to SELECT … FOR UPDATE
. The FOR SHARE
syntax is supported only for Postgres compatibility.
Another common parameter is NOWAIT
, which returns an error immediately if a transaction is not able to immediately lock a row. In SQL syntax, NOWAIT
appears directly after FOR UPDATE
, like so:
SELECT * FROM kv WHERE k = 1 FOR UPDATE NOWAIT;
SKIP LOCKED
is also a parameter supported by some databases that allows waiting transactions to skip locked rows temporarily so that the “hold” on those rows doesn’t slow the processing of elements of the transaction impacting non-locked rows. CockroachDB doesn’t currently support this, in part because it uses multiversion concurrency control. This forum thread has some excellent information for achieving the same goals without using SKIP LOCKED
.