Admission Control

On this page

CockroachDB supports an admission control system to maintain cluster performance and availability when some nodes experience high load. When admission control is enabled, CockroachDB sorts request and response operations into work queues by priority, giving preference to higher priority operations. Internal operations critical to node health, like node liveness heartbeats, are high priority. The admission control system also prioritizes transactions that hold locks, to reduce contention and release locks earlier.

How admission control works

At a high level, the admission control system works by queueing requests to use the following system resources:

CPU
Storage IO (writes to disk)

For CPU, different types of usage are queued differently based on priority to allow important work to make progress even under high CPU utilization.

For storage IO, the goal is to prevent the storage layer's log-structured merge tree (LSM) from experiencing high read amplification, which slows down reads, while also maintaining the ability to absorb bursts of writes.

Admission control works on a per-node basis, since even though a large CockroachDB cluster may be well-provisioned as a whole, individual nodes are stateful and may experience performance hotspots.

For more details about how the admission control system works, see:

The Admission Control tech note.
The blog post Here's how CockroachDB keeps your database from collapsing under load.

Use cases for admission control

A well-provisioned CockroachDB cluster may still encounter performance bottlenecks at the node level, as stateful nodes can develop hotspots that last until the cluster rebalances itself. When hotspots occur, they should not cause failures or degraded performance for important work.

This is particularly important for CockroachDB Standard and CockroachDB Basic, where one user tenant cluster experiencing high load should not degrade the performance or availability of a different, isolated tenant cluster running on the same host.

Admission control can help if your cluster has degraded performance due to the following types of node overload scenarios:

The node has more than 32 runnable goroutines per CPU, visible in the Runnable goroutines per CPU graph in the Overload dashboard.
The node has a high amount of overload in the Pebble LSM tree, visible in the IO Overload graph in the Overload dashboard.
The node has high CPU usage, visible in the CPU Utilization graph in the Overload dashboard.
The node is experiencing out-of-memory errors, visible in the Memory Usage graph in the Hardware dashboard. Even though admission control does not explicitly target controlling memory usage, it can reduce memory usage as a side effect of delaying the start of operation execution when the CPU is overloaded.

Operations subject to admission control

Almost all database operations that use CPU or perform storage IO are controlled by the admission control system. From a user's perspective, specific operations that are affected by admission control include:

General SQL queries have their CPU usage subject to admission control, as well as storage IO for writes to leaseholder replicas and follower replicas.
Bulk data imports.
COPY statements.
Deletes (including deletes initiated by row-level TTL jobs; the selection queries performed by TTL jobs are also subject to CPU admission control).
Backups.
Schema changes, including index and column backfills (on both the leaseholder replica and follower replicas).
Follower replication work.
Raft log entries being written to disk.
Changefeeds.
Intent resolution.

Note:

Admission control is beneficial when overall cluster health is good but some nodes are experiencing overload. If you see these overload scenarios on many nodes in the cluster, that typically means the cluster needs more resources.

Replication admission control

The admission control subsystem paces all work done at the Replication Layer to avoid cluster overload. This includes user-facing writes from SQL statements, as well as background (elastic) replication work.

New in v25.1: The pacing of catchup writes is controlled at the replication layer to avoid overloading slow or newly restarted nodes with replication flows. Note that this pacing does not slow down user-facing SQL writes; it only ensures there are fewer impacts from background operations.

At a high level, replication admission control works by:

Pacing regular SQL writes at the rate of replica quorum. (New in v25.1)
Pacing background (elastic) replication at the rate of the slowest replica.

This has the following effects:

Does not overload slow/restarted nodes with replication flows. (New in v25.1)
Isolates performance between regular and elastic traffic.
Paces regular writes at quorum speed. (New in v25.1)
Paces elastic writes at the slowest replica's speed.

For example, prior to CockroachDB v25.1, when a leader and follower replica were disconnected from each other due to a node going away and coming back, once the follower came back the leader would slam the follower with any Raft entries it had missed. In v25.1 and later, the leader paces the entries it sends to the follower. The result is that baseline cluster QPS (queries per second) and latency should not change substantially during perturbations such as node restarts.

To monitor the behavior of replication admission control, refer to UI Overload Dashboard > Replication Admission Control.

Enable and disable admission control

Admission control is enabled by default. To enable or disable admission control, use the following cluster settings:

admission.kv.enabled for work performed by the KV layer.
admission.sql_kv_response.enabled for work performed in the SQL layer when receiving KV responses.
admission.sql_sql_response.enabled for work performed in the SQL layer when receiving DistSQL responses.
kvadmission.store.snapshot_ingest_bandwidth_control.enabled to optionally limit the disk impact of ingesting snapshots on a node. This cluster setting is in Preview.
kvadmission.store.provisioned_bandwidth to optionally limit the disk bandwidth capacity of stores on the cluster. Disk bandwidth admission control paces background disk writes to keep disk bandwidth within its provisioned bandwidth. This cluster setting is in Preview.

When you enable or disable admission control settings for one layer, Cockroach Labs recommends that you enable or disable them for all layers.

Work queues and ordering

When admission control is enabled, request and response operations are sorted into work queues where the operations are organized by priority and transaction start time.

Higher priority operations are processed first. The criteria for determining higher and lower priority operations is different at each processing layer, and is determined by the CPU and storage I/O of the operation. Write operations in the KV storage layer in particular are often the cause of performance bottlenecks, and admission control prevents the Pebble storage engine from experiencing high read amplification. Critical cluster operations like node heartbeats are processed as high priority, as are transactions that hold locks in order to avoid contention and release locks earlier.

The transaction start time is used within the priority queue and gives preference to operations with earlier transaction start times. For example, within the high priority queue operations with an earlier transaction start time are processed first.

Set quality of service level for a session

In an overload scenario where CockroachDB cannot service all requests, you can identify which requests should be prioritized. This is often referred to as quality of service (QoS). Admission control queues work throughout the system. To set the quality of service level on the admission control queues on behalf of SQL requests submitted in a session, use the default_transaction_quality_of_service session variable. The valid values are critical, background, and regular. Admission control must be enabled for this setting to have an effect.

To increase the priority of subsequent SQL requests, run:

SET default_transaction_quality_of_service=critical;

To decrease the priority of subsequent SQL requests, run:

SET default_transaction_quality_of_service=background;

To reset the priority to the default session setting (in between background and critical), run:

SET default_transaction_quality_of_service=regular;

The quality of service for COPY statements is configured separately with the copy_transaction_quality_of_service session variable, which defaults to background.

To increase the priority of subsequent COPY statements, run:

SET copy_transaction_quality_of_service=critical;

Set quality of service level for a transaction

To set the quality of service level for a single transaction, set the default_transaction_quality_of_service and/or copy_transaction_quality_of_service session variable for just that transaction using the SET LOCAL statement inside a BEGIN ... COMMIT block, as shown in the following example. The valid values are critical, background, and regular.

BEGIN;
SET LOCAL default_transaction_quality_of_service = 'regular'; -- Edit transaction QoS to desired level
SET LOCAL copy_transaction_quality_of_service = 'regular'; -- Edit COPY QoS to desired level
-- Statements to run in this transaction go here
COMMIT;

Known limitations

Admission control works on the level of each node, not at the cluster level. The admission control system queues requests until the operations are processed or the request exceeds the timeout value (for example by using SET statement_timeout). If you specify aggressive timeout values, the system may operate correctly but have low throughput as the operations exceed the timeout value while only completing part of the work. There is no mechanism for preemptively rejecting requests when the work queues are long.

Organizing operations by priority can mean that higher priority operations consume all the available resources while lower priority operations remain in the queue until the operation times out.

Considerations

Client connections are not managed by the admission control subsystem. Too many connections per gateway node can also lead to cluster overload.

To control the maximum number of non-superuser (root user or other admin role) connections a gateway node can have open at one time, use the server.max_connections_per_gateway cluster setting. If a new non-superuser connection would exceed this limit, the error message "sorry, too many clients already" is returned, along with error code 53300.

Observe admission control performance

The DB Console Overload dashboard shows metrics related to the performance of the admission control system.

Pricing

Contact us

Sign In