Admission Control

On this page Carat arrow pointing down

CockroachDB supports an admission control system to maintain cluster performance and availability when some nodes experience high load. When admission control is enabled, CockroachDB sorts request and response operations into work queues by priority, giving preference to higher priority operations. Internal operations critical to node health, like node liveness heartbeats, are high priority. The admission control system also prioritizes transactions that hold locks, to reduce contention and release locks earlier.

How admission control works

At a high level, the admission control system works by queueing requests to use the following system resources:

  • CPU
  • Storage IO (writes to disk)

For CPU, different types of usage are queued differently based on priority to allow important work to make progress even under high CPU utilization.

For storage IO, the goal is to prevent the storage layer's log-structured merge tree (LSM) from experiencing high read amplification, which slows down reads, while also maintaining the ability to absorb bursts of writes.

Admission control works on a per-node basis, since even though a large CockroachDB cluster may be well-provisioned as a whole, individual nodes are stateful and may experience performance hot spots.

For more details about how the admission control system works, see:

Use cases for admission control

A well-provisioned CockroachDB cluster may still encounter performance bottlenecks at the node level, as stateful nodes can develop hot spots that last until the cluster rebalances itself. When hot spots occur, they should not cause failures or degraded performance for important work.

This is particularly important for CockroachDB Standard and CockroachDB Basic, where one user tenant cluster experiencing high load should not degrade the performance or availability of a different, isolated tenant cluster running on the same host.

Admission control can help if your cluster has degraded performance due to the following types of node overload scenarios:

  • The node has more than 32 runnable goroutines per CPU, visible in the Runnable goroutines per CPU graph in the Overload dashboard.
  • The node has a high amount of overload in the Pebble LSM tree, visible in the IO Overload graph in the Overload dashboard.
  • The node has high CPU usage, visible in the CPU Utilization graph in the Overload dashboard.
  • The node is experiencing out-of-memory errors, visible in the Memory Usage graph in the Hardware dashboard. Even though admission control does not explicitly target controlling memory usage, it can reduce memory usage as a side effect of delaying the start of operation execution when the CPU is overloaded.

Operations subject to admission control

Almost all database operations that use CPU or perform storage IO are controlled by the admission control system. From a user's perspective, specific operations that are affected by admission control include:

The following operations are not subject to admission control:

  • SQL writes are not subject to admission control on follower replicas by default, unless those writes occur in transactions that are subject to a Quality of Service (QoS) level as described in Set quality of service level for a session. In order for writes on follower replicas to be subject to admission control, the setting default_transaction_quality_of_service=background must be used.
Note:

Admission control is beneficial when overall cluster health is good but some nodes are experiencing overload. If you see these overload scenarios on many nodes in the cluster, that typically means the cluster needs more resources.

Enable and disable admission control

Admission control is enabled by default. To enable or disable admission control, use the following cluster settings:

  • admission.kv.enabled for work performed by the KV layer.
  • admission.sql_kv_response.enabled for work performed in the SQL layer when receiving KV responses.
  • admission.sql_sql_response.enabled for work performed in the SQL layer when receiving DistSQL responses.
  • kvadmission.store.snapshot_ingest_bandwidth_control.enabled to optionally limit the disk impact of ingesting snapshots on a node. This cluster setting is in Preview.
  • kvadmission.store.provisioned_bandwidth to optionally limit the disk bandwidth capacity of stores on the cluster. Disk bandwidth admission control paces background disk writes to keep disk bandwidth within its provisioned bandwidth. This cluster setting is in Preview.

When you enable or disable admission control settings for one layer, Cockroach Labs recommends that you enable or disable them for all layers.

Work queues and ordering

When admission control is enabled, request and response operations are sorted into work queues where the operations are organized by priority and transaction start time.

Higher priority operations are processed first. The criteria for determining higher and lower priority operations is different at each processing layer, and is determined by the CPU and storage I/O of the operation. Write operations in the KV storage layer in particular are often the cause of performance bottlenecks, and admission control prevents the Pebble storage engine from experiencing high read amplification. Critical cluster operations like node heartbeats are processed as high priority, as are transactions that hold locks in order to avoid contention and release locks earlier.

The transaction start time is used within the priority queue and gives preference to operations with earlier transaction start times. For example, within the high priority queue operations with an earlier transaction start time are processed first.

Set quality of service level for a session

In an overload scenario where CockroachDB cannot service all requests, you can identify which requests should be prioritized. This is often referred to as quality of service (QoS). Admission control queues work throughout the system. To set the quality of service level on the admission control queues on behalf of SQL requests submitted in a session, use the default_transaction_quality_of_service session variable. The valid values are critical, background, and regular. Admission control must be enabled for this setting to have an effect.

To increase the priority of subsequent SQL requests, run:

icon/buttons/copy
SET default_transaction_quality_of_service=critical;

To decrease the priority of subsequent SQL requests, run:

icon/buttons/copy
SET default_transaction_quality_of_service=background;

To reset the priority to the default session setting (in between background and critical), run:

icon/buttons/copy
SET default_transaction_quality_of_service=regular;

The quality of service for COPY statements is configured separately with the copy_transaction_quality_of_service session variable, which defaults to background.

To increase the priority of subsequent COPY statements, run:

icon/buttons/copy
SET copy_transaction_quality_of_service=critical;

Set quality of service level for a transaction

To set the quality of service level for a single transaction, set the default_transaction_quality_of_service and/or copy_transaction_quality_of_service session variable for just that transaction using the SET LOCAL statement inside a BEGIN ... COMMIT block, as shown in the following example. The valid values are critical, background, and regular.

icon/buttons/copy
BEGIN;
SET LOCAL default_transaction_quality_of_service = 'regular'; -- Edit transaction QoS to desired level
SET LOCAL copy_transaction_quality_of_service = 'regular'; -- Edit COPY QoS to desired level
-- Statements to run in this transaction go here
COMMIT;

Known limitations

Admission control works on the level of each node, not at the cluster level. The admission control system queues requests until the operations are processed or the request exceeds the timeout value (for example by using SET statement_timeout). If you specify aggressive timeout values, the system may operate correctly but have low throughput as the operations exceed the timeout value while only completing part of the work. There is no mechanism for preemptively rejecting requests when the work queues are long.

Organizing operations by priority can mean that higher priority operations consume all the available resources while lower priority operations remain in the queue until the operation times out.

Considerations

Client connections are not managed by the admission control subsystem. Too many connections per gateway node can also lead to cluster overload.

To control the maximum number of non-superuser (root user or other admin role) connections a gateway node can have open at one time, use the server.max_connections_per_gateway cluster setting. If a new non-superuser connection would exceed this limit, the error message "sorry, too many clients already" is returned, along with error code 53300.

Observe admission control performance

The DB Console Overload dashboard shows metrics related to the performance of the admission control system.

See also


Yes No
On this page

Yes No