Understand the Resource Usage of your CockroachDB Standard Cluster

Note:

CockroachDB Standard, our new, enterprise-ready plan, is currently in Preview.

This page describes how to understand your CockroachDB Standard cluster's Request Unit (RU) consumption and tune your workload to balance performance and costs.

For information on planning your cluster configuration, refer to Plan a Standard Cluster.

Understand resource consumption

CockroachDB Standard clusters consume three kinds of resources:

  • SQL CPU
  • Network egress
  • Storage layer I/O

A CockroachDB Standard cluster is divided into a SQL layer and a storage layer that run in separate processes. The SQL layer receives and runs your SQL queries and background jobs. When the SQL layer needs to read or write data rows, it calls the storage layer, which manages a replicated, transactional row store that is distributed across many machines.

SQL CPU is the CPU consumed by SQL processes (not storage processes) and is converted to Request Units using this equivalency: 1 RU = 3 milliseconds SQL CPU.

Network egress measures the number of bytes that are returned from a SQL process to the calling client. It also includes any bytes sent by bulk operations like EXPORT or changefeeds. It is converted to Request Units using this equivalency: 1 RU = 1 KiB network egress.

Storage layer I/O includes the read and write requests sent by the SQL layer to the storage layer. These operations are sent in batches containing any number of requests. Requests can have a payload containing any number of bytes. Write operations are replicated to multiple storage processes (3 by default), with each replica counted as a separate write operation. Storage layer I/O is converted to Request Units using these equivalencies:

  • 1 RU = 2 storage read batches
  • 1 RU = 8 storage read requests
  • 1 RU = 64 KiB read request payload (prorated)
  • 1 RU = 1 storage write batch
  • 1 RU = 1 storage write request
  • 1 RU = 1 KiB write request payload (prorated)

Diagnose and optimize your resource consumption

Substantial RU consumption (greater than 100 RU/second) is usually caused by SQL queries issued by the application. You can confirm this by verifying that RU consumption tightly follows changes in the application's SQL QPS (queries per second). On the CockroachDB Cloud Console Overview metrics page, compare the Request Units chart to the SQL Statements chart. If the charts correlate, reducing Request Unit consumption is a matter of optimizing the application's SQL queries (SELECT, UPDATE, INSERT, DELETE).

In the CockroachDB Cloud Console, you can monitor your cluster's SQL activity on the Statements and Transactions pages.

To see which queries are using the most resources, sort queries by the time they took to process, the number of rows processed, or the number of bytes read. If you have queries that return more data than needed or have long runtimes, those are good candidates for optimization.

Expensive queries

Expensive queries, especially FULL SCAN operations, are the most common cause of unexpected RU consumption increases. To diagnose expensive queries:

  1. Navigate to the Statements tab of your cluster's SQL Activity page in the Cloud Console.
  2. Click on the title of the Rows Processed column to sort your queries by the number of rows processed.

    For most queries, total rows processed should be no more than a few hundred. Read queries are often more expensive than write queries.

  3. Sort the queries by the Bytes Read column. Most queries should read no more than a few kilobytes per row.

If any queries are more expensive than expected, you can use the EXPLAIN ANALYZE SQL command for an estimate of the RUs consumed. Efficient queries generally consume fewer RUs, so the guidelines for optimizing query performance can be applied here. For more information, refer to How to troubleshoot and optimize query performance in CockroachDB.
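
For example, to inspect a suspect statement, prepend EXPLAIN ANALYZE to it (the orders table and status filter here are illustrative, not part of this page's schema):

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';

The output includes an estimated RUs consumed field; refer to the worked example at the end of this page.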

Excessive queries

Each query has an associated RU cost, so the total number of queries is an important factor in your consumption.

To diagnose excessive queries, navigate to your cluster's Overview metrics page in the Cloud Console. The SQL Statements chart displays the number of queries over time. Look for any spikes or increases in QPS (queries per second) that may correspond to increases in the Request Units chart.

Reducing the rate of queries is application-specific and must be achieved at the application level.

To investigate potentially problematic queries (ones that are both excessive and expensive):

  1. Navigate to the Statements tab of your cluster's SQL Activity page in the Cloud Console.
  2. Click on the title of the Execution Count column to sort your queries by the number of times each statement was executed within the time interval.
  3. Sort the queries by the Rows Processed or Bytes Read column. For most queries, total rows processed should be no more than a few hundred. Most queries should read no more than a few kilobytes per row.

Excessive number of connections

CockroachDB Standard clusters consume minimal resources per connection, so a high number of connections is unlikely to drive increased RU consumption. However, as your application scales, managing connections remains important for both performance and RU consumption.

Maintaining fewer than five active connections is recommended for most workloads. To diagnose excessive connections, navigate to your cluster's Overview metrics page in the Cloud Console. The SQL Connections chart displays new SQL connection attempts over time.
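
As a complementary server-side check, you can list the sessions currently open against your cluster with the built-in SHOW SESSIONS statement:

SHOW SESSIONS;

Each row includes the client address, application name, and active queries, which can help attribute persistent connections to specific applications or tools.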

Connection pooling is the recommended way to manage the number of connections for many workloads. To read more about connection pooling, see What is Connection Pooling, and Why Should You Care.

Excessive data egress

To diagnose excessive data egress, navigate to your cluster's Request Units metrics page in the Cloud Console and monitor the Egress chart for high RUs from client traffic or bulk I/O operations.

Excessive egress can be treated similarly to expensive queries: reducing the amount of data returned per query is often the best way to decrease egress, and you can also reduce the frequency of excessive queries.

Database UI tools

Database management tools like DBeaver also consume RUs. If left running, they can cause excessive RU consumption by running expensive queries to populate views and by refreshing periodically in the background.

To determine whether database UI tools are contributing to your RU usage, navigate to the SQL Activity page in the Cloud Console and search for queries similar to the following:

SELECT count(*) FROM crdb_internal.cluster_sessions

SELECT avg((((statistics->'statistics')->'runLat')->'mean')::INT8) AS meanrunlatency
FROM crdb_internal.statement_statistics AS ciss
WHERE ciss.aggregated_ts::DATE = current_date()

You might also see multiple open connections to your cluster that persist regardless of workload. This can indicate that multiple team members have a database UI tool running.

Data migration

An initial data load during a migration may consume a high number of RUs. In this case, optimizing for performance generally also optimizes RU consumption. For more information about migrations, refer to the Migration Overview.

Changefeeds (CDC)

Changefeeds can contribute to significant RU usage. To monitor changefeed performance, navigate to your cluster's Changefeeds metrics page in the Cloud Console and monitor the available charts.

Refer to Change Data Capture Queries for performance guidance that may decrease RU consumption.
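
If a changefeed emits more data than its consumers need, a CDC query can filter and project at the source. A minimal sketch, assuming a hypothetical orders table and a placeholder Kafka sink URL:

CREATE CHANGEFEED INTO 'kafka://localhost:9092'
  AS SELECT id, status FROM orders
  WHERE status = 'shipped';

This emits only the listed columns, and only for rows that match the filter, which reduces both egress and downstream processing.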

General tips for reducing RU usage

RU consumption can be broken down to more granular components that can give additional optimization insight. Use the Cloud Console Request Units metrics charts to see what kind of resource usage is driving RU consumption.

The following recommendations can help minimize the RU cost of a query by reducing the work your cluster must do to execute that query. The tips are grouped according to the type of resource usage.

Reads

If the Reads chart shows high RUs for batches, requests, or bytes (payload):

  • Use secondary indexes that reduce the number of rows that need to be scanned (see the sketch after this list).
  • Avoid “wide” columns, such as large text or binary columns, in tables that are frequently scanned. Instead, offload them to a separate table that is accessed only when those columns are needed.
  • Don't disable automatic statistics; the optimizer needs them to generate efficient query plans.
  • Take advantage of SQL filters, joins, and aggregations rather than performing these operations in the application, to reduce the amount of data returned to the client.
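
As a sketch of the first two tips, assume a hypothetical orders table with a frequently filtered customer_id column and a large, rarely read invoice_pdf payload (all names here are illustrative):

-- A secondary index on customer_id lets lookups scan only matching rows
-- instead of performing a FULL SCAN of orders.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);

-- Offloading the wide column to a side table keeps frequent scans of
-- orders from paying to read the large payload.
CREATE TABLE order_invoices (
  order_id INT PRIMARY KEY,
  invoice_pdf BYTES
);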

Writes

If the Writes chart shows high RUs for batches, requests, or bytes (payload):

  • Drop indexes that are no longer needed, since each additional index requires an additional write.
  • Use batched INSERT statements to insert multiple rows in a single statement, rather than sending a separate statement per row (see the sketch after this list).
  • Use range UPDATE and DELETE statements to affect many rows in a single statement, rather than sending a separate statement per row.
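
A minimal sketch of batched and range writes, using a simple kv table like the one in the example at the end of this page:

-- One multi-row INSERT consumes far fewer write batches than three
-- single-row INSERT statements sent separately.
INSERT INTO kv VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- A single range DELETE affects many rows in one statement.
DELETE FROM kv WHERE k >= 100 AND k < 200;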

Egress

If the Egress chart shows high RUs for client traffic or bulk I/O operations:

  • Avoid returning columns that your application does not need (see the sketch after this list).
  • Take advantage of SQL filters, joins, and aggregations rather than performing these operations in the application, to reduce the amount of data returned to the client.
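
A minimal sketch of both tips, again using the kv table from the example at the end of this page:

-- Project only the columns the application needs instead of SELECT *.
SELECT k FROM kv WHERE v = 'target';

-- Aggregate server-side rather than returning every row to the client.
SELECT count(*) FROM kv WHERE v = 'target';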

Cross-region networking

If the Cross-region Networking chart shows high RUs for network traffic, the Reads and Egress tips above apply here as well: reducing the amount of data each query reads and returns also reduces the data sent between regions.

SQL CPU

If the CPU chart shows high RUs for the total amount of CPU used by SQL pods:

  • Most of the preceding tips also apply to reducing SQL CPU.
  • Ensure that frequently executed queries are “prepared” so they can be cached by the SQL layer (see the sketch after this list). Most ORMs and drivers do this automatically, so it’s usually not a problem.
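
At the SQL level, preparing a statement looks like the following; most drivers send the equivalent wire-protocol messages for you. The statement name kv_lookup is illustrative, and kv is the table from the example at the end of this page:

-- Prepare once; the SQL layer can reuse the cached statement on each EXECUTE.
PREPARE kv_lookup AS SELECT v FROM kv WHERE k = $1;
EXECUTE kv_lookup(1);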

Example Request Unit calculation

Say you have a simple key-value pair table with a secondary index:

CREATE TABLE kv (k INT PRIMARY KEY, v STRING, INDEX (v));

Now you insert a row into the table:

INSERT INTO kv VALUES (1, '...imagine this is a 1 KiB string...');

The amount of SQL CPU needed to execute this query is about 1.5 milliseconds. The network egress is also minimal, around 50 bytes.

Most of the cost comes from 6 write requests to the storage layer with about 6 KiB of request payload (plus some overhead). The INSERT is issued first for the primary index on k and then for the secondary index on v. Each of those writes is replicated 3 times to different storage locations, for a total of 6 requests. These costs add up to the total RU count:

  • 1.5 milliseconds SQL CPU = 0.5 RU
  • 50 bytes network egress = 50/1024 RU ≈ 0.05 RU
  • 6 storage write batches = 6 RU
  • 6 storage write requests = 6 RU
  • 6 KiB write payloads = 6 RU

Total cost = 18.55 RU

Note that this is not exact, as there can be slight variations in multiple parts of the calculation.

You can use the EXPLAIN ANALYZE SQL command to estimate a statement's RU usage. For example, prepend EXPLAIN ANALYZE to the INSERT statement:

EXPLAIN ANALYZE INSERT INTO kv VALUES (1, '...imagine this is a 1 KiB string...');
               info
-----------------------------------
  planning time: 13ms
  execution time: 6ms
  distribution: local
  vectorized: true
  maximum memory usage: 10 KiB
  network usage: 0 B (0 messages)
  estimated RUs consumed: 15

  • insert fast path
    nodes: n1
    actual row count: 1
    into: kv(k, v)
    auto commit
    size: 2 columns, 1 row
(14 rows)


Time: 71ms total (execution 20ms / network 50ms)

This inserts the row and also outputs information from the optimizer about the execution of the statement. The estimated RUs consumed field is the optimizer's estimate of RU consumption for the statement. In this case, the optimizer estimated that the INSERT statement would consume 15 RUs, which is close to the 18.55 RUs calculated above.
