As part of normal operation, CockroachDB continuously records metrics that track performance, latency, usage, and many other runtime indicators. These metrics are often useful in diagnosing problems, troubleshooting performance, or planning cluster infrastructure modifications. This page documents locations where metrics are exposed for analysis.
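As a rough illustration of reading metrics from one such location: CockroachDB nodes serve metrics over HTTP in the Prometheus text format (commonly at `/_status/vars`), where the dots and hyphens in the names listed on this page appear as underscores. The sketch below parses a made-up sample instead of querying a live node. The sample values, the `store` label, and the helper names `prometheus_name` and `parse_samples` are assumptions for illustration, and the parser assumes sample lines carry no trailing timestamp.

```python
import re

# Hypothetical sample of Prometheus text exposition output. The values and
# the store label are invented for illustration only.
SAMPLE = """\
# HELP capacity Total storage capacity
# TYPE capacity gauge
capacity{store="1"} 1000
# HELP capacity_available Available storage capacity
# TYPE capacity_available gauge
capacity_available{store="1"} 250
# TYPE addsstable_applications counter
addsstable_applications{store="1"} 7
"""


def prometheus_name(metric_name):
    """Map a dotted metric name to its underscore form (assumed convention)."""
    return re.sub(r"[.\-]", "_", metric_name)


def parse_samples(text):
    """Return {name: value summed across label sets}, skipping comment lines."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes "name{labels} value" with no trailing timestamp.
        series, value = line.rsplit(" ", 1)
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


samples = parse_samples(SAMPLE)

# Derive a used-capacity fraction from two gauges listed on this page.
total = samples[prometheus_name("capacity")]
available = samples[prometheus_name("capacity.available")]
used_fraction = 1 - available / total
```

Summing across label sets is a simplification: it gives a node-wide total for gauges such as `capacity`, but it would not be meaningful for histogram buckets, which need bucket-aware handling.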
Available metrics
CockroachDB Metric Name | Description | Type | Unit |
addsstable.applications |
Number of SSTable ingestions applied (i.e. applied by Replicas) | COUNTER | COUNT |
addsstable.copies |
Number of SSTable ingestions that required copying files during application | COUNTER | COUNT |
addsstable.proposals |
Number of SSTable ingestions proposed (i.e. sent to Raft by lease holders) | COUNTER | COUNT |
admission.io.overload |
1-normalized float indicating whether IO admission control considers the store as overloaded with respect to compaction out of L0 (considers sub-level and file counts). | GAUGE | PERCENT |
auth.cert.conn.latency |
Latency to establish and authenticate a SQL connection using certificate | HISTOGRAM | NANOSECONDS |
auth.gss.conn.latency |
Latency to establish and authenticate a SQL connection using GSS | HISTOGRAM | NANOSECONDS |
auth.jwt.conn.latency |
Latency to establish and authenticate a SQL connection using JWT Token | HISTOGRAM | NANOSECONDS |
auth.ldap.conn.latency |
Latency to establish and authenticate a SQL connection using LDAP | HISTOGRAM | NANOSECONDS |
auth.password.conn.latency |
Latency to establish and authenticate a SQL connection using password | HISTOGRAM | NANOSECONDS |
auth.scram.conn.latency |
Latency to establish and authenticate a SQL connection using SCRAM | HISTOGRAM | NANOSECONDS |
build.timestamp |
Build information | GAUGE | TIMESTAMP_SEC |
capacity |
Total storage capacity | GAUGE | BYTES |
capacity.available |
Available storage capacity | GAUGE | BYTES |
capacity.reserved |
Capacity reserved for snapshots | GAUGE | BYTES |
capacity.used |
Used storage capacity | GAUGE | BYTES |
changefeed.aggregator_progress |
The earliest timestamp up to which any aggregator is guaranteed to have emitted all values for | GAUGE | TIMESTAMP_NS |
changefeed.backfill_count |
Number of changefeeds currently executing backfill | GAUGE | COUNT |
changefeed.backfill_pending_ranges |
Number of ranges in an ongoing backfill that are yet to be fully emitted | GAUGE | COUNT |
changefeed.checkpoint_progress |
The earliest timestamp of any changefeed's persisted checkpoint (values prior to this timestamp will never need to be re-emitted) | GAUGE | TIMESTAMP_NS |
changefeed.commit_latency |
Event commit latency: the difference between the event's MVCC timestamp and the time it was acknowledged by the downstream sink. If the sink batches events, the difference between the oldest event in the batch and the acknowledgement is recorded. Excludes latency during backfill. | HISTOGRAM | NANOSECONDS |
changefeed.emitted_bytes |
Bytes emitted by all feeds | COUNTER | BYTES |
changefeed.emitted_messages |
Messages emitted by all feeds | COUNTER | COUNT |
changefeed.error_retries |
Total retryable errors encountered by all changefeeds | COUNTER | COUNT |
changefeed.failures |
Total number of changefeed jobs which have failed | COUNTER | COUNT |
changefeed.lagging_ranges |
The number of ranges considered to be lagging behind | GAUGE | COUNT |
changefeed.max_behind_nanos |
(Deprecated in favor of checkpoint_progress) The most any changefeed's persisted checkpoint is behind the present | GAUGE | NANOSECONDS |
changefeed.message_size_hist |
Message size histogram | HISTOGRAM | BYTES |
changefeed.running |
Number of currently running changefeeds, including sinkless | GAUGE | COUNT |
clock-offset.meannanos |
Mean clock offset with other nodes | GAUGE | NANOSECONDS |
clock-offset.stddevnanos |
Standard deviation of clock offset with other nodes | GAUGE | NANOSECONDS |
cluster.preserve-downgrade-option.last-updated |
Unix timestamp of the last update to cluster.preserve_downgrade_option | GAUGE | TIMESTAMP_SEC |
distsender.batches |
Number of batches processed | COUNTER | COUNT |
distsender.batches.partial |
Number of partial batches processed after being divided on range boundaries | COUNTER | COUNT |
distsender.errors.notleaseholder |
Number of NotLeaseHolderErrors encountered from replica-addressed RPCs | COUNTER | COUNT |
distsender.rpc.sent |
Number of replica-addressed RPCs sent | COUNTER | COUNT |
distsender.rpc.sent.local |
Number of replica-addressed RPCs sent through the local-server optimization | COUNTER | COUNT |
distsender.rpc.sent.nextreplicaerror |
Number of replica-addressed RPCs sent due to per-replica errors | COUNTER | COUNT |
exec.error |
Number of batch KV requests that failed to execute on this node.
This count excludes transaction restart/abort errors. However, it will include other errors expected during normal operation, such as ConditionFailedError. This metric is thus not an indicator of KV health. |
COUNTER | COUNT |
exec.latency |
Latency of batch KV requests (including errors) executed on this node.
This measures requests already addressed to a single replica, from the moment at which they arrive at the internal gRPC endpoint to the moment at which the response (or an error) is returned. This latency includes in particular commit waits, conflict resolution and replication, and end-users can easily produce high measurements via long-running transactions that conflict with foreground traffic. This metric thus does not provide a good signal for understanding the health of the KV layer. |
HISTOGRAM | NANOSECONDS |
exec.success |
Number of batch KV requests executed successfully on this node.
A request is considered to have executed 'successfully' if it either returns a result or a transaction restart/abort error. |
COUNTER | COUNT |
gcbytesage |
Cumulative age of non-live data | GAUGE | SECONDS |
gossip.bytes.received |
Number of received gossip bytes | COUNTER | BYTES |
gossip.bytes.sent |
Number of sent gossip bytes | COUNTER | BYTES |
gossip.connections.incoming |
Number of active incoming gossip connections | GAUGE | COUNT |
gossip.connections.outgoing |
Number of active outgoing gossip connections | GAUGE | COUNT |
gossip.connections.refused |
Number of refused incoming gossip connections | COUNTER | COUNT |
gossip.infos.received |
Number of received gossip Info objects | COUNTER | COUNT |
gossip.infos.sent |
Number of sent gossip Info objects | COUNTER | COUNT |
intentage |
Cumulative age of locks | GAUGE | SECONDS |
intentbytes |
Number of bytes in intent KV pairs | GAUGE | BYTES |
intentcount |
Count of intent keys | GAUGE | COUNT |
jobs.auto_config_env_runner.currently_paused |
Number of auto_config_env_runner jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_config_env_runner.protected_age_sec |
The age of the oldest PTS record protected by auto_config_env_runner jobs | GAUGE | SECONDS |
jobs.auto_config_env_runner.protected_record_count |
Number of protected timestamp records held by auto_config_env_runner jobs | GAUGE | COUNT |
jobs.auto_config_runner.currently_paused |
Number of auto_config_runner jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_config_runner.protected_age_sec |
The age of the oldest PTS record protected by auto_config_runner jobs | GAUGE | SECONDS |
jobs.auto_config_runner.protected_record_count |
Number of protected timestamp records held by auto_config_runner jobs | GAUGE | COUNT |
jobs.auto_config_task.currently_paused |
Number of auto_config_task jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_config_task.protected_age_sec |
The age of the oldest PTS record protected by auto_config_task jobs | GAUGE | SECONDS |
jobs.auto_config_task.protected_record_count |
Number of protected timestamp records held by auto_config_task jobs | GAUGE | COUNT |
jobs.auto_create_partial_stats.currently_paused |
Number of auto_create_partial_stats jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_create_partial_stats.protected_age_sec |
The age of the oldest PTS record protected by auto_create_partial_stats jobs | GAUGE | SECONDS |
jobs.auto_create_partial_stats.protected_record_count |
Number of protected timestamp records held by auto_create_partial_stats jobs | GAUGE | COUNT |
jobs.auto_create_stats.currently_paused |
Number of auto_create_stats jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_create_stats.currently_running |
Number of auto_create_stats jobs currently running in Resume or OnFailOrCancel state | GAUGE | COUNT |
jobs.auto_create_stats.protected_age_sec |
The age of the oldest PTS record protected by auto_create_stats jobs | GAUGE | SECONDS |
jobs.auto_create_stats.protected_record_count |
Number of protected timestamp records held by auto_create_stats jobs | GAUGE | COUNT |
jobs.auto_create_stats.resume_failed |
Number of auto_create_stats jobs which failed with a non-retriable error | COUNTER | COUNT |
jobs.auto_schema_telemetry.currently_paused |
Number of auto_schema_telemetry jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_schema_telemetry.protected_age_sec |
The age of the oldest PTS record protected by auto_schema_telemetry jobs | GAUGE | SECONDS |
jobs.auto_schema_telemetry.protected_record_count |
Number of protected timestamp records held by auto_schema_telemetry jobs | GAUGE | COUNT |
jobs.auto_span_config_reconciliation.currently_paused |
Number of auto_span_config_reconciliation jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_span_config_reconciliation.protected_age_sec |
The age of the oldest PTS record protected by auto_span_config_reconciliation jobs | GAUGE | SECONDS |
jobs.auto_span_config_reconciliation.protected_record_count |
Number of protected timestamp records held by auto_span_config_reconciliation jobs | GAUGE | COUNT |
jobs.auto_sql_stats_compaction.currently_paused |
Number of auto_sql_stats_compaction jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_sql_stats_compaction.protected_age_sec |
The age of the oldest PTS record protected by auto_sql_stats_compaction jobs | GAUGE | SECONDS |
jobs.auto_sql_stats_compaction.protected_record_count |
Number of protected timestamp records held by auto_sql_stats_compaction jobs | GAUGE | COUNT |
jobs.auto_update_sql_activity.currently_paused |
Number of auto_update_sql_activity jobs currently considered Paused | GAUGE | COUNT |
jobs.auto_update_sql_activity.protected_age_sec |
The age of the oldest PTS record protected by auto_update_sql_activity jobs | GAUGE | SECONDS |
jobs.auto_update_sql_activity.protected_record_count |
Number of protected timestamp records held by auto_update_sql_activity jobs | GAUGE | COUNT |
jobs.backup.currently_paused |
Number of backup jobs currently considered Paused | GAUGE | COUNT |
jobs.backup.currently_running |
Number of backup jobs currently running in Resume or OnFailOrCancel state | GAUGE | COUNT |
jobs.backup.protected_age_sec |
The age of the oldest PTS record protected by backup jobs | GAUGE | SECONDS |
jobs.backup.protected_record_count |
Number of protected timestamp records held by backup jobs | GAUGE | COUNT |
jobs.changefeed.currently_paused |
Number of changefeed jobs currently considered Paused | GAUGE | COUNT |
jobs.changefeed.expired_pts_records |
Number of expired protected timestamp records owned by changefeed jobs | COUNTER | COUNT |
jobs.changefeed.protected_age_sec |
The age of the oldest PTS record protected by changefeed jobs | GAUGE | SECONDS |
jobs.changefeed.protected_record_count |
Number of protected timestamp records held by changefeed jobs | GAUGE | COUNT |
jobs.changefeed.resume_retry_error |
Number of changefeed jobs which failed with a retriable error | COUNTER | COUNT |
jobs.create_stats.currently_paused |
Number of create_stats jobs currently considered Paused | GAUGE | COUNT |
jobs.create_stats.currently_running |
Number of create_stats jobs currently running in Resume or OnFailOrCancel state | GAUGE | COUNT |
jobs.create_stats.protected_age_sec |
The age of the oldest PTS record protected by create_stats jobs | GAUGE | SECONDS |
jobs.create_stats.protected_record_count |
Number of protected timestamp records held by create_stats jobs | GAUGE | COUNT |
jobs.history_retention.currently_paused |
Number of history_retention jobs currently considered Paused | GAUGE | COUNT |
jobs.history_retention.protected_age_sec |
The age of the oldest PTS record protected by history_retention jobs | GAUGE | SECONDS |
jobs.history_retention.protected_record_count |
Number of protected timestamp records held by history_retention jobs | GAUGE | COUNT |
jobs.import.currently_paused |
Number of import jobs currently considered Paused | GAUGE | COUNT |
jobs.import.protected_age_sec |
The age of the oldest PTS record protected by import jobs | GAUGE | SECONDS |
jobs.import.protected_record_count |
Number of protected timestamp records held by import jobs | GAUGE | COUNT |
jobs.import_rollback.currently_paused |
Number of import_rollback jobs currently considered Paused | GAUGE | COUNT |
jobs.import_rollback.protected_age_sec |
The age of the oldest PTS record protected by import_rollback jobs | GAUGE | SECONDS |
jobs.import_rollback.protected_record_count |
Number of protected timestamp records held by import_rollback jobs | GAUGE | COUNT |
jobs.key_visualizer.currently_paused |
Number of key_visualizer jobs currently considered Paused | GAUGE | COUNT |
jobs.key_visualizer.protected_age_sec |
The age of the oldest PTS record protected by key_visualizer jobs | GAUGE | SECONDS |
jobs.key_visualizer.protected_record_count |
Number of protected timestamp records held by key_visualizer jobs | GAUGE | COUNT |
jobs.logical_replication.currently_paused |
Number of logical_replication jobs currently considered Paused | GAUGE | COUNT |
jobs.logical_replication.protected_age_sec |
The age of the oldest PTS record protected by logical_replication jobs | GAUGE | SECONDS |
jobs.logical_replication.protected_record_count |
Number of protected timestamp records held by logical_replication jobs | GAUGE | COUNT |
jobs.migration.currently_paused |
Number of migration jobs currently considered Paused | GAUGE | COUNT |
jobs.migration.protected_age_sec |
The age of the oldest PTS record protected by migration jobs | GAUGE | SECONDS |
jobs.migration.protected_record_count |
Number of protected timestamp records held by migration jobs | GAUGE | COUNT |
jobs.mvcc_statistics_update.currently_paused |
Number of mvcc_statistics_update jobs currently considered Paused | GAUGE | COUNT |
jobs.mvcc_statistics_update.protected_age_sec |
The age of the oldest PTS record protected by mvcc_statistics_update jobs | GAUGE | SECONDS |
jobs.mvcc_statistics_update.protected_record_count |
Number of protected timestamp records held by mvcc_statistics_update jobs | GAUGE | COUNT |
jobs.new_schema_change.currently_paused |
Number of new_schema_change jobs currently considered Paused | GAUGE | COUNT |
jobs.new_schema_change.protected_age_sec |
The age of the oldest PTS record protected by new_schema_change jobs | GAUGE | SECONDS |
jobs.new_schema_change.protected_record_count |
Number of protected timestamp records held by new_schema_change jobs | GAUGE | COUNT |
jobs.poll_jobs_stats.currently_paused |
Number of poll_jobs_stats jobs currently considered Paused | GAUGE | COUNT |
jobs.poll_jobs_stats.protected_age_sec |
The age of the oldest PTS record protected by poll_jobs_stats jobs | GAUGE | SECONDS |
jobs.poll_jobs_stats.protected_record_count |
Number of protected timestamp records held by poll_jobs_stats jobs | GAUGE | COUNT |
jobs.replication_stream_ingestion.currently_paused |
Number of replication_stream_ingestion jobs currently considered Paused | GAUGE | COUNT |
jobs.replication_stream_ingestion.protected_age_sec |
The age of the oldest PTS record protected by replication_stream_ingestion jobs | GAUGE | SECONDS |
jobs.replication_stream_ingestion.protected_record_count |
Number of protected timestamp records held by replication_stream_ingestion jobs | GAUGE | COUNT |
jobs.replication_stream_producer.currently_paused |
Number of replication_stream_producer jobs currently considered Paused | GAUGE | COUNT |
jobs.replication_stream_producer.protected_age_sec |
The age of the oldest PTS record protected by replication_stream_producer jobs | GAUGE | SECONDS |
jobs.replication_stream_producer.protected_record_count |
Number of protected timestamp records held by replication_stream_producer jobs | GAUGE | COUNT |
jobs.restore.currently_paused |
Number of restore jobs currently considered Paused | GAUGE | COUNT |
jobs.restore.protected_age_sec |
The age of the oldest PTS record protected by restore jobs | GAUGE | SECONDS |
jobs.restore.protected_record_count |
Number of protected timestamp records held by restore jobs | GAUGE | COUNT |
jobs.row_level_ttl.currently_paused |
Number of row_level_ttl jobs currently considered Paused | GAUGE | COUNT |
jobs.row_level_ttl.currently_running |
Number of row_level_ttl jobs currently running in Resume or OnFailOrCancel state | GAUGE | COUNT |
jobs.row_level_ttl.delete_duration |
Duration for delete requests during row level TTL. | HISTOGRAM | NANOSECONDS |
jobs.row_level_ttl.num_active_spans |
Number of active spans the TTL job is deleting from. | GAUGE | COUNT |
jobs.row_level_ttl.protected_age_sec |
The age of the oldest PTS record protected by row_level_ttl jobs | GAUGE | SECONDS |
jobs.row_level_ttl.protected_record_count |
Number of protected timestamp records held by row_level_ttl jobs | GAUGE | COUNT |
jobs.row_level_ttl.resume_completed |
Number of row_level_ttl jobs which successfully resumed to completion | COUNTER | COUNT |
jobs.row_level_ttl.resume_failed |
Number of row_level_ttl jobs which failed with a non-retriable error | COUNTER | COUNT |
jobs.row_level_ttl.rows_deleted |
Number of rows deleted by the row level TTL job. | COUNTER | COUNT |
jobs.row_level_ttl.rows_selected |
Number of rows selected for deletion by the row level TTL job. | COUNTER | COUNT |
jobs.row_level_ttl.select_duration |
Duration for select requests during row level TTL. | HISTOGRAM | NANOSECONDS |
jobs.row_level_ttl.span_total_duration |
Duration for processing a span during row level TTL. | HISTOGRAM | NANOSECONDS |
jobs.row_level_ttl.total_expired_rows |
Approximate number of rows that have expired the TTL on the TTL table. | GAUGE | COUNT |
jobs.row_level_ttl.total_rows |
Approximate number of rows on the TTL table. | GAUGE | COUNT |
jobs.schema_change.currently_paused |
Number of schema_change jobs currently considered Paused | GAUGE | COUNT |
jobs.schema_change.protected_age_sec |
The age of the oldest PTS record protected by schema_change jobs | GAUGE | SECONDS |
jobs.schema_change.protected_record_count |
Number of protected timestamp records held by schema_change jobs | GAUGE | COUNT |
jobs.schema_change_gc.currently_paused |
Number of schema_change_gc jobs currently considered Paused | GAUGE | COUNT |
jobs.schema_change_gc.protected_age_sec |
The age of the oldest PTS record protected by schema_change_gc jobs | GAUGE | SECONDS |
jobs.schema_change_gc.protected_record_count |
Number of protected timestamp records held by schema_change_gc jobs | GAUGE | COUNT |
jobs.standby_read_ts_poller.currently_paused |
Number of standby_read_ts_poller jobs currently considered Paused | GAUGE | COUNT |
jobs.standby_read_ts_poller.protected_age_sec |
The age of the oldest PTS record protected by standby_read_ts_poller jobs | GAUGE | SECONDS |
jobs.standby_read_ts_poller.protected_record_count |
Number of protected timestamp records held by standby_read_ts_poller jobs | GAUGE | COUNT |
jobs.typedesc_schema_change.currently_paused |
Number of typedesc_schema_change jobs currently considered Paused | GAUGE | COUNT |
jobs.typedesc_schema_change.protected_age_sec |
The age of the oldest PTS record protected by typedesc_schema_change jobs | GAUGE | SECONDS |
jobs.typedesc_schema_change.protected_record_count |
Number of protected timestamp records held by typedesc_schema_change jobs | GAUGE | COUNT |
jobs.update_table_metadata_cache.currently_paused |
Number of update_table_metadata_cache jobs currently considered Paused | GAUGE | COUNT |
jobs.update_table_metadata_cache.protected_age_sec |
The age of the oldest PTS record protected by update_table_metadata_cache jobs | GAUGE | SECONDS |
jobs.update_table_metadata_cache.protected_record_count |
Number of protected timestamp records held by update_table_metadata_cache jobs | GAUGE | COUNT |
keybytes |
Number of bytes taken up by keys | GAUGE | BYTES |
keycount |
Count of all keys | GAUGE | COUNT |
leases.epoch |
Number of replica leaseholders using epoch-based leases | GAUGE | COUNT |
leases.error |
Number of failed lease requests | COUNTER | COUNT |
leases.expiration |
Number of replica leaseholders using expiration-based leases | GAUGE | COUNT |
leases.success |
Number of successful lease requests | COUNTER | COUNT |
leases.transfers.error |
Number of failed lease transfers | COUNTER | COUNT |
leases.transfers.success |
Number of successful lease transfers | COUNTER | COUNT |
livebytes |
Number of bytes of live data (keys plus values) | GAUGE | BYTES |
livecount |
Count of live keys | GAUGE | COUNT |
liveness.epochincrements |
Number of times this node has incremented its liveness epoch | COUNTER | COUNT |
liveness.heartbeatfailures |
Number of failed node liveness heartbeats from this node | COUNTER | COUNT |
liveness.heartbeatlatency |
Node liveness heartbeat latency | HISTOGRAM | NANOSECONDS |
liveness.heartbeatsuccesses |
Number of successful node liveness heartbeats from this node | COUNTER | COUNT |
liveness.livenodes |
Number of live nodes in the cluster (will be 0 if this node is not itself live) | GAUGE | COUNT |
node-id |
Node ID with labels for advertised RPC and HTTP addresses | GAUGE | CONST |
physical_replication.logical_bytes |
Logical bytes (sum of keys + values) ingested by all replication jobs | COUNTER | BYTES |
physical_replication.replicated_time_seconds |
The replicated time of the physical replication stream in seconds since the unix epoch. | GAUGE | SECONDS |
queue.consistency.pending |
Number of pending replicas in the consistency checker queue | GAUGE | COUNT |
queue.consistency.process.failure |
Number of replicas which failed processing in the consistency checker queue | COUNTER | COUNT |
queue.consistency.process.success |
Number of replicas successfully processed by the consistency checker queue | COUNTER | COUNT |
queue.consistency.processingnanos |
Nanoseconds spent processing replicas in the consistency checker queue | COUNTER | NANOSECONDS |
queue.gc.info.abortspanconsidered |
Number of AbortSpan entries old enough to be considered for removal | COUNTER | COUNT |
queue.gc.info.abortspangcnum |
Number of AbortSpan entries fit for removal | COUNTER | COUNT |
queue.gc.info.abortspanscanned |
Number of transactions present in the AbortSpan scanned from the engine | COUNTER | COUNT |
queue.gc.info.clearrangefailed |
Number of failed ClearRange operations during GC | COUNTER | COUNT |
queue.gc.info.clearrangesuccess |
Number of successful ClearRange operations during GC | COUNTER | COUNT |
queue.gc.info.intentsconsidered |
Number of 'old' intents | COUNTER | COUNT |
queue.gc.info.intenttxns |
Number of associated distinct transactions | COUNTER | COUNT |
queue.gc.info.numkeysaffected |
Number of keys with GC'able data | COUNTER | COUNT |
queue.gc.info.pushtxn |
Number of attempted pushes | COUNTER | COUNT |
queue.gc.info.resolvesuccess |
Number of successful intent resolutions | COUNTER | COUNT |
queue.gc.info.resolvetotal |
Number of attempted intent resolutions | COUNTER | COUNT |
queue.gc.info.transactionspangcaborted |
Number of GC'able entries corresponding to aborted txns | COUNTER | COUNT |
queue.gc.info.transactionspangccommitted |
Number of GC'able entries corresponding to committed txns | COUNTER | COUNT |
queue.gc.info.transactionspangcpending |
Number of GC'able entries corresponding to pending txns | COUNTER | COUNT |
queue.gc.info.transactionspanscanned |
Number of entries in transaction spans scanned from the engine | COUNTER | COUNT |
queue.gc.pending |
Number of pending replicas in the MVCC GC queue | GAUGE | COUNT |
queue.gc.process.failure |
Number of replicas which failed processing in the MVCC GC queue | COUNTER | COUNT |
queue.gc.process.success |
Number of replicas successfully processed by the MVCC GC queue | COUNTER | COUNT |
queue.gc.processingnanos |
Nanoseconds spent processing replicas in the MVCC GC queue | COUNTER | NANOSECONDS |
queue.raftlog.pending |
Number of pending replicas in the Raft log queue | GAUGE | COUNT |
queue.raftlog.process.failure |
Number of replicas which failed processing in the Raft log queue | COUNTER | COUNT |
queue.raftlog.process.success |
Number of replicas successfully processed by the Raft log queue | COUNTER | COUNT |
queue.raftlog.processingnanos |
Nanoseconds spent processing replicas in the Raft log queue | COUNTER | NANOSECONDS |
queue.raftsnapshot.pending |
Number of pending replicas in the Raft repair queue | GAUGE | COUNT |
queue.raftsnapshot.process.failure |
Number of replicas which failed processing in the Raft repair queue | COUNTER | COUNT |
queue.raftsnapshot.process.success |
Number of replicas successfully processed by the Raft repair queue | COUNTER | COUNT |
queue.raftsnapshot.processingnanos |
Nanoseconds spent processing replicas in the Raft repair queue | COUNTER | NANOSECONDS |
queue.replicagc.pending |
Number of pending replicas in the replica GC queue | GAUGE | COUNT |
queue.replicagc.process.failure |
Number of replicas which failed processing in the replica GC queue | COUNTER | COUNT |
queue.replicagc.process.success |
Number of replicas successfully processed by the replica GC queue | COUNTER | COUNT |
queue.replicagc.processingnanos |
Nanoseconds spent processing replicas in the replica GC queue | COUNTER | NANOSECONDS |
queue.replicagc.removereplica |
Number of replica removals attempted by the replica GC queue | COUNTER | COUNT |
queue.replicate.addreplica |
Number of replica additions attempted by the replicate queue | COUNTER | COUNT |
queue.replicate.addreplica.error |
Number of failed replica additions processed by the replicate queue | COUNTER | COUNT |
queue.replicate.addreplica.success |
Number of successful replica additions processed by the replicate queue | COUNTER | COUNT |
queue.replicate.pending |
Number of pending replicas in the replicate queue | GAUGE | COUNT |
queue.replicate.process.failure |
Number of replicas which failed processing in the replicate queue | COUNTER | COUNT |
queue.replicate.process.success |
Number of replicas successfully processed by the replicate queue | COUNTER | COUNT |
queue.replicate.processingnanos |
Nanoseconds spent processing replicas in the replicate queue | COUNTER | NANOSECONDS |
queue.replicate.purgatory |
Number of replicas in the replicate queue's purgatory, awaiting allocation options | GAUGE | COUNT |
queue.replicate.rebalancereplica |
Number of replica rebalancer-initiated additions attempted by the replicate queue | COUNTER | COUNT |
queue.replicate.removedeadreplica |
Number of dead replica removals attempted by the replicate queue (typically in response to a node outage) | COUNTER | COUNT |
queue.replicate.removedeadreplica.error |
Number of failed dead replica removals processed by the replicate queue | COUNTER | COUNT |
queue.replicate.removedeadreplica.success |
Number of successful dead replica removals processed by the replicate queue | COUNTER | COUNT |
queue.replicate.removedecommissioningreplica.error |
Number of failed decommissioning replica removals processed by the replicate queue | COUNTER | COUNT |
queue.replicate.removedecommissioningreplica.success |
Number of successful decommissioning replica removals processed by the replicate queue | COUNTER | COUNT |
queue.replicate.removereplica |
Number of replica removals attempted by the replicate queue (typically in response to a rebalancer-initiated addition) | COUNTER | COUNT |
queue.replicate.removereplica.error |
Number of failed replica removals processed by the replicate queue | COUNTER | COUNT |
queue.replicate.removereplica.success |
Number of successful replica removals processed by the replicate queue | COUNTER | COUNT |
queue.replicate.replacedeadreplica.error |
Number of failed dead replica replacements processed by the replicate queue | COUNTER | COUNT |
queue.replicate.replacedeadreplica.success |
Number of successful dead replica replacements processed by the replicate queue | COUNTER | COUNT |
queue.replicate.replacedecommissioningreplica.error |
Number of failed decommissioning replica replacements processed by the replicate queue | COUNTER | COUNT |
queue.replicate.replacedecommissioningreplica.success |
Number of successful decommissioning replica replacements processed by the replicate queue | COUNTER | COUNT |
queue.replicate.transferlease |
Number of range lease transfers attempted by the replicate queue | COUNTER | COUNT |
queue.split.pending |
Number of pending replicas in the split queue | GAUGE | COUNT |
queue.split.process.failure |
Number of replicas which failed processing in the split queue | COUNTER | COUNT |
queue.split.process.success |
Number of replicas successfully processed by the split queue | COUNTER | COUNT |
queue.split.processingnanos |
Nanoseconds spent processing replicas in the split queue | COUNTER | NANOSECONDS |
queue.tsmaintenance.pending |
Number of pending replicas in the time series maintenance queue | GAUGE | COUNT |
queue.tsmaintenance.process.failure |
Number of replicas which failed processing in the time series maintenance queue | COUNTER | COUNT |
queue.tsmaintenance.process.success |
Number of replicas successfully processed by the time series maintenance queue | COUNTER | COUNT |
queue.tsmaintenance.processingnanos |
Nanoseconds spent processing replicas in the time series maintenance queue | COUNTER | NANOSECONDS |
raft.commandsapplied |
Number of Raft commands applied.
This measurement is taken on the Raft apply loops of all Replicas (leaders and followers alike), meaning that it does not measure the number of Raft commands proposed (in the hypothetical extreme case, all Replicas may apply all commands through snapshots, thus not increasing this metric at all). Instead, it is a proxy for how much work is being done advancing the Replica state machines on this node. |
COUNTER | COUNT |
raft.heartbeats.pending |
Number of pending heartbeats and responses waiting to be coalesced | GAUGE | COUNT |
raft.process.commandcommit.latency |
Latency histogram for applying a batch of Raft commands to the state machine.
This metric is misnamed: it measures the latency for applying a batch of committed Raft commands to a Replica state machine. This requires only non-durable I/O (except for replication configuration changes). Note that a "batch" in this context is really a sub-batch of the batch received for application during raft ready handling. The 'raft.process.applycommitted.latency' histogram is likely more suitable in most cases, as it measures the total latency across all sub-batches (i.e. the sum of commandcommit.latency for a complete batch). |
HISTOGRAM | NANOSECONDS |
raft.process.logcommit.latency |
Latency histogram for committing Raft log entries to stable storage.
This measures the latency of durably committing a group of newly received Raft entries as well as the HardState entry to disk. This excludes any data processing, i.e. we measure purely the commit latency of the resulting Engine write. Homogeneous bands of p50-p99 latencies (in the presence of regular Raft traffic) make it likely that the storage layer is healthy. Spikes in the latency bands can hint either at large sets of Raft entries being received or at performance issues at the storage layer. |
HISTOGRAM | NANOSECONDS |
raft.process.tickingnanos |
Nanoseconds spent in store.processRaft() processing replica.Tick() | COUNTER | NANOSECONDS |
raft.process.workingnanos |
Nanoseconds spent in store.processRaft() working.
This is the sum of the measurements passed to the raft.process.handleready.latency histogram. |
COUNTER | NANOSECONDS |
raft.rcvd.app |
Number of MsgApp messages received by this store | COUNTER | COUNT |
raft.rcvd.appresp |
Number of MsgAppResp messages received by this store | COUNTER | COUNT |
raft.rcvd.dropped |
Number of incoming Raft messages dropped (due to queue length or size) | COUNTER | COUNT |
raft.rcvd.heartbeat |
Number of (coalesced, if enabled) MsgHeartbeat messages received by this store | COUNTER | COUNT |
raft.rcvd.heartbeatresp |
Number of (coalesced, if enabled) MsgHeartbeatResp messages received by this store | COUNTER | COUNT |
raft.rcvd.prevote |
Number of MsgPreVote messages received by this store | COUNTER | COUNT |
raft.rcvd.prevoteresp |
Number of MsgPreVoteResp messages received by this store | COUNTER | COUNT |
raft.rcvd.prop |
Number of MsgProp messages received by this store | COUNTER | COUNT |
raft.rcvd.snap |
Number of MsgSnap messages received by this store | COUNTER | COUNT |
raft.rcvd.timeoutnow |
Number of MsgTimeoutNow messages received by this store | COUNTER | COUNT |
raft.rcvd.transferleader |
Number of MsgTransferLeader messages received by this store | COUNTER | COUNT |
raft.rcvd.vote |
Number of MsgVote messages received by this store | COUNTER | COUNT |
raft.rcvd.voteresp |
Number of MsgVoteResp messages received by this store | COUNTER | COUNT |
raft.ticks |
Number of Raft ticks queued | COUNTER | COUNT |
raftlog.behind |
Number of Raft log entries followers on other stores are behind.
This gauge provides a view of the aggregate number of log entries the Raft leaders on this node think the followers are behind. Since a raft leader may not always have a good estimate for this information for all of its followers, and since followers are expected to be behind (when they are not required as part of a quorum) and the aggregate thus scales like the count of such followers, it is difficult to meaningfully interpret this metric. |
GAUGE | COUNT |
raftlog.truncated |
Number of Raft log entries truncated | COUNTER | COUNT |
range.adds |
Number of range additions | COUNTER | COUNT |
range.merges |
Number of range merges | COUNTER | COUNT |
range.raftleadertransfers |
Number of raft leader transfers | COUNTER | COUNT |
range.removes |
Number of range removals | COUNTER | COUNT |
range.snapshots.generated |
Number of generated snapshots | COUNTER | COUNT |
range.snapshots.rcvd-bytes |
Number of snapshot bytes received | COUNTER | BYTES |
range.snapshots.rebalancing.rcvd-bytes |
Number of rebalancing snapshot bytes received | COUNTER | BYTES |
range.snapshots.rebalancing.sent-bytes |
Number of rebalancing snapshot bytes sent | COUNTER | BYTES |
range.snapshots.recovery.rcvd-bytes |
Number of raft recovery snapshot bytes received | COUNTER | BYTES |
range.snapshots.recovery.sent-bytes |
Number of raft recovery snapshot bytes sent | COUNTER | BYTES |
range.snapshots.recv-in-progress |
Number of non-empty snapshots being received | GAUGE | COUNT |
range.snapshots.recv-queue |
Number of snapshots queued to receive | GAUGE | COUNT |
range.snapshots.recv-total-in-progress |
Number of total snapshots being received | GAUGE | COUNT |
range.snapshots.send-in-progress |
Number of non-empty snapshots being sent | GAUGE | COUNT |
range.snapshots.send-queue |
Number of snapshots queued to send | GAUGE | COUNT |
range.snapshots.send-total-in-progress |
Number of total snapshots being sent | GAUGE | COUNT |
range.snapshots.sent-bytes |
Number of snapshot bytes sent | COUNTER | BYTES |
range.snapshots.unknown.rcvd-bytes |
Number of unknown snapshot bytes received | COUNTER | BYTES |
range.snapshots.unknown.sent-bytes |
Number of unknown snapshot bytes sent | COUNTER | BYTES |
range.splits |
Number of range splits | COUNTER | COUNT |
rangekeybytes |
Number of bytes taken up by range keys (e.g. MVCC range tombstones) | GAUGE | BYTES |
rangekeycount |
Count of all range keys (e.g. MVCC range tombstones) | GAUGE | COUNT |
ranges |
Number of ranges | GAUGE | COUNT |
ranges.overreplicated |
Number of ranges with more live replicas than the replication target | GAUGE | COUNT |
ranges.unavailable |
Number of ranges with fewer live replicas than needed for quorum | GAUGE | COUNT |
ranges.underreplicated |
Number of ranges with fewer live replicas than the replication target | GAUGE | COUNT |
rangevalbytes |
Number of bytes taken up by range key values (e.g. MVCC range tombstones) | GAUGE | BYTES |
rangevalcount |
Count of all range key values (e.g. MVCC range tombstones) | GAUGE | COUNT |
rebalancing.queriespersecond |
Number of kv-level requests received per second by the store, considering the last 30 minutes, as used in rebalancing decisions. | GAUGE | COUNT |
rebalancing.readbytespersecond |
Number of bytes read recently per second, considering the last 30 minutes. | GAUGE | BYTES |
rebalancing.readspersecond |
Number of keys read recently per second, considering the last 30 minutes. | GAUGE | COUNT |
rebalancing.requestspersecond |
Number of requests received recently per second, considering the last 30 minutes. | GAUGE | COUNT |
rebalancing.writebytespersecond |
Number of bytes written recently per second, considering the last 30 minutes. | GAUGE | BYTES |
rebalancing.writespersecond |
Number of keys written (i.e. applied by raft) per second to the store, considering the last 30 minutes. | GAUGE | COUNT |
replicas |
Number of replicas | GAUGE | COUNT |
replicas.leaders |
Number of raft leaders | GAUGE | COUNT |
replicas.leaders_invalid_lease |
Number of replicas that are Raft leaders whose lease is invalid | GAUGE | COUNT |
replicas.leaders_not_leaseholders |
Number of replicas that are Raft leaders whose range lease is held by another store | GAUGE | COUNT |
replicas.leaseholders |
Number of lease holders | GAUGE | COUNT |
replicas.quiescent |
Number of quiesced replicas | GAUGE | COUNT |
replicas.reserved |
Number of replicas reserved for snapshots | GAUGE | COUNT |
requests.backpressure.split |
Number of backpressured writes waiting on a Range split.
A Range will backpressure (roughly) non-system traffic when the range is above the configured size until the range splits. When the rate of this metric is nonzero over extended periods of time, investigate why splits are not occurring. |
GAUGE | COUNT |
requests.slow.distsender |
Number of range-bound RPCs currently stuck or retrying for a long time.
Note that this is not a good signal for KV health. The remote side of the RPCs tracked here may experience contention, so an end user can easily cause values for this metric to be emitted by leaving a transaction open for a long time and contending with it using a second transaction. |
GAUGE | COUNT |
requests.slow.lease |
Number of requests that have been stuck for a long time acquiring a lease.
This gauge registering a nonzero value usually indicates range or replica unavailability, and should be investigated. In the common case, we also expect 'requests.slow.raft' to register a nonzero value, indicating that the lease requests are not getting a timely response from the replication layer. |
GAUGE | COUNT |
requests.slow.raft |
Number of requests that have been stuck for a long time in the replication layer.
An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated. |
GAUGE | COUNT |
rocksdb.block.cache.hits |
Count of block cache hits | COUNTER | COUNT |
rocksdb.block.cache.misses |
Count of block cache misses | COUNTER | COUNT |
rocksdb.block.cache.usage |
Bytes used by the block cache | GAUGE | BYTES |
rocksdb.bloom.filter.prefix.checked |
Number of times the bloom filter was checked | COUNTER | COUNT |
rocksdb.bloom.filter.prefix.useful |
Number of times the bloom filter helped avoid iterator creation | COUNTER | COUNT |
rocksdb.compactions |
Number of table compactions | COUNTER | COUNT |
rocksdb.flushes |
Number of table flushes | COUNTER | COUNT |
rocksdb.memtable.total-size |
Current size of memtable in bytes | GAUGE | BYTES |
rocksdb.num-sstables |
Number of storage engine SSTables | GAUGE | COUNT |
rocksdb.read-amplification |
Number of disk reads per query | GAUGE | COUNT |
rocksdb.table-readers-mem-estimate |
Memory used by index and filter blocks | GAUGE | BYTES |
round-trip-latency |
Distribution of round-trip latencies with other nodes.
This only reflects successful heartbeats and measures gRPC overhead as well as possible head-of-line blocking. Elevated values in this metric may hint at network issues and/or saturation, but they are no proof of them. CPU overload can similarly elevate this metric. The operator should look towards OS-level metrics such as packet loss, retransmits, etc., to conclusively diagnose network issues. Heartbeats are not very frequent (~seconds), so they may not capture rare or short-lived degradations. |
HISTOGRAM | NANOSECONDS |
rpc.connection.avg_round_trip_latency |
Sum of exponentially weighted moving average of round-trip latencies, as measured through a gRPC RPC.
Dividing this Gauge by rpc.connection.healthy gives an approximation of average latency, but the top-level round-trip-latency histogram is more useful. Instead, users should consult the label families of this metric if they are available (which requires Prometheus and the cluster setting 'server.child_metrics.enabled'); these provide per-peer moving averages. This metric does not track failed connections. A failed connection's contribution is reset to zero. |
GAUGE | NANOSECONDS |
rpc.connection.failures |
Counter of failed connections.
This includes both the event in which a healthy connection terminates as well as unsuccessful reconnection attempts. Connections that are terminated as part of local node shutdown are excluded. Decommissioned peers are excluded. |
COUNTER | COUNT |
rpc.connection.healthy |
Gauge of current connections in a healthy state (i.e. bidirectionally connected and heartbeating) | GAUGE | COUNT |
rpc.connection.healthy_nanos |
Gauge of nanoseconds of healthy connection time.
On the Prometheus endpoint scraped with the cluster setting 'server.child_metrics.enabled' set, the constituent parts of this metric are available on a per-peer basis and one can read off for how long a given peer has been connected. |
GAUGE | NANOSECONDS |
rpc.connection.heartbeats |
Counter of successful heartbeats. | COUNTER | COUNT |
rpc.connection.unhealthy |
Gauge of current connections in an unhealthy state (not bidirectionally connected or heartbeating) | GAUGE | COUNT |
rpc.connection.unhealthy_nanos |
Gauge of nanoseconds of unhealthy connection time.
On the Prometheus endpoint scraped with the cluster setting 'server.child_metrics.enabled' set, the constituent parts of this metric are available on a per-peer basis and one can read off for how long a given peer has been unreachable. |
GAUGE | NANOSECONDS |
schedules.BACKUP.failed |
Number of BACKUP jobs failed | COUNTER | COUNT |
schedules.BACKUP.last-completed-time |
The unix timestamp of the most recently completed backup by a schedule specified as maintaining this metric | GAUGE | TIMESTAMP_SEC |
schedules.BACKUP.protected_age_sec |
The age of the oldest PTS record protected by BACKUP schedules | GAUGE | SECONDS |
schedules.BACKUP.protected_record_count |
Number of PTS records held by BACKUP schedules | GAUGE | COUNT |
schedules.BACKUP.started |
Number of BACKUP jobs started | COUNTER | COUNT |
schedules.BACKUP.succeeded |
Number of BACKUP jobs succeeded | COUNTER | COUNT |
schedules.scheduled-row-level-ttl-executor.failed |
Number of scheduled-row-level-ttl-executor jobs failed | COUNTER | COUNT |
security.certificate.expiration.ca |
Expiration for the CA certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.ca-client-tenant |
Expiration for the Tenant Client CA certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.client |
Minimum expiration for client certificates, labeled by SQL user. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.client-ca |
Expiration for the client CA certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.client-tenant |
Expiration for the Tenant Client certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.node |
Expiration for the node certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.node-client |
Expiration for the node's client certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.ui |
Expiration for the UI certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.expiration.ui-ca |
Expiration for the UI CA certificate. 0 means no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.ca |
Seconds till expiration for the CA certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.ca-client-tenant |
Seconds till expiration for the Tenant Client CA certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.client |
Seconds till expiration for the client certificates, labeled by SQL user. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.client-ca |
Seconds till expiration for the client CA certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.client-tenant |
Seconds till expiration for the Tenant Client certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.node |
Seconds till expiration for the node certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.node-client |
Seconds till expiration for the node's client certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.ui |
Seconds till expiration for the UI certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
security.certificate.ttl.ui-ca |
Seconds till expiration for the UI CA certificate. 0 means expired, no certificate or error. | GAUGE | TIMESTAMP_SEC |
sql.bytesin |
Number of SQL bytes received | COUNTER | BYTES |
sql.bytesout |
Number of SQL bytes sent | COUNTER | BYTES |
sql.conn.latency |
Latency to establish and authenticate a SQL connection | HISTOGRAM | NANOSECONDS |
sql.conns |
Number of open SQL connections | GAUGE | COUNT |
sql.crud_query.count |
Number of SQL SELECT, INSERT, UPDATE, DELETE statements successfully executed | COUNTER | COUNT |
sql.crud_query.started.count |
Number of SQL SELECT, INSERT, UPDATE, DELETE statements started | COUNTER | COUNT |
sql.ddl.count |
Number of SQL DDL statements successfully executed | COUNTER | COUNT |
sql.delete.count |
Number of SQL DELETE statements successfully executed | COUNTER | COUNT |
sql.distsql.contended_queries.count |
Number of SQL queries that experienced contention | COUNTER | COUNT |
sql.distsql.exec.latency |
Latency of DistSQL statement execution | HISTOGRAM | NANOSECONDS |
sql.distsql.flows.active |
Number of distributed SQL flows currently active | GAUGE | COUNT |
sql.distsql.flows.total |
Number of distributed SQL flows executed | COUNTER | COUNT |
sql.distsql.queries.active |
Number of SQL queries currently active | GAUGE | COUNT |
sql.distsql.queries.total |
Number of SQL queries executed | COUNTER | COUNT |
sql.distsql.select.count |
Number of DistSQL SELECT statements | COUNTER | COUNT |
sql.distsql.service.latency |
Latency of DistSQL request execution | HISTOGRAM | NANOSECONDS |
sql.exec.latency |
Latency of SQL statement execution | HISTOGRAM | NANOSECONDS |
sql.failure.count |
Number of statements resulting in a planning or runtime error | COUNTER | COUNT |
sql.full.scan.count |
Number of full table or index scans | COUNTER | COUNT |
sql.guardrails.max_row_size_err.count |
Number of rows observed violating sql.guardrails.max_row_size_err | COUNTER | COUNT |
sql.guardrails.max_row_size_log.count |
Number of rows observed violating sql.guardrails.max_row_size_log | COUNTER | COUNT |
sql.insert.count |
Number of SQL INSERT statements successfully executed | COUNTER | COUNT |
sql.mem.distsql.current |
Current sql statement memory usage for distsql | GAUGE | BYTES |
sql.mem.distsql.max |
Memory usage per sql statement for distsql | HISTOGRAM | BYTES |
sql.mem.internal.session.current |
Current sql session memory usage for internal | GAUGE | BYTES |
sql.mem.internal.session.max |
Memory usage per sql session for internal | HISTOGRAM | BYTES |
sql.mem.internal.txn.current |
Current sql transaction memory usage for internal | GAUGE | BYTES |
sql.mem.internal.txn.max |
Memory usage per sql transaction for internal | HISTOGRAM | BYTES |
sql.mem.root.current |
Current sql statement memory usage for root | GAUGE | BYTES |
sql.mem.root.max |
Memory usage per sql statement for root | HISTOGRAM | BYTES |
sql.misc.count |
Number of other SQL statements successfully executed | COUNTER | COUNT |
sql.new_conns |
Number of SQL connections created | COUNTER | COUNT |
sql.pgwire_cancel.ignored |
Number of pgwire query cancel requests that were ignored due to rate limiting | COUNTER | COUNT |
sql.pgwire_cancel.successful |
Number of pgwire query cancel requests that were successful | COUNTER | COUNT |
sql.pgwire_cancel.total |
Number of pgwire query cancel requests | COUNTER | COUNT |
sql.query.count |
Number of SQL operations started, including queries and transaction control statements | COUNTER | COUNT |
sql.select.count |
Number of SQL SELECT statements successfully executed | COUNTER | COUNT |
sql.service.latency |
Latency of SQL request execution | HISTOGRAM | NANOSECONDS |
sql.statements.active |
Number of currently active user SQL statements | GAUGE | COUNT |
sql.txn.abort.count |
Number of SQL transaction abort errors | COUNTER | COUNT |
sql.txn.begin.count |
Number of SQL transaction BEGIN statements successfully executed | COUNTER | COUNT |
sql.txn.commit.count |
Number of SQL transaction COMMIT statements successfully executed | COUNTER | COUNT |
sql.txn.contended.count |
Number of SQL transactions that experienced contention | COUNTER | COUNT |
sql.txn.latency |
Latency of SQL transactions | HISTOGRAM | NANOSECONDS |
sql.txn.rollback.count |
Number of SQL transaction ROLLBACK statements successfully executed | COUNTER | COUNT |
sql.txns.open |
Number of currently open user SQL transactions | GAUGE | COUNT |
sql.update.count |
Number of SQL UPDATE statements successfully executed | COUNTER | COUNT |
storage.keys.range-key-set.count |
Approximate count of RangeKeySet internal keys across the storage engine. | GAUGE | COUNT |
storage.l0-level-score |
Compaction score of level 0 | GAUGE | COUNT |
storage.l0-level-size |
Size of the SSTables in level 0 | GAUGE | BYTES |
storage.l0-num-files |
Number of SSTables in Level 0 | GAUGE | COUNT |
storage.l0-sublevels |
Number of Level 0 sublevels | GAUGE | COUNT |
storage.l1-level-score |
Compaction score of level 1 | GAUGE | COUNT |
storage.l1-level-size |
Size of the SSTables in level 1 | GAUGE | BYTES |
storage.l2-level-score |
Compaction score of level 2 | GAUGE | COUNT |
storage.l2-level-size |
Size of the SSTables in level 2 | GAUGE | BYTES |
storage.l3-level-score |
Compaction score of level 3 | GAUGE | COUNT |
storage.l3-level-size |
Size of the SSTables in level 3 | GAUGE | BYTES |
storage.l4-level-score |
Compaction score of level 4 | GAUGE | COUNT |
storage.l4-level-size |
Size of the SSTables in level 4 | GAUGE | BYTES |
storage.l5-level-score |
Compaction score of level 5 | GAUGE | COUNT |
storage.l5-level-size |
Size of the SSTables in level 5 | GAUGE | BYTES |
storage.l6-level-score |
Compaction score of level 6 | GAUGE | COUNT |
storage.l6-level-size |
Size of the SSTables in level 6 | GAUGE | BYTES |
storage.marked-for-compaction-files |
Count of SSTables marked for compaction | GAUGE | COUNT |
storage.write-stalls |
Number of instances of intentional write stalls to backpressure incoming writes | GAUGE | COUNT |
sys.cgo.allocbytes |
Current bytes of memory allocated by cgo | GAUGE | BYTES |
sys.cgo.totalbytes |
Total bytes of memory allocated by cgo, but not released | GAUGE | BYTES |
sys.cgocalls |
Total number of cgo calls | COUNTER | COUNT |
sys.cpu.combined.percent-normalized |
Current user+system cpu percentage consumed by the CRDB process, normalized 0-1 by number of cores | GAUGE | PERCENT |
sys.cpu.host.combined.percent-normalized |
Current user+system cpu percentage across the whole machine, normalized 0-1 by number of cores | GAUGE | PERCENT |
sys.cpu.sys.ns |
Total system cpu time consumed by the CRDB process | COUNTER | NANOSECONDS |
sys.cpu.sys.percent |
Current system cpu percentage consumed by the CRDB process | GAUGE | PERCENT |
sys.cpu.user.ns |
Total user cpu time consumed by the CRDB process | COUNTER | NANOSECONDS |
sys.cpu.user.percent |
Current user cpu percentage consumed by the CRDB process | GAUGE | PERCENT |
sys.fd.open |
Process open file descriptors | GAUGE | COUNT |
sys.fd.softlimit |
Process open FD soft limit | GAUGE | COUNT |
sys.gc.count |
Total number of GC runs | COUNTER | COUNT |
sys.gc.pause.ns |
Total GC pause | COUNTER | NANOSECONDS |
sys.gc.pause.percent |
Current GC pause percentage | GAUGE | PERCENT |
sys.go.allocbytes |
Current bytes of memory allocated by go | GAUGE | BYTES |
sys.go.totalbytes |
Total bytes of memory allocated by go, but not released | GAUGE | BYTES |
sys.goroutines |
Current number of goroutines | GAUGE | COUNT |
sys.host.disk.iopsinprogress |
IO operations currently in progress on this host (as reported by the OS) | GAUGE | COUNT |
sys.host.disk.read.bytes |
Bytes read from all disks since this process started (as reported by the OS) | COUNTER | BYTES |
sys.host.disk.read.count |
Disk read operations across all disks since this process started (as reported by the OS) | COUNTER | COUNT |
sys.host.disk.write.bytes |
Bytes written to all disks since this process started (as reported by the OS) | COUNTER | BYTES |
sys.host.disk.write.count |
Disk write operations across all disks since this process started (as reported by the OS) | COUNTER | COUNT |
sys.host.net.recv.bytes |
Bytes received on all network interfaces since this process started (as reported by the OS) | COUNTER | BYTES |
sys.host.net.send.bytes |
Bytes sent on all network interfaces since this process started (as reported by the OS) | COUNTER | BYTES |
sys.rss |
Current process RSS | GAUGE | BYTES |
sys.runnable.goroutines.per.cpu |
Average number of goroutines that are waiting to run, normalized by number of cores | GAUGE | COUNT |
sys.totalmem |
Total memory (both free and used) | GAUGE | BYTES |
sys.uptime |
Process uptime | COUNTER | SECONDS |
sysbytes |
Number of bytes in system KV pairs | GAUGE | BYTES |
syscount |
Count of system KV pairs | GAUGE | COUNT |
tenant.consumption.cross_region_network_ru |
Total number of RUs charged for cross-region network traffic | COUNTER | COUNT |
tenant.consumption.external_io_egress_bytes |
Total number of bytes written to external services such as cloud storage providers | GAUGE | COUNT |
tenant.consumption.pgwire_egress_bytes |
Total number of bytes transferred from a SQL pod to the client | GAUGE | COUNT |
tenant.consumption.read_batches |
Total number of KV read batches | GAUGE | COUNT |
tenant.consumption.read_bytes |
Total number of bytes read from KV | GAUGE | COUNT |
tenant.consumption.read_requests |
Total number of KV read requests | GAUGE | COUNT |
tenant.consumption.request_units |
Total RU consumption | COUNTER | COUNT |
tenant.consumption.sql_pods_cpu_seconds |
Total amount of CPU used by SQL pods | GAUGE | SECONDS |
tenant.consumption.write_batches |
Total number of KV write batches | GAUGE | COUNT |
tenant.consumption.write_bytes |
Total number of bytes written to KV | GAUGE | COUNT |
tenant.consumption.write_requests |
Total number of KV write requests | GAUGE | COUNT |
tenant.sql_usage.cross_region_network_ru |
Total number of RUs charged for cross-region network traffic | COUNTER | COUNT |
tenant.sql_usage.estimated_cpu_seconds |
Estimated amount of CPU consumed by a virtual cluster | COUNTER | SECONDS |
tenant.sql_usage.external_io_egress_bytes |
Total number of bytes written to external services such as cloud storage providers | COUNTER | COUNT |
tenant.sql_usage.external_io_ingress_bytes |
Total number of bytes read from external services such as cloud storage providers | COUNTER | COUNT |
tenant.sql_usage.kv_request_units |
RU consumption attributable to KV | COUNTER | COUNT |
tenant.sql_usage.pgwire_egress_bytes |
Total number of bytes transferred from a SQL pod to the client | COUNTER | COUNT |
tenant.sql_usage.provisioned_vcpus |
Number of vcpus available to the virtual cluster | GAUGE | COUNT |
tenant.sql_usage.read_batches |
Total number of KV read batches | COUNTER | COUNT |
tenant.sql_usage.read_bytes |
Total number of bytes read from KV | COUNTER | COUNT |
tenant.sql_usage.read_requests |
Total number of KV read requests | COUNTER | COUNT |
tenant.sql_usage.request_units |
RU consumption | COUNTER | COUNT |
tenant.sql_usage.sql_pods_cpu_seconds |
Total amount of CPU used by SQL pods | COUNTER | SECONDS |
tenant.sql_usage.write_batches |
Total number of KV write batches | COUNTER | COUNT |
tenant.sql_usage.write_bytes |
Total number of bytes written to KV | COUNTER | COUNT |
tenant.sql_usage.write_requests |
Total number of KV write requests | COUNTER | COUNT |
timeseries.write.bytes |
Total size in bytes of metric samples written to disk | COUNTER | BYTES |
timeseries.write.errors |
Total errors encountered while attempting to write metrics to disk | COUNTER | COUNT |
timeseries.write.samples |
Total number of metric samples written to disk | COUNTER | COUNT |
totalbytes |
Total number of bytes taken up by keys and values including non-live data | GAUGE | BYTES |
txn.aborts |
Number of aborted KV transactions | COUNTER | COUNT |
txn.commits |
Number of committed KV transactions (including 1PC) | COUNTER | COUNT |
txn.commits1PC |
Number of KV transaction one-phase commits | COUNTER | COUNT |
txn.durations |
KV transaction durations | HISTOGRAM | NANOSECONDS |
txn.restarts |
Number of restarted KV transactions | HISTOGRAM | COUNT |
txn.restarts.asyncwritefailure |
Number of restarts due to async consensus writes that failed to leave intents | COUNTER | COUNT |
txn.restarts.readwithinuncertainty |
Number of restarts due to reading a new value within the uncertainty interval | COUNTER | COUNT |
txn.restarts.serializable |
Number of restarts due to a forwarded commit timestamp and isolation=SERIALIZABLE | COUNTER | COUNT |
txn.restarts.txnaborted |
Number of restarts due to an abort by a concurrent transaction (usually due to deadlock) | COUNTER | COUNT |
txn.restarts.txnpush |
Number of restarts due to a transaction push failure | COUNTER | COUNT |
txn.restarts.unknown |
Number of restarts due to unknown reasons | COUNTER | COUNT |
txn.restarts.writetooold |
Number of restarts due to a concurrent writer committing first | COUNTER | COUNT |
txnwaitqueue.deadlocks_total |
Number of deadlocks detected by the txn wait queue | COUNTER | COUNT |
valbytes |
Number of bytes taken up by values | GAUGE | BYTES |
valcount |
Count of all values | GAUGE | COUNT |
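
The description of rpc.connection.avg_round_trip_latency in the table above notes that dividing that gauge by rpc.connection.healthy approximates the average round-trip latency. A minimal sketch of that arithmetic follows. The function name, parameter names, and sample values are hypothetical and chosen only for illustration; it assumes both gauges were sampled at roughly the same time.

```python
from typing import Optional


def approx_avg_round_trip_ns(
    avg_rtt_sum_ns: float, healthy_connections: int
) -> Optional[float]:
    """Approximate the mean round-trip latency in nanoseconds.

    avg_rtt_sum_ns:      sampled value of rpc.connection.avg_round_trip_latency
                         (a sum of per-connection moving averages).
    healthy_connections: sampled value of rpc.connection.healthy.

    Returns None when there are no healthy connections, because the
    ratio is undefined in that case.
    """
    if healthy_connections <= 0:
        return None
    return avg_rtt_sum_ns / healthy_connections


# Hypothetical sample values, not taken from a real cluster.
print(approx_avg_round_trip_ns(6_000_000.0, 4))  # 1500000.0
print(approx_avg_round_trip_ns(6_000_000.0, 0))  # None
```

As the same description points out, this ratio is only an approximation; the round-trip-latency histogram or the per-peer label families are likely to be more informative where they are available.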