Logical Data Replication Dashboard

The Logical Data Replication dashboard in the DB Console lets you monitor metrics related to the logical data replication (LDR) jobs on the destination cluster. These metrics are reported at the cluster level: if multiple LDR jobs are running on a cluster, the DB Console shows metrics averaged across those jobs.

To view this dashboard, access the DB Console for the destination cluster, click Metrics on the left-hand navigation bar, and select Logical Data Replication from the Dashboard dropdown.

Note:

The Logical Data Replication dashboard is distinct from the Replication dashboard, which tracks metrics related to how data is replicated across the cluster, e.g., range status, replicas per store, etc.

Dashboard navigation

Use the Graph menu to display metrics for your entire cluster or for a specific node.

To the right of the Graph and Dashboard menus, a time interval selector allows you to filter the view for a predefined or custom time interval. Use the navigation buttons to move to the previous, next, or current time interval. When you select a time interval, the same interval is selected in the SQL Activity pages. However, if you select 10 or 30 minutes, the interval defaults to 1 hour in SQL Activity pages.

Hovering your mouse pointer over the graph title will display a tooltip with a description and the metrics used to create the graph.

When you hover over a graph, crosshair lines appear at your mouse pointer. The legend under the graph displays each series' value at the time marked by the crosshair. Hovering over a given series displays the corresponding value near the mouse pointer and highlights the series line (graying out other series lines). Click anywhere within the graph to freeze the values in place. Click anywhere within the graph again to have the values follow your mouse movements once more.

In the legend, click on an individual series to isolate it on the graph. The other series will be hidden, while the hover will still work. Click the individual series again to make the other series visible. If there are many series, a scrollbar may appear on the right of the legend. The scrollbar keeps the legend to a fixed size, which matters particularly on clusters with many nodes.

The Logical Data Replication dashboard displays the following time-series graphs:

Note:

Node-specific views do not apply to LDR metrics.

Replication Latency

The graph shows the difference in commit times between the source cluster and the destination cluster.

Metric	CockroachDB Metric Name	Description
p50, p99	`logical_replication.commit_latency-p50`, `logical_replication.commit_latency-p99`	Event commit latency: the difference between an event's MVCC timestamp and the time the event was flushed to disk. If events are batched, the difference between the timestamp of the oldest event in the batch and the flush time is recorded.
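
The batching rule in the description above can be illustrated with a minimal sketch. This is not CockroachDB's implementation: the function name `batch_commit_latency` is hypothetical, and representing timestamps as float seconds since the Unix epoch is an assumption made for the example.

```python
def batch_commit_latency(event_mvcc_timestamps, flush_time):
    """Return the latency recorded for one flushed batch.

    Both arguments are assumed to be seconds since the Unix epoch.
    The oldest event in the batch determines the recorded value.
    """
    if not event_mvcc_timestamps:
        raise ValueError("batch must contain at least one event")
    oldest = min(event_mvcc_timestamps)
    return flush_time - oldest


# Three events flushed together at t=105.0; the oldest event is at t=100.0.
latency = batch_commit_latency([100.0, 102.5, 104.0], flush_time=105.0)
print(latency)  # 5.0
```

Because the oldest event sets the value, a larger batch can only raise the recorded latency, never lower it.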

Replication Lag

The graph shows the age of the oldest row on the source cluster that has yet to replicate to the destination cluster.

Metric	CockroachDB Metric Name	Description
Replication Lag	`logical_replication.replicated_time_seconds`	The replicated time of the logical replication stream in seconds since the unix epoch.
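
Because `logical_replication.replicated_time_seconds` is an absolute time rather than a duration, a lag value has to be derived from it. The sketch below shows one plausible derivation, the current time minus the replicated time. Whether the dashboard computes the graph exactly this way is an assumption, the function name is hypothetical, and treating a non-positive value as "nothing reported yet" is also an assumption.

```python
import time


def replication_lag_seconds(replicated_time_seconds, now=None):
    """Estimate lag as wall-clock time minus the replicated time.

    replicated_time_seconds: metric value, in seconds since the Unix epoch.
    now: optional override for the current time, useful for testing.
    Returns None when no replicated time has been reported (assumption).
    """
    if now is None:
        now = time.time()
    if replicated_time_seconds <= 0:
        return None
    # Clamp at zero so small clock differences never produce a negative lag.
    return max(0.0, now - replicated_time_seconds)


print(replication_lag_seconds(1000.0, now=1012.5))  # 12.5
```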

Row Updates Applied

The graph shows the rate at which row updates are applied by all logical replication jobs.

Metric	CockroachDB Metric Name	Description
Row Updates Applied	`logical_replication.events_ingested`	Events ingested by all logical replication jobs.
Row Updates sent to DLQ	`logical_replication.events_dlqed`	Row update events sent to the dead letter queue (DLQ).
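
If you collect these metrics outside the DB Console, a rate like the one graphed here can be derived from successive samples. The following is a minimal sketch that assumes `logical_replication.events_ingested` behaves as a cumulative counter and that you already have `(timestamp, value)` samples; the function name and the reset handling are illustrative choices, not the DB Console's logic.

```python
def per_second_rate(samples):
    """Convert cumulative counter samples into per-second rates.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns a list of (timestamp_seconds, rate) for each adjacent pair.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        delta = v1 - v0
        if delta < 0:
            # Assumed counter reset (for example, after a node restart):
            # count only what accumulated since the reset.
            delta = v1
        rates.append((t1, delta / elapsed))
    return rates


print(per_second_rate([(0, 0), (10, 500), (20, 1500)]))
# [(10, 50.0), (20, 100.0)]
```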

Logical Bytes Received

In the cluster view, the graph shows the rate at which the logical bytes (sum of keys + values) are received by all logical replication jobs across all nodes.
In the node view, the graph shows the rate at which the logical bytes (sum of keys + values) are received by all logical replication jobs on the node.

Metric	CockroachDB Metric Name	Description
logical bytes	`logical_replication.logical_bytes`	Logical bytes (sum of keys + values) received by all replication jobs.

Batch Application Processing Time: 50th percentile

In the cluster view, the graph shows the 50th percentile in the time it takes to write a batch of row updates across all nodes.
In the node view, the graph shows the 50th percentile in the time it takes to write a batch of row updates on the node.

Metric	CockroachDB Metric Name	Description
processing time	`logical_replication.batch_hist_nanos-p50`	Time spent flushing a batch.

Batch Application Processing Time: 99th percentile

In the cluster view, the graph shows the 99th percentile in the time it takes to write a batch of row updates across all nodes.
In the node view, the graph shows the 99th percentile in the time it takes to write a batch of row updates on the node.

Metric	CockroachDB Metric Name	Description
processing time	`logical_replication.batch_hist_nanos-p99`	Time spent flushing a batch.

DLQ Causes

The graph shows the reasons why events were sent to the dead letter queue (DLQ).

Metric	CockroachDB Metric Name	Description
Retry Duration Expired	`logical_replication.events_dlqed_age`	Row update events sent to DLQ due to reaching the maximum time allowed in the retry queue.
Retry Queue Full	`logical_replication.events_dlqed_space`	Row update events sent to DLQ because the retry queue reached its capacity.
Non-retryable	`logical_replication.events_dlqed_errtype`	Row update events sent to DLQ due to an error not considered retryable.
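
When reading the three series together, it can help to express each cause as a share of all DLQ events. This is a small sketch under the assumption that you have one value per metric for the same time window; the function name and dictionary keys are hypothetical.

```python
def dlq_cause_shares(age, space, errtype):
    """Return each DLQ cause as a fraction of all DLQ events.

    age:     value of logical_replication.events_dlqed_age
    space:   value of logical_replication.events_dlqed_space
    errtype: value of logical_replication.events_dlqed_errtype
    """
    counts = {"age": age, "space": space, "errtype": errtype}
    total = sum(counts.values())
    if total == 0:
        # No events reached the DLQ in this window.
        return {cause: 0.0 for cause in counts}
    return {cause: count / total for cause, count in counts.items()}


print(dlq_cause_shares(2, 1, 1))
# {'age': 0.5, 'space': 0.25, 'errtype': 0.25}
```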

Retry Queue Size

In the cluster view, the graph shows the total size of the retry queues across all processors in all LDR jobs across all nodes.
In the node view, the graph shows the total size of the retry queues across all processors in all LDR jobs on the node.

Metric	CockroachDB Metric Name	Description
retry queue bytes	`logical_replication.retry_queue_bytes`	The size of the retry queue of the logical replication stream.

Summary and events

Summary panel

A Summary panel of key metrics is displayed to the right of the time-series graphs.

Metric	Description
Total Nodes	The total number of nodes in the cluster. Decommissioned nodes are not included in this count.
Capacity Used	The storage capacity used as a percentage of usable capacity allocated across all nodes.
Unavailable Ranges	The number of unavailable ranges in the cluster. A non-zero number indicates an unstable cluster.
Queries per second	The total number of `SELECT`, `UPDATE`, `INSERT`, and `DELETE` queries executed per second across the cluster.
P99 Latency	The 99th percentile of service latency.

Note:

If you are testing your deployment locally with multiple CockroachDB nodes running on a single machine (this is not recommended in production), you must explicitly set the store size per node in order to display the correct capacity. Otherwise, the machine's actual disk capacity will be counted as a separate store for each node, thus inflating the computed capacity.

Events panel

Underneath the Summary panel, the Events panel lists the 5 most recent events logged for all nodes across the cluster. To list all events, click View all events.

DB Console Events

The following types of events are listed:

Database created
Database dropped
Table created
Table dropped
Table altered
Index created
Index dropped
View created
View dropped
Schema change reversed
Schema change finished
Node joined
Node decommissioned
Node restarted
Cluster setting changed
