Publication date: January 27, 2023
Description
In CockroachDB v22.2.0, v22.2.1, v22.2.2, and v22.2.3, new Prometheus histogram buckets were defined to reduce the number of timeseries generated per Prometheus histogram to 15, down from a dynamically generated number that previously exceeded 100. This change was made in response to customer feedback that prior versions of CockroachDB generated an unmanageable number of histogram buckets.
These new histogram buckets significantly reduced the fidelity of histograms when calculating quantiles (e.g., P90, P95, P99). In particular, each histogram metric now has only 15 buckets, instead of the dynamically generated number (up to hundreds) available before the change. Users may see noticeable changes in reported latency when viewing latency graphs in DB Console, Prometheus, or other timeseries tools. Such changes do not necessarily indicate an actual change in latencies; they may instead be a byproduct of the lower-fidelity histogram bucket boundaries.
This causes a notable negative impact on observability for all histogram metrics used within CockroachDB, making it more difficult to discern if there’s an actual increase in latency when viewing a timeseries graph, or if the increase is a result of lower fidelity buckets.
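To illustrate why fewer buckets can shift a reported quantile even when the underlying latencies are unchanged, the following is a minimal sketch that estimates a quantile from bucketed counts using linear interpolation within the bucket, similar in spirit to how Prometheus-style quantile estimation works. The bucket boundaries and sample latencies are hypothetical values chosen for illustration; they are not the boundaries used by any CockroachDB release.

```python
import bisect


def estimate_quantile(q, upper_bounds, samples):
    """Estimate quantile q (0..1) from samples that are first reduced to
    per-bucket counts, interpolating linearly inside the matching bucket.

    upper_bounds must be sorted and must cover every sample.
    """
    # Reduce raw samples to bucket counts, as a histogram metric would.
    counts = [0] * len(upper_bounds)
    for s in samples:
        idx = bisect.bisect_left(upper_bounds, s)  # first bound >= s
        counts[idx] += 1

    rank = q * len(samples)
    cumulative = 0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= rank:
            lower = upper_bounds[i - 1] if i > 0 else 0.0
            upper = upper_bounds[i]
            # Only the bucket is known, so assume a uniform spread inside it.
            return lower + (upper - lower) * (rank - cumulative) / c
        cumulative += c
    return upper_bounds[-1]


# Hypothetical workload: every request takes 12 ms.
samples_ms = [12.0] * 100

# Hypothetical bucket layouts (NOT CockroachDB's actual boundaries).
coarse_bounds = [1.0, 10.0, 100.0, 1000.0]
fine_bounds = [float(b) for b in range(1, 101)]

coarse_p99 = estimate_quantile(0.99, coarse_bounds, samples_ms)
fine_p99 = estimate_quantile(0.99, fine_bounds, samples_ms)
print(f"coarse buckets P99 estimate: {coarse_p99:.2f} ms")
print(f"fine buckets   P99 estimate: {fine_p99:.2f} ms")
```

With these made-up inputs, the coarse layout reports a P99 of roughly 99 ms while the fine layout reports roughly 12 ms, even though every sample is 12 ms. The gap comes entirely from the width of the bucket that contains the quantile.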
Statement
This issue is resolved in the CockroachDB v22.2.5 maintenance release, by increasing the fidelity of the histogram buckets.
You can also use the new COCKROACH_ENABLE_HDR_HISTOGRAMS environment variable to control which histogram model your deployment uses: setting COCKROACH_ENABLE_HDR_HISTOGRAMS=true uses the legacy HdrHistogram model instead of the newer Prometheus-backed histogram model. This is not recommended unless you are having difficulties with the newer Prometheus-backed histogram model. Enabling the legacy HdrHistogram model can cause performance issues with timeseries databases like Prometheus, as processing and storing the increased number of buckets is taxing on both CPU and storage. The legacy HdrHistogram model is also slated for full deprecation in upcoming releases. The COCKROACH_ENABLE_HDR_HISTOGRAMS environment variable defaults to false.
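As a minimal sketch of how the variable might be set from a launcher script, the helper below builds a process environment with or without the legacy model enabled. The helper name `cockroach_env` and its parameters are hypothetical; only the variable name and its `true` value come from this advisory. How the environment is passed to the `cockroach` process depends on your deployment tooling.

```python
import os

HDR_VAR = "COCKROACH_ENABLE_HDR_HISTOGRAMS"


def cockroach_env(use_legacy_hdr, base_env=None):
    """Return a copy of the environment for launching a CockroachDB node.

    Hypothetical helper: sets the variable to "true" to opt in to the
    legacy HdrHistogram model, and otherwise removes it so that the
    documented default (false) applies.
    """
    env = dict(os.environ if base_env is None else base_env)
    if use_legacy_hdr:
        env[HDR_VAR] = "true"
    else:
        env.pop(HDR_VAR, None)
    return env


# Example: opt in to the legacy model (not recommended unless needed).
legacy_env = cockroach_env(use_legacy_hdr=True, base_env={})
print(legacy_env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.Popen` or `subprocess.run` when starting the node.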
Mitigation
Users of CockroachDB v22.2.0, v22.2.1, v22.2.2, and v22.2.3 should use the Statement Details page within the DB Console to verify query latencies, as the latency reported on this page does not rely on histogram metrics. Users are also encouraged to monitor client-side latencies more closely in the meantime. Finally, users are encouraged to update to CockroachDB v22.2.5 or later.
Impact
Histogram metrics in CockroachDB v22.2.0, v22.2.1, v22.2.2, and v22.2.3 instances may report higher-than-expected values. These values alone should not be treated as evidence that overall latencies have changed.
Please reach out to the support team if more information or assistance is needed.