Upgrade to CockroachDB v23.1

Because of CockroachDB's multi-active availability design, you can perform a "rolling upgrade" of your CockroachDB cluster. This means that you can upgrade nodes one at a time without interrupting the cluster's overall health and operations.

This page describes how to upgrade to the latest v23.1 release, v23.1.30. To upgrade CockroachDB on Kubernetes, refer to single-cluster or multi-cluster instead.

Terminology

Before upgrading, review the CockroachDB release terminology:

  • A new major release is published multiple times per year. The major version number indicates the year of release followed by the release number, starting with 1. For example, the latest major release is v24.3.
  • Each supported major release is maintained across patch releases that contain improvements, including performance or security enhancements and bug fixes. Each patch release appends its patch number to the major version number. For example, patch releases of v24.3 use the format v24.3.x.
  • All major and patch releases are suitable for production environments, and are therefore considered "production releases". For example, the latest production release is v24.3.2.
  • Prior to an upcoming major release, alpha, beta, and release candidate (RC) binaries are made available for users who need early access to a feature before it is available in a production release. These releases append the terms alpha, beta, or rc to the version number. These "testing releases" are not suitable for production environments and are not eligible for support or uptime SLA commitments. For more information, refer to the Release Support Policy.
Note:

There are no "minor releases" of CockroachDB.
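
The naming scheme above can be checked mechanically. The following is a minimal sketch, not an official tool: the function name release_kind is hypothetical, and it assumes version strings of the form vYY.R, vYY.R.x, or vYY.R.x followed by -alpha, -beta, or -rc, as described in the list above. It does not validate the numbers themselves.

```shell
# release_kind VERSION
# Prints "series" for a major version such as v24.3, "production" for a
# patch-level release such as v24.3.2, "testing" for alpha, beta, or rc
# builds, and "unknown" for anything that does not fit the scheme.
# Hypothetical helper for illustration; not part of the cockroach CLI.
release_kind() {
  v="${1#v}"    # strip the leading "v", if present
  case "$v" in
    *-alpha*|*-beta*|*-rc*) echo "testing" ;;
    *[!0-9.]*|"")           echo "unknown" ;;     # stray characters or empty input
    *.*.*.*)                echo "unknown" ;;     # too many components
    *.*.*)                  echo "production" ;;
    *.*)                    echo "series" ;;
    *)                      echo "unknown" ;;
  esac
}
```

For example, release_kind v23.1.0-alpha.1 prints testing, and release_kind v24.3.2 prints production.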

Step 1. Verify that you can upgrade

Warning:

In CockroachDB v22.2.x and above, a cluster that is upgraded to an alpha binary of CockroachDB or a binary that was manually built from the master branch cannot subsequently be upgraded to a production release.

Run cockroach sql against any node in the cluster to open the SQL shell. Then check your current cluster version:

> SHOW CLUSTER SETTING version;

To upgrade to v23.1.30, you must be running either:

  • Any earlier v23.1 release: v23.1.0-alpha.1 to v23.1.29.

  • A v22.2 production release: v22.2.0 to v22.2.19.

If you are running any other version, take the following steps before continuing on this page:

| Version | Action(s) before upgrading to any v23.1 release |
|---|---|
| Pre-v23.1 testing release | Upgrade to a corresponding production release; then upgrade through each subsequent major release, ending with a v22.2 production release. |
| Pre-v22.2 production release | Upgrade through each subsequent major release, ending with a v22.2 production release. |
| v22.2 testing release | Upgrade to a v22.2 production release. |

When you are ready to upgrade to v23.1.30, continue to step 2.
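
The eligibility rules in this step can be sketched as a small shell function. This is an illustration only: upgrade_path is a hypothetical name, its input is assumed to be the release tag of the binary your nodes currently run (for example v22.2.19), and it does not enforce the upper bounds v22.2.19 and v23.1.29 or handle releases newer than v23.1.

```shell
# upgrade_path RELEASE
# Prints "direct" if RELEASE can be upgraded straight to a v23.1 release
# under the rules above, or "intermediate" if other upgrades must come first.
# Hypothetical helper; it only inspects the version string.
upgrade_path() {
  v="${1#v}"
  case "$v" in
    23.1.*)                                echo "direct" ;;        # any earlier v23.1 release
    22.2.*-alpha*|22.2.*-beta*|22.2.*-rc*) echo "intermediate" ;;  # v22.2 testing release
    22.2.*)                                echo "direct" ;;        # v22.2 production release
    *)                                     echo "intermediate" ;;  # anything else
  esac
}
```

For example, upgrade_path v22.2.0-rc.1 prints intermediate, because a v22.2 testing release must first be upgraded to a v22.2 production release.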

Step 2. Prepare to upgrade

Before starting the upgrade, complete the following steps.

Ensure you have a valid license key

To perform major version upgrades, you must have a valid license key.

Patch version upgrades can be performed without a valid license key, with the following limitations:

  • The cluster will run without limitations for a specified grace period. During that time, alerts are displayed that the cluster needs a valid license key. For more information, refer to the Licensing FAQs.
  • The cluster is throttled at the end of the grace period if no valid license key is added to the cluster before then.

If you have an Enterprise Free or Enterprise Trial license, you must enable telemetry using the diagnostics.reporting.enabled cluster setting in order to finalize a major version upgrade:

SET CLUSTER SETTING diagnostics.reporting.enabled = true;

If a cluster with an Enterprise Free or Enterprise Trial license is upgraded across patch versions and does not meet telemetry requirements:

  • The cluster will run without limitations for a 7-day grace period. During that time, alerts are displayed that the cluster needs to send telemetry.
  • The cluster is throttled if telemetry is not received before the end of the grace period.

For more information, refer to the Licensing FAQs.

If you want to stay on the previous version, you can roll back the upgrade before finalization.

Review breaking changes

Review the backward-incompatible changes, deprecated features, and key cluster setting changes in v23.1. If any affect your deployment, make the necessary changes before starting the rolling upgrade to v23.1.

Check load balancing

Make sure your cluster is behind a load balancer, or your clients are configured to talk to multiple nodes. If your application communicates with a single node, stopping that node to upgrade its CockroachDB binary will cause your application to fail.

Check cluster health

Verify the overall health of your cluster using the DB Console:

  • Under Node Status, make sure all nodes that should be live are listed as such. If any nodes are unexpectedly listed as SUSPECT or DEAD, identify why the nodes are offline and either restart them or decommission them before beginning your upgrade. If there are DEAD and non-decommissioned nodes in your cluster, it will not be possible to finalize the upgrade (either automatically or manually).

  • Under Replication Status, make sure there are 0 under-replicated and unavailable ranges. Otherwise, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. Therefore, it's important to identify and resolve the cause of range under-replication and/or unavailability before beginning your upgrade.

  • In the Node List:

    • Make sure all nodes are on the same version. If any nodes are behind, upgrade them to the cluster's current version first, and then start this process over.
  • In the Metrics dashboards:

    • Make sure CPU, memory, and storage capacity are within acceptable values for each node. Nodes must be able to tolerate some increase in case the new version uses more resources for your workload. If any of these metrics is above healthy limits, consider adding nodes to your cluster before beginning your upgrade.
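
If you prefer to script the Replication Status check rather than read it from the DB Console, one possible approach is to parse CSV output from the command line. The sketch below is hedged accordingly: check_ranges is a hypothetical helper, and the column names ranges_unavailable and ranges_underreplicated are assumptions about the output of cockroach node status --ranges --format=csv that you should verify on your version. It splits on commas naively and does not handle quoted fields.

```shell
# check_ranges
# Reads CSV from stdin (header row first). Succeeds and prints "ranges OK"
# only if ranges_unavailable and ranges_underreplicated sum to 0 across all
# rows. Exits 1 if any such ranges exist, 2 if the columns are missing.
check_ranges() {
  awk -F, '
    NR == 1 {
      # Map each header name to its column position.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("ranges_unavailable" in col) || !("ranges_underreplicated" in col)) {
        print "expected columns not found" > "/dev/stderr"
        missing = 1
        exit 2
      }
      next
    }
    { bad += $(col["ranges_unavailable"]) + $(col["ranges_underreplicated"]) }
    END {
      if (missing) exit 2
      if (bad > 0) { print bad " unavailable or under-replicated ranges"; exit 1 }
      print "ranges OK"
    }
  '
}

# Possible usage (connection flags are placeholders for your deployment):
#   cockroach node status --ranges --format=csv --certs-dir=certs --host={node address} | check_ranges
```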

Check decommissioned nodes

If your cluster contains partially-decommissioned nodes, they will block an upgrade attempt.

  1. To check the status of decommissioned nodes, run the cockroach node status --decommission command:

    cockroach node status --decommission
    

    In the output, verify that the value of the membership field of each node is decommissioned. If any node's membership value is decommissioning, that node is not fully decommissioned.

  2. If any node is not fully decommissioned, try the following:

    1. First, reissue the decommission command. The second command typically succeeds within a few minutes.
    2. If the second decommission command does not succeed, recommission the node and then decommission it again. Before continuing the upgrade, the node must be marked as decommissioned.
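
The membership check described above can be sketched as a filter over CSV output. This is an illustration under assumptions: stalled_decommissions is a hypothetical helper, the membership field is the one named above, and the id column name is an assumption about the CSV header that you should confirm against your own output.

```shell
# stalled_decommissions
# Reads CSV from stdin (header row first) and prints the id of every node
# whose membership value is exactly "decommissioning". Prints nothing when
# every node is either active or fully decommissioned.
stalled_decommissions() {
  awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    ("membership" in col) && $(col["membership"]) == "decommissioning" {
      print $(col["id"])
    }
  '
}

# Possible usage (connection flags are placeholders for your deployment):
#   cockroach node status --decommission --format=csv --certs-dir=certs --host={node address} | stalled_decommissions
```

An empty result suggests that no node is stuck in the decommissioning state.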

Back up cluster

Because CockroachDB is designed with high fault tolerance, backups are primarily needed for disaster recovery. However, taking regular backups of your data is an operational best practice. When upgrading to a major release, we recommend taking a backup of your cluster. See our support policy for restoring backups across versions.

Pause changefeed jobs

Pause running changefeed jobs before starting the rolling upgrade process:

PAUSE JOB {changefeed_job_ID};

Or, pause all changefeed jobs:

PAUSE ALL CHANGEFEED JOBS;
Warning:

During a rolling upgrade, running changefeed jobs can slow down as more of the nodes move to the later version of CockroachDB. We recommend pausing changefeed jobs before upgrading and resuming changefeeds once the upgrade is finalized.

For more details on the potential impacts of node restarts, refer to the Changefeed Messages page.

Reset SQL statistics

Before upgrading to CockroachDB v23.1, it is recommended to reset the cluster's SQL statistics. Otherwise, it may take longer for the upgrade to complete on a cluster with large statement or transaction statistics tables. This is due to the addition of a new column and a new index to these tables. To reset SQL statistics, issue the following SQL command:

SELECT crdb_internal.reset_sql_stats();

Step 3. Decide how the upgrade will be finalized

Note:

This step is relevant only when upgrading from v22.2.x to v23.1. For upgrades within the v23.1.x series, skip this step.

By default, after all nodes are running the new version, the upgrade process will be auto-finalized. This will enable certain features and performance improvements introduced in v23.1. However, once the upgrade is finalized, it is no longer possible to roll back to v22.2. In the event of a catastrophic failure or corruption, the only option will be to start a new cluster using the previous binary and then restore from one of the backups created prior to performing the upgrade.

For this reason, we recommend disabling auto-finalization so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade. If you disable auto-finalization, you must follow all of the subsequent directions, including the manual finalization in step 6. To disable auto-finalization:

  1. Upgrade to v22.2, if you haven't already.

  2. Start the cockroach sql shell against any node in the cluster.

  3. Set the cluster.preserve_downgrade_option cluster setting:

    SET CLUSTER SETTING cluster.preserve_downgrade_option = '22.2';
    

    This setting can be set only to the current cluster version.
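
Because the setting accepts only the current cluster version, a script can derive the value from the release it is upgrading from instead of hard-coding it. The following sketch only builds the statement text; preserve_downgrade_sql is a hypothetical helper, and it assumes the release tag has the form vYY.R.x.

```shell
# preserve_downgrade_sql RELEASE
# Prints the SET CLUSTER SETTING statement for the major version of RELEASE,
# for example v22.2.19 yields '22.2'. It does not connect to the cluster.
preserve_downgrade_sql() {
  v="${1#v}"                                        # strip the leading "v"
  major="$(printf '%s\n' "$v" | cut -d. -f1,2)"     # keep the first two components
  printf "SET CLUSTER SETTING cluster.preserve_downgrade_option = '%s';\n" "$major"
}

# Possible usage (connection flags are placeholders for your deployment):
#   cockroach sql --certs-dir=certs -e "$(preserve_downgrade_sql v22.2.19)"
```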

Features that require upgrade finalization

When upgrading from v22.2 to v23.1, certain features and performance improvements will be enabled only after finalizing the upgrade.

For an expanded list of features included in the v23.1 release, see the v23.1 release notes.

Step 4. Perform the rolling upgrade

Tip:

Cockroach Labs recommends creating scripts to perform these steps instead of performing them manually.

Follow these steps to perform the rolling upgrade. To upgrade CockroachDB on Kubernetes, refer to single-cluster or multi-cluster instead.

For each node in your cluster, complete the following steps. Be sure to upgrade only one node at a time, and wait at least one minute after a node rejoins the cluster before upgrading the next node. Simultaneously upgrading more than one node increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability.

Warning:

After beginning a major-version upgrade, Cockroach Labs recommends upgrading all nodes as quickly as possible. In a cluster with nodes running different major versions of CockroachDB, a query that is sent to an upgraded node can be distributed only among other upgraded nodes. Data accesses that would otherwise be local may become remote, and the performance of these queries can suffer.

These steps perform an upgrade to the latest v23.1 release, v23.1.30.

  1. Drain and shut down the node.

  2. Visit What's New in v23.1? and download the CockroachDB v23.1.30 full binary for your architecture.

  3. Extract the archive. In the following instructions, replace {COCKROACHDB_DIR} with the path to the extracted archive directory.

  4. If you have a previous version of the cockroach binary in your $PATH, rename the outdated cockroach binary, and then move the new one into its place.

    If you get a permission error because the cockroach binary is located in a system directory, add sudo before each command. The binary will be owned by the effective user, which is root if you use sudo.

    i="$(which cockroach)"; mv "$i" "$i"_old
    
    cp -i {COCKROACHDB_DIR}/cockroach /usr/local/bin/cockroach
    
  5. If a cluster has corrupt descriptors, a major-version upgrade cannot be finalized. In CockroachDB v23.1.6 and subsequent 23.1 versions, automatic descriptor repair is enabled and cannot be disabled. In CockroachDB v23.1.11 and above, automatic descriptor repair is available but disabled by default. To enable it in these versions, set the environment variable COCKROACH_RUN_FIRST_UPGRADE_PRECONDITION to true after installing the v23.1 binary but before restarting the cockroach process on the node. Monitor the cluster logs for errors. If a descriptor cannot be repaired automatically, contact support for assistance completing the upgrade.

  6. Start the node so that it can rejoin the cluster.

    Without a process manager like systemd, re-run the cockroach start command that you used to start the node initially, for example:

    cockroach start \
        --certs-dir=certs \
        --advertise-addr={node address} \
        --join={node1 address},{node2 address},{node3 address}
    

    If you are using systemd as the process manager, run this command to start the node:

    systemctl start {systemd config filename}
    

  7. Verify the node has rejoined the cluster through its output to stdout or through the DB Console.

  8. If you use cockroach in your $PATH, you can remove the previous binary:

    rm /usr/local/bin/cockroach_old
    

    If you leave versioned binaries on your servers, you do not need to do anything.

  9. After the node has rejoined the cluster, ensure that the node is ready to accept a SQL connection.

    Unless there are tens of thousands of ranges on the node, it's usually sufficient to wait one minute. To be certain that the node is ready, run the following command:

    cockroach sql -e 'select 1'
    

    The command will automatically wait to complete until the node is ready.

  10. Repeat these steps for the next node.
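
In line with the recommendation to script these steps, the per-node procedure above can be wrapped in a loop that enforces the one-node-at-a-time rule. The sketch below is deliberately generic and makes several assumptions: rolling_upgrade, drain_and_stop, install_binary, start_node, and node_ready are hypothetical names, and you would implement the four per-node functions for your own environment (for example, over SSH). The loop stops at the first failure so that a problem on one node does not spread to the next.

```shell
# rolling_upgrade NODE...
# Upgrades the given nodes strictly one at a time. The per-node steps are
# hypothetical functions that you define yourself:
#   drain_and_stop NODE   (step 1)
#   install_binary NODE   (steps 2 to 4)
#   start_node NODE       (step 6)
#   node_ready NODE       (step 9; returns 0 once SQL connections succeed)
# WAIT_SECONDS is the pause after a node rejoins; this page recommends at
# least one minute. READY_RETRIES bounds the readiness polling.
WAIT_SECONDS="${WAIT_SECONDS:-60}"
READY_RETRIES="${READY_RETRIES:-30}"

rolling_upgrade() {
  for node in "$@"; do
    echo "upgrading $node"
    drain_and_stop "$node" || { echo "drain failed on $node" >&2; return 1; }
    install_binary "$node" || { echo "install failed on $node" >&2; return 1; }
    start_node "$node"     || { echo "start failed on $node" >&2; return 1; }

    # Poll until the node accepts SQL connections, or give up.
    attempt=1
    until node_ready "$node"; do
      if [ "$attempt" -ge "$READY_RETRIES" ]; then
        echo "$node did not become ready" >&2
        return 1
      fi
      attempt=$((attempt + 1))
      sleep 1
    done

    # Let the node settle before touching the next one.
    sleep "$WAIT_SECONDS"
  done
  echo "all nodes upgraded"
}
```

One possible node_ready implementation runs the cockroach sql -e 'select 1' command from step 9 against the node and returns its exit status.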

Step 5. Roll back the upgrade (optional)

If you decide to roll back to v22.2, you must do so before the upgrade has been finalized, as described in the next section. It is always possible to roll back to a previous v23.1 version.

To roll back an upgrade, do the following on each cluster node:

  1. Perform a rolling upgrade, as described in the previous section, but replace the upgraded cockroach binary on each node with the binary for the previous version.
  2. Restart the cockroach process on the node and verify that it has rejoined the cluster before rolling back the upgrade on the next node.
  3. After all nodes have been rolled back and rejoined the cluster, finalize the rollback in the same way as you would finalize an upgrade, as described in the next section.

Step 6. Finish the upgrade

Because a finalized major-version upgrade cannot be rolled back, Cockroach Labs recommends that you monitor the stability and performance of your cluster with the upgraded binary for at least a day before deciding to finalize the upgrade.

Note:

Finalization is required only when upgrading from v22.2.x to v23.1. For upgrades within the v23.1.x series, skip this step.

  1. If you disabled auto-finalization in step 3, monitor the stability and performance of your cluster for at least a day. If you decide to roll back the upgrade, repeat the rolling restart procedure with the previous binary. Otherwise, perform the following steps to re-enable upgrade finalization to complete the upgrade to v23.1. Cockroach Labs recommends that you either finalize or roll back a major-version upgrade within a relatively short period of time; running in a partially-upgraded state is not recommended.

    Warning:

    A cluster that is not finalized on v22.2 cannot be upgraded to v23.1 until the v22.2 upgrade is finalized.

  2. Once you are satisfied with the new version, run cockroach sql against any node in the cluster to open the SQL shell.

  3. Re-enable auto-finalization:

    > RESET CLUSTER SETTING cluster.preserve_downgrade_option;
    

    A series of migration jobs runs to enable certain types of features and changes in the new major version that cannot be rolled back. These include changes to system schemas, indexes, and descriptors, and enabling certain types of improvements and new features. Until the upgrade is finalized, these features and functions will not be available and the command SHOW CLUSTER SETTING version will return 22.2.

    You can monitor the progress of the migration in the DB Console Jobs page. Migration jobs have names in the format 23.1-{migration-id}. If a migration job fails or stalls, Cockroach Labs can use the migration ID to help diagnose and troubleshoot the problem. Each major version has different migration jobs with different IDs.

    Note:

    All schema change jobs must reach a terminal state before finalization can complete. Finalization can therefore take as long as the longest-running schema change. Otherwise, the amount of time required for finalization depends on the amount of data in the cluster, as the process runs various internal maintenance and migration tasks. During this time, the cluster will experience a small amount of additional load.

    When all migration jobs have completed, the upgrade is complete.

  4. To confirm that finalization has completed, check the cluster version:

    > SHOW CLUSTER SETTING version;
    

    If the cluster continues to report that it is on the previous version, finalization has not completed. If auto-finalization is enabled but finalization has not completed, check for the existence of decommissioning nodes where decommission has stalled. In most cases, issuing the decommission command again resolves the issue. If you have trouble upgrading, contact Support.

  5. If you paused changefeed jobs before starting the upgrade process, you can now resume the jobs:

    RESUME JOB {changefeed_job_ID};
    

    Or, resume all changefeed jobs:

    RESUME ALL CHANGEFEED JOBS;
    

    Check that changefeeds are running:

    SHOW CHANGEFEED JOBS;
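
The finalization check in step 4 can also be polled from a script instead of being run by hand. This is a sketch under stated assumptions: wait_for_finalization and get_cluster_version are hypothetical names, and get_cluster_version is a function you supply that prints the value reported by SHOW CLUSTER SETTING version. The target is assumed to be the bare major version, for example 23.1.

```shell
# wait_for_finalization TARGET [ATTEMPTS] [DELAY_SECONDS]
# Calls get_cluster_version repeatedly until it prints TARGET. Succeeds and
# prints a confirmation when the versions match, or returns 1 after ATTEMPTS
# unsuccessful checks. get_cluster_version is a hypothetical function that
# you define for your deployment.
wait_for_finalization() {
  target="$1"
  attempts="${2:-60}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    current="$(get_cluster_version)"
    if [ "$current" = "$target" ]; then
      echo "finalized at $current"
      return 0
    fi
    echo "cluster version is $current, waiting for $target" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# One possible definition (connection flags are placeholders):
#   get_cluster_version() {
#     cockroach sql --certs-dir=certs --format=csv -e 'SHOW CLUSTER SETTING version;' | tail -n 1
#   }
```

If the function times out, the guidance in step 4 about stalled decommissioning nodes is a reasonable first thing to check.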
    

After the upgrade to v23.1 is finalized, you may notice an increase in compaction activity due to a background migration job within the storage engine. To observe the migration's progress, check the Compactions section of the Storage Dashboard in the DB Console or monitor the storage.marked-for-compaction-files time-series metric. When the metric's value nears or reaches 0, the migration is complete and compaction activity will return to normal levels.

Troubleshooting

After the upgrade has finalized (whether manually or automatically), it is no longer possible to downgrade to the previous release. If you are experiencing problems, we therefore recommend that you run the cockroach debug zip command on any cluster node to capture your cluster's state, then open a support request and share your debug zip.

In the event of catastrophic failure or corruption, it may be necessary to restore from a backup to a new cluster running v22.2.
