Upgrade a cluster in Kubernetes


This page shows how to upgrade a CockroachDB cluster that is deployed on a Kubernetes cluster.

Overview

CockroachDB offers the following types of upgrades:

  • Major-version upgrades: A major-version upgrade moves a cluster from one major version of CockroachDB to another, such as from v24.2 to v24.3. A major-version upgrade may include new features, updates to cluster setting defaults, and backward-incompatible changes. Performing a major-version upgrade requires an additional step to finalize the upgrade.

    As of 2024, every second major version is an Innovation release. Innovation releases offer shorter support windows and can be skipped.

  • Patch upgrades: A patch upgrade moves a cluster from one patch to another within a major version, such as from v24.2.3 to v24.2.4. Patch upgrades do not introduce backward-incompatible changes.

    A major version of CockroachDB has two phases of patch releases: a series of testing releases followed by a series of production releases. A major version’s initial production release is also known as its GA release. In the lead-up to a new major version's GA release, a series of Testing releases may be made available for testing and validation. Testing releases are intended for testing and experimentation only, and are not qualified for production environments or eligible for support or uptime SLA commitments.

    Note:
    A cluster cannot be upgraded from an alpha binary of a prior release or from a binary built from the master branch of the CockroachDB source code.

To learn more about CockroachDB major versions and patches, refer to the Releases Overview.

On Kubernetes, the upgrade is a staged update in which each pod's container image for CockroachDB is updated in a rolling fashion. The cluster remains available during the upgrade.

Select the cluster's deployment method to continue.

Before you begin

Tip:

If you deployed CockroachDB on Red Hat OpenShift, substitute kubectl with oc in the following commands.

Note:

All kubectl steps should be performed in the namespace where you installed the Operator. By default, this is cockroach-operator-system.

Before beginning a major-version or patch upgrade:

  1. Verify the overall health of your cluster using the DB Console:

    • Under Node Status, make sure all nodes that should be live are listed as such. If any nodes are unexpectedly listed as SUSPECT or DEAD, identify why the nodes are offline and either restart them or decommission them before beginning your upgrade. If there are DEAD and non-decommissioned nodes in your cluster, the upgrade cannot be finalized.

      If any node is not fully decommissioned, try the following:

      1. Reissue the decommission command. The second command typically succeeds within a few minutes.

      2. If the second decommission command does not succeed, recommission the node and then decommission it again. Before continuing the upgrade, the node must be marked as decommissioned.

    • Under Replication Status, make sure there are 0 under-replicated and unavailable ranges. Otherwise, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. Therefore, it's important to identify and resolve the cause of range under-replication and/or unavailability before beginning your upgrade.

    • In the Node List, make sure all nodes are on the same version. If any nodes are running an earlier version, upgrade them to the cluster's current version before continuing; a version mismatch may also indicate that the previous major-version upgrade has not been finalized.

    • In the Metrics dashboards, make sure CPU, memory, and storage capacity are within acceptable values for each node. Nodes must be able to tolerate some increase in case the new version uses more resources for your workload. If any of these metrics is above healthy limits, consider adding nodes to your cluster before beginning your upgrade.

  2. Make sure your cluster is behind a load balancer, or your clients are configured to talk to multiple nodes. If your application communicates with only a single node, stopping that node to upgrade its CockroachDB binary will cause your application to fail.

  3. By default, the storage engine uses a compaction concurrency of 3. If you have sufficient IOPS and CPU headroom, you can consider increasing this setting via the COCKROACH_COMPACTION_CONCURRENCY environment variable. This may help to reshape the LSM more quickly in inverted LSM scenarios, and it can lead to increased overall performance for some workloads. Cockroach Labs strongly recommends testing your workload against non-default values of this setting. For one way to set this variable in a Kubernetes manifest, see the sketch after this list.

  4. CockroachDB is designed with high fault tolerance. However, taking regular backups of your data is an operational best practice for disaster recovery planning.

    We recommend that you enable managed backups and confirm that the cluster is backed up before beginning a major-version upgrade. This provides an extra layer of protection in case the upgrade leads to issues. Refer to Restoring backups across versions.

  5. Review the v24.3 Release Notes, as well as the release notes for any skipped major version. Pay careful attention to the sections for backward-incompatible changes, deprecations, changes to default cluster settings, and features that are not available until the upgrade is finalized.

  6. Optionally disable auto-finalization to preserve the ability to roll back a major-version upgrade instead of finalizing it. If auto-finalization is disabled, a major-version upgrade is not complete until it is finalized.
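If you decide to increase the compaction concurrency, the environment variable is passed to the CockroachDB container like any other Kubernetes environment variable. The snippet below is a minimal sketch for a StatefulSet-based deployment; the container name and value shown are examples, and Operator-managed deployments may expose a different field for container environment variables:

    # Fragment of a StatefulSet pod spec (illustrative only)
    containers:
      - name: cockroachdb
        image: cockroachdb/cockroach:v24.3.0
        env:
          - name: COCKROACH_COMPACTION_CONCURRENCY
            value: "6"   # default is 3; increase only with spare IOPS and CPU, and test first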

Ensure you have a valid license key

To perform major version upgrades, you must have a valid license key.

Patch version upgrades can be performed without a valid license key, with the following limitations:

  • The cluster will run without limitations for a specified grace period. During that time, alerts are displayed that the cluster needs a valid license key. For more information, refer to the Licensing FAQs.
  • The cluster is throttled at the end of the grace period if no valid license key is added to the cluster before then.

If you have an Enterprise Free or Enterprise Trial license, you must enable telemetry using the diagnostics.reporting.enabled cluster setting, as shown below, in order to finalize a major-version upgrade:

SET CLUSTER SETTING diagnostics.reporting.enabled = true;

If a cluster with an Enterprise Free or Enterprise Trial license is upgraded across patch versions and does not meet telemetry requirements:

  • The cluster will run without limitations for a 7-day grace period. During that time, alerts are displayed that the cluster needs to send telemetry.
  • The cluster is throttled if telemetry is not received before the end of the grace period.

For more information, refer to the Licensing FAQs.

If you want to stay on the previous version, you can roll back the upgrade before finalization.

Perform a patch upgrade

To upgrade from one patch release to another within the same major version, perform the following steps on one node at a time:

  1. Change the container image in the custom resource:

    image:
      name: cockroachdb/cockroach:v24.3.0
    
  2. Apply the new settings to the cluster:

    kubectl apply -f example.yaml
    

    The Operator will perform the staged update.

  3. To check the status of the rolling upgrade, run kubectl get pods.

  4. Verify that all pods have been upgraded:

    $ kubectl get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
    

    You can also check the CockroachDB version of each node in the DB Console.
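    If you prefer to verify from a SQL shell, the internal crdb_internal.gossip_nodes table reports the build tag each node is running. This query is a convenience sketch rather than part of the documented upgrade steps, and internal table schemas may change between versions:

    SELECT node_id, address, build_tag FROM crdb_internal.gossip_nodes;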

  1. Add a partition to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., cockroachdb-0, cockroachdb-1, cockroachdb-2) the partition value should be 2:

    $ kubectl patch statefulset cockroachdb \
    -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
    
    statefulset.apps/cockroachdb patched
    
  2. Change the container image in the StatefulSet:

    $ kubectl patch statefulset cockroachdb \
    --type='json' \
    -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:v24.3.0"}]'
    
    statefulset.apps/cockroachdb patched
    
  3. To check the status of the rolling upgrade, run kubectl get pods.

  4. Verify that all pods have been upgraded:

    $ kubectl get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
    

    You can also check the CockroachDB version of each node in the DB Console.

  1. Add a partition to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., cockroachdb-0, cockroachdb-1, cockroachdb-2) the partition value should be 2:

    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set statefulset.updateStrategy.rollingUpdate.partition=2
    
  2. Connect to the cluster using the SQL shell:

    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach sql \
    --certs-dir=/cockroach-certs \
    --host=my-release-cockroachdb-public
    
  3. Remove the cluster initialization job from when the cluster was created:

    $ kubectl delete job my-release-cockroachdb-init
    
  4. Change the container image in the StatefulSet:

    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set image.tag=v24.3.0 \
    --reuse-values
    
  5. To check the status of the rolling upgrade, run kubectl get pods.

  6. Verify that all pods have been upgraded:

    $ kubectl get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
    

    You can also check the CockroachDB version of each node in the DB Console.

Roll back a patch upgrade

To roll back a patch upgrade, repeat the steps in Perform a patch upgrade, but configure the container image for the pods to the previous patch version.
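For example, on an Operator-managed cluster this is again a change to the custom resource. The tag below is illustrative; use the patch release the cluster was running before the upgrade (for instance, to undo the v24.2.3 to v24.2.4 patch upgrade mentioned in the Overview):

    image:
      name: cockroachdb/cockroach:v24.2.3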

Perform a major-version upgrade

To perform a major upgrade:

  1. Change the container image in the custom resource:

    image:
      name: cockroachdb/cockroach:v24.3.0
    
  2. Apply the new settings to the cluster:

    kubectl apply -f example.yaml
    

    The Operator will perform the staged update.

  3. To check the status of the rolling upgrade, run kubectl get pods.

  4. Verify that all pods have been upgraded:

    $ kubectl get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
    

    You can also check the CockroachDB version of each node in the DB Console.

  5. Before beginning a major-version upgrade, the Operator disables auto-finalization by setting the cluster setting cluster.preserve_downgrade_option to the cluster's current major version. Before finalizing an upgrade, follow your organization's testing procedures to decide whether to finalize or roll back the upgrade. After finalization begins, you can no longer roll back to the cluster's previous major version.
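    To confirm that auto-finalization is disabled, you can check the setting from a SQL shell (an optional verification step; the Operator manages this value itself):

    > SHOW CLUSTER SETTING cluster.preserve_downgrade_option;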

  1. Add a partition to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., cockroachdb-0, cockroachdb-1, cockroachdb-2), the partition value should be 2:

    $ kubectl patch statefulset cockroachdb \
    -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'

    statefulset.apps/cockroachdb patched

  2. Change the container image in the StatefulSet:

    $ kubectl patch statefulset cockroachdb \
    --type='json' \
    -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:v24.3.0"}]'
    
    statefulset.apps/cockroachdb patched
    
  3. To check the status of the rolling upgrade, run kubectl get pods.

  4. After the pod has been restarted with the new image, start the CockroachDB built-in SQL client:

    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach sql \
    --certs-dir=/cockroach-certs \
    --host=cockroachdb-public
    
  5. Run the following SQL query to verify that the number of underreplicated ranges is zero:

    SELECT sum((metrics->>'ranges.underreplicated')::DECIMAL)::INT AS ranges_underreplicated FROM crdb_internal.kv_store_status;
    
      ranges_underreplicated
    --------------------------
                           0
    (1 row)
    

    This indicates that it is safe to proceed to the next pod.

  6. Exit the SQL shell:

    > \q
    
  7. Decrement the partition value by 1 to allow the next pod in the cluster to update:

    $ kubectl patch statefulset cockroachdb \
    -p='{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}'
    
    statefulset.apps/cockroachdb patched
    
  8. Repeat steps 4-7 until all pods have been restarted and are running the new image (the final partition value should be 0).

  9. Check the image of each pod to confirm that all have been upgraded:

    $ kubectl get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
    
    cockroachdb-0   cockroachdb/cockroach:v24.3.0
    cockroachdb-1   cockroachdb/cockroach:v24.3.0
    cockroachdb-2   cockroachdb/cockroach:v24.3.0
    ...
    

    You can also check the CockroachDB version of each node in the DB Console.

  10. If auto-finalization is disabled, the upgrade is not complete until you finalize the upgrade.

  1. Add a partition to the update strategy defined in the StatefulSet. Only the pods numbered greater than or equal to the partition value will be updated. For a cluster with 3 pods (e.g., cockroachdb-0, cockroachdb-1, cockroachdb-2) the partition value should be 2:

    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set statefulset.updateStrategy.rollingUpdate.partition=2
    
  2. Connect to the cluster using the SQL shell:

    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach sql \
    --certs-dir=/cockroach-certs \
    --host=my-release-cockroachdb-public
    
  3. Remove the cluster initialization job from when the cluster was created:

    $ kubectl delete job my-release-cockroachdb-init
    
  4. Change the container image in the StatefulSet:

    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set image.tag=v24.3.0 \
    --reuse-values
    
    To check the status of the rolling upgrade, run kubectl get pods:

    NAME                                READY     STATUS              RESTARTS   AGE
    my-release-cockroachdb-0            1/1       Running             0          2m
    my-release-cockroachdb-1            1/1       Running             0          3m
    my-release-cockroachdb-2            0/1       ContainerCreating   0          25s
    my-release-cockroachdb-init-nwjkh   0/1       ContainerCreating   0          6s
    ...
    
    Note:

    Ignore the pod for cluster initialization. It is re-created as a byproduct of the StatefulSet configuration but does not impact your existing cluster.

  5. After the pod has been restarted with the new image, start the CockroachDB built-in SQL client:

    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach sql \
    --certs-dir=/cockroach-certs \
    --host=my-release-cockroachdb-public
    
  6. Run the following SQL query to verify that the number of underreplicated ranges is zero:

    SELECT sum((metrics->>'ranges.underreplicated')::DECIMAL)::INT AS ranges_underreplicated FROM crdb_internal.kv_store_status;
    
      ranges_underreplicated
    --------------------------
                           0
    (1 row)
    

    This indicates that it is safe to proceed to the next pod.

  7. Exit the SQL shell:

    > \q
    
  8. Decrement the partition value by 1 to allow the next pod in the cluster to update:

    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set statefulset.updateStrategy.rollingUpdate.partition=1 \
    
  9. Repeat steps 4-8 until all pods have been restarted and are running the new image (the final partition value should be 0).

  10. Check the image of each pod to confirm that all have been upgraded:

    $ kubectl get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
    
    my-release-cockroachdb-0    cockroachdb/cockroach:v24.3.0
    my-release-cockroachdb-1    cockroachdb/cockroach:v24.3.0
    my-release-cockroachdb-2    cockroachdb/cockroach:v24.3.0
    ...
    

    You can also check the CockroachDB version of each node in the DB Console.

  11. If auto-finalization is disabled, the upgrade is not complete until you finalize the upgrade.

Finalize a major-version upgrade manually

To finalize a major-version upgrade:

  1. Connect to the cluster using the SQL shell:

    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach sql \
    --certs-dir=/cockroach-certs \
    --host=cockroachdb-public
    
  2. Run the following command:

    > RESET CLUSTER SETTING cluster.preserve_downgrade_option;
    

    A series of migration jobs runs to enable certain features and changes in the new major version that cannot be rolled back. These include changes to system schemas, indexes, and descriptors, as well as certain improvements and new features. Until the upgrade is finalized, these features and functions will not be available, and the command SHOW CLUSTER SETTING version will return the previous version.

    You can monitor the progress of the migrations on the DB Console Jobs page. Migration jobs have names in the format 24.3-{migration-id}. If a migration job fails or stalls, Cockroach Labs can use the migration ID to help diagnose and troubleshoot the problem. Each major version has different migration jobs with different IDs.

    The amount of time required for finalization depends on the amount of data in the cluster, because finalization runs various internal maintenance and migration tasks. During this time, the cluster will experience a small amount of additional load.

    When all migration jobs have completed, the upgrade is complete.

  3. To confirm that finalization has completed, check the cluster version:

    > SHOW CLUSTER SETTING version;
    

    If the cluster continues to report that it is on the previous version, finalization has not completed. If auto-finalization is enabled but finalization has not completed, check for the existence of decommissioning nodes where decommission has stalled. In most cases, issuing the decommission command again resolves the issue. If you have trouble upgrading, contact Support.
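    One way to check for nodes stuck in decommissioning is to run cockroach node status with the --decommission flag from the secure client pod (the pod and host names below match the earlier examples; substitute your own):

    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node status --decommission \
    --certs-dir=/cockroach-certs \
    --host=cockroachdb-public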

Roll back a major-version upgrade

To roll back to the previous major version before an upgrade is finalized:

  1. Change the container image in the custom resource to use the previous major version:

    image:
      name: cockroachdb/cockroach:v24.2.4
    
  2. Apply the new settings to the cluster:

    kubectl apply -f example.yaml
    

    The Operator will perform the staged rollback.

  3. To check the status of the rollback, run kubectl get pods.

  4. Verify that all pods have been rolled back:

    $ kubectl get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}'
    

Rollbacks do not require finalization.
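After the rollback completes, you can confirm that the cluster still reports the previous version, since the upgrade was never finalized:

    > SHOW CLUSTER SETTING version;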

Disable auto-finalization
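The Operator disables auto-finalization automatically at the start of a major-version upgrade, as described above. If you are upgrading a StatefulSet or Helm deployment manually and want to preserve the ability to roll back, disable auto-finalization before changing the container image by setting cluster.preserve_downgrade_option to the major version the cluster is currently running. For example, when upgrading from v24.2:

    > SET CLUSTER SETTING cluster.preserve_downgrade_option = '24.2';

When you are ready to finalize the upgrade, reset the setting as described in Finalize a major-version upgrade manually.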

Troubleshooting

After the upgrade has finalized (whether manually or automatically), it is no longer possible to roll back the upgrade. If you are experiencing problems, we recommend that you open a support request for assistance.

In the event of catastrophic failure or corruption, it may be necessary to restore from a backup to a new cluster running the previous major version.
