This article assumes you have already deployed CockroachDB on a single Kubernetes cluster.
This page explains how to add and remove CockroachDB nodes on Kubernetes.
All kubectl
steps should be performed in the namespace where you installed the Operator. By default, this is cockroach-operator-system
.
If you deployed CockroachDB on Red Hat OpenShift, substitute kubectl
with oc
in the following commands.
Add nodes
Before scaling up CockroachDB, note the following topology recommendations:
- Each CockroachDB node (running in its own pod) should run on a separate Kubernetes worker node.
- Each availability zone should have the same number of CockroachDB nodes.
If your cluster has 3 CockroachDB nodes distributed across 3 availability zones (as in our deployment example), we recommend scaling up by a multiple of 3 to retain an even distribution of nodes. You should therefore scale up to a minimum of 6 CockroachDB nodes, with 2 nodes in each zone.
Run
kubectl get nodes
to list the worker nodes in your Kubernetes cluster. There should be at least as many worker nodes as pods you plan to add. This ensures that no more than one pod will be placed on each worker node.If you need to add worker nodes, resize your GKE cluster by specifying the desired number of worker nodes in each zone:
gcloud container clusters resize {cluster-name} --region {region-name} --num-nodes 2
This example distributes 2 worker nodes across the default 3 zones, raising the total to 6 worker nodes.
If you are adding nodes after previously scaling down, and have not enabled automatic PVC pruning, you must first manually delete any persistent volumes that were orphaned by node removal.
Note:Due to a known issue, automatic pruning of PVCs is currently disabled by default. This means that after decommissioning and removing a node, the Operator will not remove the persistent volume that was mounted to its pod.
View the PVCs on the cluster:
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE datadir-cockroachdb-0 Bound pvc-f1ce6ed2-ceda-40d2-8149-9e5b59faa9df 60Gi RWO standard 24m datadir-cockroachdb-1 Bound pvc-308da33c-ec77-46c7-bcdf-c6e610ad4fea 60Gi RWO standard 24m datadir-cockroachdb-2 Bound pvc-6816123f-29a9-4b86-a4e2-b67f7bb1a52c 60Gi RWO standard 24m datadir-cockroachdb-3 Bound pvc-63ce836a-1258-4c58-8b37-d966ed12d50a 60Gi RWO standard 24m datadir-cockroachdb-4 Bound pvc-1ylabv86-6512-6n12-bw3g-i0dh2zxvfhd0 60Gi RWO standard 24m datadir-cockroachdb-5 Bound pvc-2vka2c9x-7824-41m5-jk45-mt7dzq90q97x 60Gi RWO standard 24m
The PVC names correspond to the pods they are bound to. For example, if the pods
cockroachdb-3
,cockroachdb-4
, andcockroachdb-5
had been removed by scaling the cluster down from 6 to 3 nodes,datadir-cockroachdb-3
,datadir-cockroachdb-4
, anddatadir-cockroachdb-5
would be the PVCs for the orphaned persistent volumes. To verify that a PVC is not currently bound to a pod:kubectl describe pvc datadir-cockroachdb-5
The output will include the following line:
Mounted By: <none>
If the PVC is bound to a pod, it will specify the pod name.
Remove the orphaned persistent volumes by deleting their PVCs:
Warning:Before deleting any persistent volumes, be sure you have a backup copy of your data. Data cannot be recovered once the persistent volumes are deleted. For more information, see the Kubernetes documentation.
kubectl delete pvc datadir-cockroachdb-3 datadir-cockroachdb-4 datadir-cockroachdb-5
persistentvolumeclaim "datadir-cockroachdb-3" deleted persistentvolumeclaim "datadir-cockroachdb-4" deleted persistentvolumeclaim "datadir-cockroachdb-5" deleted
Update
nodes
in the Operator's custom resource, which you downloaded when deploying the cluster, with the target size of the CockroachDB cluster. This value refers to the number of CockroachDB nodes, each running in one pod:nodes: 6
Note:Note that you must scale by updating the
nodes
value in the custom resource. Usingkubectl scale statefulset <cluster-name> --replicas=4
will result in new pods immediately being terminated.Apply the new settings to the cluster:
$ kubectl apply -f example.yaml
Verify that the new pods were successfully started:
kubectl get pods
NAME READY STATUS RESTARTS AGE cockroach-operator-655fbf7847-zn9v8 1/1 Running 0 30m cockroachdb-0 1/1 Running 0 24m cockroachdb-1 1/1 Running 0 24m cockroachdb-2 1/1 Running 0 24m cockroachdb-3 1/1 Running 0 30s cockroachdb-4 1/1 Running 0 30s cockroachdb-5 1/1 Running 0 30s
Each pod should be running in one of the 6 worker nodes.
Before scaling up CockroachDB, note the following topology recommendations:
- Each CockroachDB node (running in its own pod) should run on a separate Kubernetes worker node.
- Each availability zone should have the same number of CockroachDB nodes.
If your cluster has 3 CockroachDB nodes distributed across 3 availability zones (as in our deployment example), we recommend scaling up by a multiple of 3 to retain an even distribution of nodes. You should therefore scale up to a minimum of 6 CockroachDB nodes, with 2 nodes in each zone.
Run
kubectl get nodes
to list the worker nodes in your Kubernetes cluster. There should be at least as many worker nodes as pods you plan to add. This ensures that no more than one pod will be placed on each worker node.Add worker nodes if necessary:
On GKE, resize your cluster. If you deployed a regional cluster as we recommended, you will use
--num-nodes
to specify the desired number of worker nodes in each zone. For example:gcloud container clusters resize {cluster-name} --region {region-name} --num-nodes 2
On EKS, resize your Worker Node Group.
On GCE, resize your Managed Instance Group.
On AWS, resize your Auto Scaling Group.
Edit your StatefulSet configuration to add pods for each new CockroachDB node:
$ kubectl scale statefulset cockroachdb --replicas=6
statefulset.apps/cockroachdb scaled
Verify that the new pod started successfully:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE cockroachdb-0 1/1 Running 0 51m cockroachdb-1 1/1 Running 0 47m cockroachdb-2 1/1 Running 0 3m cockroachdb-3 1/1 Running 0 1m cockroachdb-4 1/1 Running 0 1m cockroachdb-5 1/1 Running 0 1m cockroachdb-client-secure 1/1 Running 0 15m ...
You can also open the Node List in the DB Console to ensure that the fourth node successfully joined the cluster.
Before scaling CockroachDB, ensure that your Kubernetes cluster has enough worker nodes to host the number of pods you want to add. This is to ensure that two pods are not placed on the same worker node, as recommended in our production guidance.
For example, if you want to scale from 3 CockroachDB nodes to 4, your Kubernetes cluster should have at least 4 worker nodes. You can verify the size of your Kubernetes cluster by running kubectl get nodes
.
Edit your StatefulSet configuration to add another pod for the new CockroachDB node:
$ helm upgrade \ my-release \ cockroachdb/cockroachdb \ --set statefulset.replicas=4 \ --reuse-values
Release "my-release" has been upgraded. Happy Helming! LAST DEPLOYED: Tue May 14 14:06:43 2019 NAMESPACE: default STATUS: DEPLOYED RESOURCES: ==> v1beta1/PodDisruptionBudget NAME AGE my-release-cockroachdb-budget 51m ==> v1/Pod(related) NAME READY STATUS RESTARTS AGE my-release-cockroachdb-0 1/1 Running 0 38m my-release-cockroachdb-1 1/1 Running 0 39m my-release-cockroachdb-2 1/1 Running 0 39m my-release-cockroachdb-3 0/1 Pending 0 0s my-release-cockroachdb-init-nwjkh 0/1 Completed 0 39m ...
Get the name of the
Pending
CSR for the new pod:$ kubectl get csr
NAME AGE REQUESTOR CONDITION default.client.root 1h system:serviceaccount:default:default Approved,Issued default.node.my-release-cockroachdb-0 1h system:serviceaccount:default:default Approved,Issued default.node.my-release-cockroachdb-1 1h system:serviceaccount:default:default Approved,Issued default.node.my-release-cockroachdb-2 1h system:serviceaccount:default:default Approved,Issued default.node.my-release-cockroachdb-3 2m system:serviceaccount:default:default Pending node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4 1h kubelet Approved,Issued node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY 1h kubelet Approved,Issued node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o 1h kubelet Approved,Issued ...
If you do not see a
Pending
CSR, wait a minute and try again.Examine the CSR for the new pod:
$ kubectl describe csr default.node.my-release-cockroachdb-3
Name: default.node.my-release-cockroachdb-3 Labels: <none> Annotations: <none> CreationTimestamp: Thu, 09 Nov 2017 13:39:37 -0500 Requesting User: system:serviceaccount:default:default Status: Pending Subject: Common Name: node Serial Number: Organization: Cockroach Subject Alternative Names: DNS Names: localhost my-release-cockroachdb-1.my-release-cockroachdb.default.svc.cluster.local my-release-cockroachdb-1.my-release-cockroachdb my-release-cockroachdb-public my-release-cockroachdb-public.default.svc.cluster.local IP Addresses: 127.0.0.1 10.48.1.6 Events: <none>
If everything looks correct, approve the CSR for the new pod:
$ kubectl certificate approve default.node.my-release-cockroachdb-3
certificatesigningrequest.certificates.k8s.io/default.node.my-release-cockroachdb-3 approved
Verify that the new pod started successfully:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE my-release-cockroachdb-0 1/1 Running 0 51m my-release-cockroachdb-1 1/1 Running 0 47m my-release-cockroachdb-2 1/1 Running 0 3m my-release-cockroachdb-3 1/1 Running 0 1m cockroachdb-client-secure 1/1 Running 0 15m ...
You can also open the Node List in the DB Console to ensure that the fourth node successfully joined the cluster.
Remove nodes
Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on CockroachDB and will cause errors.
Due to a known issue, automatic pruning of PVCs is currently disabled by default. This means that after decommissioning and removing a node, the Operator will not remove the persistent volume that was mounted to its pod.
If you plan to eventually scale up the cluster after scaling down, you will need to manually delete any PVCs that were orphaned by node removal before scaling up. For more information, see Add nodes.
If you want to enable the Operator to automatically prune PVCs when scaling down, see Automatic PVC pruning. However, note that this workflow is currently unsupported.
Before scaling down CockroachDB, note the following topology recommendation:
- Each availability zone should have the same number of CockroachDB nodes.
If your nodes are distributed across 3 availability zones (as in our deployment example), we recommend scaling down by a multiple of 3 to retain an even distribution. If your cluster has 6 CockroachDB nodes, you should therefore scale down to 3, with 1 node in each zone.
Update
nodes
in the custom resource, which you downloaded when deploying the cluster, with the target size of the CockroachDB cluster. For instance, to scale down to 3 nodes:nodes: 3
Note:Before removing a node, the Operator first decommissions the node. This lets a node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node.
Apply the new settings to the cluster:
$ kubectl apply -f example.yaml
The Operator will remove nodes from the cluster one at a time, starting from the pod with the highest number in its address.
Verify that the pods were successfully removed:
kubectl get pods
NAME READY STATUS RESTARTS AGE cockroach-operator-655fbf7847-zn9v8 1/1 Running 0 32m cockroachdb-0 1/1 Running 0 26m cockroachdb-1 1/1 Running 0 26m cockroachdb-2 1/1 Running 0 26m
Automatic PVC pruning
To enable the Operator to automatically remove persistent volumes when scaling down a cluster, turn on automatic PVC pruning through a feature gate.
This workflow is unsupported and should be enabled at your own risk.
Download the Operator manifest:
$ curl -0 https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.16.0/install/operator.yaml
Uncomment the following lines in the Operator manifest:
- feature-gates - AutoPrunePVC=true
Reapply the Operator manifest:
$ kubectl apply -f operator.yaml
Validate that the Operator is running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE cockroach-operator-6f7b86ffc4-9ppkv 1/1 Running 0 22s ...
Before removing a node from your cluster, you must first decommission the node. This lets a node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node.
If you remove nodes without first telling CockroachDB to decommission them, you may cause data or even cluster unavailability. For more details about how this works and what to consider before removing nodes, see Prepare for graceful shutdown.
Use the
cockroach node status
command to get the internal IDs of nodes. For example, if you followed the steps in Deploy CockroachDB with Kubernetes to launch a secure client pod, get a shell into thecockroachdb-client-secure
pod:$ kubectl exec -it cockroachdb-client-secure \ -- ./cockroach node status \ --certs-dir=/cockroach-certs \ --host=cockroachdb-public
id | address | build | started_at | updated_at | is_available | is_live +----+---------------------------------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+ 1 | cockroachdb-0.cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true | true 2 | cockroachdb-2.cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true | true 3 | cockroachdb-1.cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true | true 4 | cockroachdb-3.cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true | true (4 rows)
The pod uses the
root
client certificate created earlier to initialize the cluster, so there's no CSR approval required.Use the
cockroach node decommission
command to decommission the node with the highest number in its address, specifying its ID (in this example, node ID4
because its address iscockroachdb-3
):Note:You must decommission the node with the highest number in its address. Kubernetes will remove the pod for the node with the highest number in its address when you reduce the replica count.
$ kubectl exec -it cockroachdb-client-secure \ -- ./cockroach node decommission 4 \ --certs-dir=/cockroach-certs \ --host=cockroachdb-public
You'll then see the decommissioning status print to
stderr
as it changes:id | is_live | replicas | is_decommissioning | membership | is_draining -----+---------+----------+--------------------+-----------------+-------------- 4 | true | 73 | true | decommissioning | false
Once the node has been fully decommissioned, you'll see a confirmation:
id | is_live | replicas | is_decommissioning | membership | is_draining -----+---------+----------+--------------------+-----------------+-------------- 4 | true | 0 | true | decommissioning | false (1 row) No more data reported on target nodes. Please verify cluster health before removing the nodes.
Once the node has been decommissioned, scale down your StatefulSet:
$ kubectl scale statefulset cockroachdb --replicas=3
statefulset.apps/cockroachdb scaled
Verify that the pod was successfully removed:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE cockroachdb-0 1/1 Running 0 51m cockroachdb-1 1/1 Running 0 47m cockroachdb-2 1/1 Running 0 3m cockroachdb-client-secure 1/1 Running 0 15m ...
You should also remove the persistent volume that was mounted to the pod. Get the persistent volume claims for the volumes:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE datadir-cockroachdb-0 Bound pvc-75dadd4c-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m datadir-cockroachdb-1 Bound pvc-75e143ca-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m datadir-cockroachdb-2 Bound pvc-75ef409a-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m datadir-cockroachdb-3 Bound pvc-75e561ba-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m
Verify that the PVC with the highest number in its name is no longer mounted to a pod:
$ kubectl describe pvc datadir-cockroachdb-3
Name: datadir-cockroachdb-3 ... Mounted By: <none>
Remove the persistent volume by deleting the PVC:
$ kubectl delete pvc datadir-cockroachdb-3
persistentvolumeclaim "datadir-cockroachdb-3" deleted
Before removing a node from your cluster, you must first decommission the node. This lets a node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node.
If you remove nodes without first telling CockroachDB to decommission them, you may cause data or even cluster unavailability. For more details about how this works and what to consider before removing nodes, see Prepare for graceful shutdown.
Use the
cockroach node status
command to get the internal IDs of nodes. For example, if you followed the steps in Deploy CockroachDB with Kubernetes to launch a secure client pod, get a shell into thecockroachdb-client-secure
pod:$ kubectl exec -it cockroachdb-client-secure \ -- ./cockroach node status \ --certs-dir=/cockroach-certs \ --host=my-release-cockroachdb-public
id | address | build | started_at | updated_at | is_available | is_live +----+---------------------------------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+ 1 | my-release-cockroachdb-0.my-release-cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true | true 2 | my-release-cockroachdb-2.my-release-cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true | true 3 | my-release-cockroachdb-1.my-release-cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true | true 4 | my-release-cockroachdb-3.my-release-cockroachdb.default.svc.cluster.local:26257 | v24.3.0 | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true | true (4 rows)
The pod uses the
root
client certificate created earlier to initialize the cluster, so there's no CSR approval required.Use the
cockroach node decommission
command to decommission the node with the highest number in its address, specifying its ID (in this example, node ID4
because its address ismy-release-cockroachdb-3
):Note:You must decommission the node with the highest number in its address. Kubernetes will remove the pod for the node with the highest number in its address when you reduce the replica count.
$ kubectl exec -it cockroachdb-client-secure \ -- ./cockroach node decommission 4 \ --certs-dir=/cockroach-certs \ --host=my-release-cockroachdb-public
You'll then see the decommissioning status print to
stderr
as it changes:id | is_live | replicas | is_decommissioning | membership | is_draining -----+---------+----------+--------------------+-----------------+-------------- 4 | true | 73 | true | decommissioning | false
Once the node has been fully decommissioned, you'll see a confirmation:
id | is_live | replicas | is_decommissioning | membership | is_draining -----+---------+----------+--------------------+-----------------+-------------- 4 | true | 0 | true | decommissioning | false (1 row) No more data reported on target nodes. Please verify cluster health before removing the nodes.
Once the node has been decommissioned, scale down your StatefulSet:
$ helm upgrade \ my-release \ cockroachdb/cockroachdb \ --set statefulset.replicas=3 \ --reuse-values
Verify that the pod was successfully removed:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE my-release-cockroachdb-0 1/1 Running 0 51m my-release-cockroachdb-1 1/1 Running 0 47m my-release-cockroachdb-2 1/1 Running 0 3m cockroachdb-client-secure 1/1 Running 0 15m ...
You should also remove the persistent volume that was mounted to the pod. Get the persistent volume claims for the volumes:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE datadir-my-release-cockroachdb-0 Bound pvc-75dadd4c-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m datadir-my-release-cockroachdb-1 Bound pvc-75e143ca-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m datadir-my-release-cockroachdb-2 Bound pvc-75ef409a-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m datadir-my-release-cockroachdb-3 Bound pvc-75e561ba-01a1-11ea-b065-42010a8e00cb 100Gi RWO standard 17m
Verify that the PVC with the highest number in its name is no longer mounted to a pod:
$ kubectl describe pvc datadir-my-release-cockroachdb-3
Name: datadir-my-release-cockroachdb-3 ... Mounted By: <none>
Remove the persistent volume by deleting the PVC:
$ kubectl delete pvc datadir-my-release-cockroachdb-3
persistentvolumeclaim "datadir-my-release-cockroachdb-3" deleted