Fail Over from a Primary Cluster to a Standby Cluster

Physical cluster replication (PCR) allows you to fail over from the active primary cluster to the passive standby cluster that has ingested replicated data. Completing the failover stops the stream of new data, resets the standby virtual cluster to a point in time at which all ingested data is consistent, and then marks the standby virtual cluster as ready to accept traffic.

The failover is a two-step process on the standby cluster:

  1. Initiating the failover.
  2. Completing the failover.
Warning:

Initiating a failover is a manual process that makes the standby cluster ready to accept SQL connections. However, the failover process does not automatically redirect traffic to the standby cluster. Once the failover is complete, you must redirect application traffic to the standby (new) cluster. If you do not manually redirect traffic, writes to the primary (original) cluster may be lost.

After a failover, you may want to fail back to the original primary cluster (or a different cluster) to set up the original primary cluster to once again accept application traffic. For more details, refer to Fail back to the primary cluster.

Before you begin

During a replication stream, jobs running on the primary cluster will replicate to the standby cluster. Before you fail over to the standby cluster, or fail back to the original primary cluster, consider how you will manage running (replicated) jobs between the clusters. Refer to Job management for instructions.

Step 1. Initiate the failover

When you initiate a failover to the standby cluster, you specify the point in time for the standby's promotion, that is, the timestamp at which the standby cluster's data will be consistent when it begins serving traffic. Refer to the following sections for steps:

  • LATEST: The most recent replicated timestamp.
  • Point-in-time:
    • Past: A timestamp in the past that is within the failover window.
    • Future: A timestamp in the future in order to plan a failover.

Fail over to the most recent replicated time

To initiate a failover to the most recent replicated timestamp, specify LATEST. Note that the latest replicated time may be behind the actual time if there is replication lag in the stream, that is, the time between the most up-to-date replicated time and the actual time.

To view the current replication timestamp, use:

SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
  id | name | source_tenant_name |              source_cluster_uri                 |         retained_time           |    replicated_time     | replication_lag | failover_time |   status
-----+------+--------------------+-------------------------------------------------+---------------------------------+------------------------+-----------------+--------------+--------------
   3 | main | main               | postgresql://user@hostname or IP:26257?redacted | 2024-04-18 10:07:45.000001+00   | 2024-04-18 14:07:45+00 | 00:00:19.602682 |         NULL | replicating
(1 row)
Tip:

You can view the Replication Lag graph in the standby cluster's DB Console.
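Replication lag is simply the difference between the current wall-clock time and the replicated_time. As a rough sketch, using hypothetical times modeled on the example output above:

```python
from datetime import datetime, timezone

# Hypothetical values modeled on the example output above.
replicated_time = datetime(2024, 4, 18, 14, 7, 45, tzinfo=timezone.utc)
current_time = datetime(2024, 4, 18, 14, 8, 4, 602682, tzinfo=timezone.utc)

# Replication lag: how far the standby's ingested data trails real time.
lag = current_time - replicated_time
print(lag)  # 0:00:19.602682
```

In practice you would compare replicated_time against datetime.now(timezone.utc) rather than a fixed value.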

Run the following from the standby cluster's SQL shell to start the failover:

ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO LATEST;

The failover_time is the timestamp at which the replicated data is consistent. The cluster will revert any data above this timestamp:

           failover_time
----------------------------------
  1695922878030920020.0000000000
(1 row)
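The failover_time is an HLC timestamp: nanoseconds since the Unix epoch, with a logical component after the decimal point. As a sketch of converting the example value above into a wall-clock time:

```python
from datetime import datetime, timezone
from decimal import Decimal

# Example failover_time from the output above: nanoseconds since the
# Unix epoch, with a logical counter after the decimal point.
failover_time = Decimal("1695922878030920020.0000000000")

# Split into whole seconds and leftover nanoseconds, then render as UTC.
seconds, nanos = divmod(int(failover_time), 1_000_000_000)
wall_clock = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
    microsecond=nanos // 1000
)
print(wall_clock)  # 2023-09-28 17:41:18.030920+00:00
```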

Fail over to a point in time

You can control the point in time that the replication stream will fail over to.

To select a specific time in the past, use:

SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;

The retained_time response provides the earliest time to which you can fail over.

  id | name | source_tenant_name |              source_cluster_uri                 |         retained_time         |    replicated_time     | replication_lag | failover_time |   status
-----+------+--------------------+-------------------------------------------------+-------------------------------+------------------------+-----------------+--------------+--------------
   3 | main | main               | postgresql://user@hostname or IP:26257?redacted | 2024-04-18 10:07:45.000001+00 | 2024-04-18 14:07:45+00 | 00:00:19.602682 |         NULL | replicating
(1 row)

Specify a timestamp:

ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '-1h';

Refer to Using different timestamp formats for more information.
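Instead of a relative interval like '-1h', you can pass an absolute timestamp, per the timestamp formats referenced above. A sketch of building such a statement (the wall-clock value here is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical current time; in practice use datetime.now(timezone.utc).
now = datetime(2024, 4, 18, 14, 7, 45, tzinfo=timezone.utc)

# An absolute timestamp one hour in the past, equivalent to '-1h' above.
target = (now - timedelta(hours=1)).strftime("%Y-%m-%d %H:%M:%S+00:00")
stmt = f"ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '{target}';"
print(stmt)
```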

Similarly, to fail over to a specific time in the future:

ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '+5h';

A future failover will proceed once the replicated data has reached the specified time.

Note:

To monitor for when the replication stream completes, do the following:

  1. Find the replication stream's job_id using SELECT * FROM [SHOW JOBS] WHERE job_type = 'REPLICATION STREAM INGESTION';
  2. Run SHOW JOB WHEN COMPLETE job_id. Refer to the SHOW JOBS page for details and an example.

Step 2. Complete the failover

  1. The completion of the replication is asynchronous; to monitor its progress use:

    SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
    
    id | name | source_tenant_name |              source_cluster_uri                 |         retained_time         |    replicated_time           | replication_lag | failover_time                   |   status
    ---+------+--------------------+-------------------------------------------------+-------------------------------+------------------------------+-----------------+--------------------------------+--------------
    3  | main | main               | postgresql://user@hostname or IP:26257?redacted | 2023-09-28 16:09:04.327473+00 | 2023-09-28 17:41:18.03092+00 | 00:00:19.602682 | 1695922878030920020.0000000000 | replication pending failover
    (1 row)
    

    Refer to Physical Cluster Replication Monitoring for details on the Responses and Data state fields of SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS.

  2. Once complete, bring the standby's virtual cluster online with:

    ALTER VIRTUAL CLUSTER main START SERVICE SHARED;
    
      id |        name         |     data_state     | service_mode
    -----+---------------------+--------------------+---------------
      1  | system              | ready              | shared
      3  | main                | ready              | shared
    (2 rows)
    
  3. To make the standby's virtual cluster the default for connection strings, set the following cluster setting:

    SET CLUSTER SETTING server.controller.default_target_cluster='main';
    

At this point, the primary and standby clusters are entirely independent. You will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to the standby (now primary). To manage replicated jobs on the promoted standby, refer to Job management.

To enable PCR again, from the new primary to the original primary (or a completely different cluster), refer to Fail back to the primary cluster.

Job management

During a replication stream, jobs running on the primary cluster will replicate to the standby cluster. Once you have completed a failover (or a failback), refer to the following sections for details on resuming jobs on the promoted cluster.

Backup schedules

Backup schedules will pause after failover on the promoted cluster. Take the following steps to resume jobs:

  1. Verify that there are no other schedules running backups to the same collection of backups, i.e., the schedule that was running on the original primary cluster.
  2. Resume the backup schedule on the promoted cluster.
Note:

If your backup schedule was created on a cluster in v23.1 or earlier, it will not pause automatically on the promoted cluster after failover. In this case, you must pause the schedule manually on the promoted cluster and then take the outlined steps.

Changefeeds

Changefeeds will fail on the promoted cluster immediately after failover to avoid two clusters running the same changefeed to one sink. We recommend that you recreate changefeeds on the promoted cluster.

To avoid multiple clusters running the same schedule concurrently, changefeed schedules will pause after physical cluster replication has completed.

Note:

If your changefeed schedule was created on a cluster in v24.1 or earlier, it will not pause automatically on the promoted cluster after failover. In this case, you will need to manage pausing or canceling the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink.

Fail back to the primary cluster

After failing over to the standby cluster, you may need to fail back to the original primary cluster to serve your application.

To fail back to a cluster that was previously the primary cluster, use the ALTER VIRTUAL CLUSTER syntax:

ALTER VIRTUAL CLUSTER {original_primary_vc} START REPLICATION OF {promoted_standby_vc} ON {connection_string_standby};

The original primary virtual cluster may be almost up to date with the promoted standby's virtual cluster. The difference in data between the two virtual clusters will include only the writes that have been applied to the promoted standby after failover from the primary cluster.

If the original primary cluster was an existing cluster without virtualization enabled, refer to Fail back after PCR from an existing cluster for details on failback.

Note:

To move back to a different cluster, follow the PCR setup.

Example

This section illustrates the steps to fail back to the original primary cluster from the promoted standby cluster that is currently serving traffic.

  • Cluster A = original primary cluster
  • Cluster B = original standby cluster

Cluster B is serving application traffic after the failover.

  1. To begin the failback to Cluster A, the virtual cluster must first stop accepting connections. Connect to the system virtual cluster on Cluster A:

    cockroach sql --url \
    "postgresql://{user}@{node IP or hostname cluster A}:26257?options=-ccluster=system&sslmode=verify-full" \
    --certs-dir "certs"
    
  2. From the system virtual cluster on Cluster A, ensure that service to the virtual cluster has stopped:

    ALTER VIRTUAL CLUSTER {cluster_a} STOP SERVICE;
    
  3. Open another terminal window and generate a connection string for Cluster B using cockroach encode-uri:

    cockroach encode-uri {replication user}:{password}@{cluster B node IP or hostname}:26257 --ca-cert certs/ca.crt --inline
    

    Copy the output; you will need this connection string to Cluster B to start the replication stream:

    postgresql://{replication user}:{password}@{cluster B node IP or hostname}:26257/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A
    
    Tip:

    For details on connection strings, refer to the Connection reference.

  4. Connect to the system virtual cluster for Cluster B:

    cockroach sql --url \
    "postgresql://{user}@{cluster B node IP or hostname}:26257?options=-ccluster=system&sslmode=verify-full" \
    --certs-dir "certs"
    
  5. From the system virtual cluster on Cluster B, enable rangefeeds:

    SET CLUSTER SETTING kv.rangefeed.enabled = 'true';
    
  6. From the system virtual cluster on Cluster A, start the replication from Cluster B to Cluster A. Include the connection string for Cluster B:

    ALTER VIRTUAL CLUSTER {cluster_a} START REPLICATION OF {cluster_b} ON 'postgresql://{replication user}:{password}@{cluster B node IP or hostname}:26257/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A';
    

    This will reset the virtual cluster on Cluster A back to the time at which the same virtual cluster on Cluster B diverged from it. Cluster A will check with Cluster B to confirm that its virtual cluster was replicated from Cluster A as part of the original PCR stream.

    Note:

    (Preview) If you want to start the PCR stream with a read-only virtual cluster on the standby after failing back to the original primary cluster, run the ALTER VIRTUAL CLUSTER statement in this step with the READ VIRTUAL CLUSTER option.

  7. Check the status of the virtual cluster on A:

    SHOW VIRTUAL CLUSTER {cluster_a};
    
     id |  name  |     data_state     | service_mode
    ----+--------+--------------------+---------------
      1 | system | ready              | shared
      3 | {vc_a} | replicating        | none
      4 | test   | replicating        | none
    (3 rows)
    
  8. From Cluster A, start the failover:

    ALTER VIRTUAL CLUSTER {cluster_a} COMPLETE REPLICATION TO LATEST;
    

    The failover_time is the timestamp at which the replicated data is consistent. The cluster will revert any data above this timestamp:

               failover_time
    ----------------------------------
      1714497890000000000.0000000000
    (1 row)
    
  9. From Cluster A, bring the virtual cluster online:

    ALTER VIRTUAL CLUSTER {cluster_a} START SERVICE SHARED;
    
  10. To make Cluster A's virtual cluster the default for connection strings, set the following cluster setting:

    SET CLUSTER SETTING server.controller.default_target_cluster='{cluster_a}';
    

At this point, Cluster A is once again the primary and Cluster B is once again the standby. The clusters are entirely independent. You will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to the primary (Cluster A). To enable PCR again, from the primary to the standby (or a completely different cluster), refer to Set Up Physical Cluster Replication.
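Note that the connection string produced by cockroach encode-uri in step 3 percent-encodes special characters inside the query parameters, for example %3D for = and %0A for a trailing newline. A quick sketch of decoding one option:

```python
from urllib.parse import unquote

# %3D in the options parameter decodes to '=':
encoded = "-ccluster%3Dsystem"
print(unquote(encoded))  # -ccluster=system
```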

Fail back after PCR from an existing cluster

You can replicate data from an existing CockroachDB cluster that does not have cluster virtualization enabled to a standby cluster with cluster virtualization enabled. For instructions on setting up a replication stream in this way, refer to Set up PCR from an existing cluster.

After a failover to the standby cluster, you may want to set up a replication stream from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are a couple of ways to set up a new standby, and some considerations.

In the example, the clusters are named for reference:

  • A = The original primary cluster, which started without virtualization.
  • B = The original standby cluster, which started with virtualization.
  1. You run a replication stream from cluster A to cluster B.
  2. You initiate a failover from cluster A to cluster B.
  3. You promote the main virtual cluster on cluster B and start serving application traffic from B (which now acts as the primary).
  4. You need to create a standby cluster for cluster B to replicate changes to. You can do one of the following:
    • Create a new virtual cluster (main) on cluster A from the replication of cluster B. Cluster A is now virtualized. This will start an initial scan because the replication stream will ignore the former workload tables in the system virtual cluster that were originally replicated to B. You can drop the tables that were in the system virtual cluster, because the new virtual cluster will now hold the workload replicating from cluster B.
    • Start an entirely new cluster C and create a main virtual cluster on it from the replication of cluster B. This will start an initial scan because cluster C is empty.

See also

