Disaster Recovery Overview

On this page Carat arrow pointing down

Resilient deployments aim for continuity in database operations to protect from data loss and downtime. CockroachDB's built-in Raft replication and fault tolerance provide high availability. However, it is still important to design a disaster recovery plan to recover from unforeseen incidents to minimize downtime and data loss.

As you evaluate CockroachDB's disaster recovery features, consider your organization's requirements for the amount of tolerable data loss and the acceptable length of time to recover.

  • Recovery Point Objective (RPO): The maximum amount of data loss (measured by time) that an organization can tolerate.
  • Recovery Time Objective (RTO): The maximum length of time it should take to restore normal operations following an outage.

When you use backups, RPO and RTO can be visualized as follows:

Simulating RPO and RTO. With RPO representing the tolerable data loss, and RTO representing the tolerable time to recovery.

Note:

For an overview of resiliency features in CockroachDB, refer to Data Resilience.

Choose a disaster recovery strategy

CockroachDB is designed to recover automatically; however, building backups or physical cluster replication into your disaster recovery planning protects against unforeseen incidents.

Point-in-time backup & restore Physical cluster replication (asynchronous)
RPO >=5 minutes 10s of seconds
RTO Minutes to hours, depending on data size and number of nodes Seconds to minutes, depending on cluster size, and time of failover
Write latency No impact No impact
Recovery Manual restore Manual failover
Fault tolerance Not applicable Zero RPO node, availability zone within a cluster, region failures with loss up to RPO in a two-region (or two-datacenter) setup
Minimum regions to achieve fault tolerance 1 2

See also


Yes No
On this page

Yes No