Distributed Glossary

RTO vs RPO

RTO (Recovery Time Objective) and RPO (Recovery Point Objective) are two critical metrics in disaster recovery and business continuity planning. They help you answer two essential questions when an outage or disaster strikes:

  1. How quickly do we need to recover? (RTO)

  2. How much data can we afford to lose? (RPO)

Understanding these objectives allows businesses to design and implement a robust strategy that balances cost, complexity, and risk tolerance. Below, we’ll explore what RTO and RPO mean, how they differ, their role in disaster recovery, and best practices for meeting these objectives. After that, we’ll address frequently asked questions in the FAQ section.


What is RTO?

Recovery Time Objective (RTO) is the maximum acceptable downtime for your systems after an outage or disaster. If operations do not resume within the RTO, the business may experience unacceptable costs, lost revenue, or reputational damage.

Key Characteristics of RTO

  • Focus on Downtime: RTO measures how quickly you must restore services after a disruption.

  • Business-Driven: RTO typically aligns with financial or operational thresholds. For instance, an online retailer might have an RTO of one hour for its ordering system, because every hour of downtime equates to lost sales.

  • Measured in Time Units: RTO is usually expressed in hours or minutes, though some mission-critical systems push it down to seconds.

For example, if your customer database fails at 2:00 PM and your RTO is 2 hours, your goal is to have the system up and running again by 4:00 PM. Missing that deadline means violating your own recovery objective, with potential financial or reputational consequences. Downtime is expensive: smaller operations can lose up to $427 per minute, while industries like healthcare and retail face hourly costs of roughly $636,000 and $1.1 million, respectively, and a single outage can cost a company anywhere from $100,000 to over $1 million. Meeting your RTO is therefore crucial to limiting these financial impacts.
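
The deadline arithmetic above can be sketched in a few lines of Python (the times and the 2-hour RTO come from the example; the helper names are ours):

```python
from datetime import datetime, timedelta

def rto_deadline(failure_time: datetime, rto: timedelta) -> datetime:
    """The latest moment service must be restored to stay within the RTO."""
    return failure_time + rto

def met_rto(failure_time: datetime, restored_time: datetime, rto: timedelta) -> bool:
    """True if actual downtime stayed within the objective."""
    return restored_time - failure_time <= rto

failure = datetime(2025, 1, 1, 14, 0)  # system fails at 2:00 PM
rto = timedelta(hours=2)
print(rto_deadline(failure, rto).strftime("%I:%M %p"))          # 04:00 PM
print(met_rto(failure, datetime(2025, 1, 1, 15, 30), rto))      # True: restored by 3:30 PM
```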

Balancing Cost and Speed

Shorter RTOs typically require redundant systems, automation, and higher availability architectures. While this can minimize the damage of downtime, it also often increases costs. Finding a balance involves:

  • Weighing the cost of downtime (lost transactions, customer dissatisfaction).

  • Evaluating the cost of continuous availability (hot standbys, distributed systems, high-speed network connections).

Many modern systems, such as distributed SQL databases (e.g., CockroachDB), help reduce RTO by automatically re-routing traffic if a node or region goes offline, thereby shortening downtime.


What is RPO?

Recovery Point Objective (RPO) is the maximum acceptable amount of data loss when systems are restored. It defines how far back in time your recovery process can “roll back” data to a consistent state.

Key Characteristics of RPO

  • Focus on Data Loss: RPO addresses how much recent data might be irretrievable following an incident.

  • Driving Backup Frequency: RPO influences how often data backups or replications must occur to ensure that any lost data falls within acceptable limits.

For example, if your RPO is 15 minutes, you have determined the business can tolerate losing up to 15 minutes of recent transactions. To achieve this, you might rely on near-real-time replication or frequent incremental backups. If a disaster strikes at 3:00 PM, you can confidently restore from a snapshot taken at 2:45 PM, with only 15 minutes of potential data loss.
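
As a rough sketch of how RPO drives backup cadence (a simplification that ignores backup duration and replication lag), the worst-case data loss equals the time elapsed since the last good backup:

```python
def max_backup_interval_minutes(rpo_minutes: float) -> float:
    """Under this simplification, the backup interval must not exceed the RPO."""
    return rpo_minutes

def worst_case_loss_minutes(disaster_minute: int, backup_minutes: list[int]) -> int:
    """Minutes of data lost if disaster strikes at `disaster_minute`,
    restoring from the most recent backup taken at or before it."""
    usable = [b for b in backup_minutes if b <= disaster_minute]
    return disaster_minute - max(usable)

# Backups every 15 minutes; disaster at minute 180 (3:00 PM if minute 0 is noon).
backups = list(range(0, 181, 15))
print(worst_case_loss_minutes(180, backups))  # 0: a backup just completed
print(worst_case_loss_minutes(178, backups))  # 13 minutes of lost writes
```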

Zero RPO

A zero RPO means no data loss is acceptable. Often, this requires synchronous replication to multiple sites, ensuring every transaction is committed across multiple locations before it’s considered final. Distributed databases can get close to zero RPO by writing data to multiple replicas, minimizing the risk of data loss.

Balancing Cost and Downtime  

While a short Recovery Time Objective (RTO) helps minimize downtime—and the associated revenue loss or reputational damage—it often demands more robust infrastructure, specialized failover mechanisms, and a skilled IT response team. For example:

  • Infrastructure Costs: Maintaining hot standby systems, redundant servers, and automated failover solutions can be expensive, but these investments significantly reduce potential downtime.

  • Complexity vs. Speed: Highly available architectures (e.g., distributed databases or clustered applications) may increase operational complexity, yet they can restore services within minutes—or even seconds—when a primary node fails.

  • Opportunity Costs: Some organizations find that a marginal improvement (e.g., reducing RTO from 30 minutes to near-zero) requires disproportionately higher spending. Weigh this extra cost against the potential revenue loss or compliance penalties incurred if you exceed your RTO.

  • Risk Mitigation: If your business model depends on continuous operations (e.g., online retail, financial services), the risk of extended outages may far outweigh the cost of additional infrastructure. Conversely, for non-mission-critical workloads, a slightly longer RTO might be acceptable given a lower budget.

Striking the right balance requires collaborating with both technical stakeholders and business leaders to determine how much downtime is tolerable — and what the organization is willing to spend to prevent it.


Difference between RPO and RTO: Explained

In short, RTO caps how long systems can be down, while RPO caps how much data can be lost.

Interplay Between the Two

It’s possible to meet one objective but not the other. For example, you could restore a system quickly (meeting RTO) but revert to a backup from hours earlier (failing your RPO). Conversely, you might restore to a perfectly up-to-date data state (RPO met) but require several hours of downtime (RTO not met).
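
The independence of the two objectives can be expressed as a tiny check (the minute values here are illustrative):

```python
def evaluate_recovery(downtime_min: float, data_loss_min: float,
                      rto_min: float, rpo_min: float) -> dict:
    """RTO and RPO are judged independently: you can meet one and miss the other."""
    return {"rto_met": downtime_min <= rto_min, "rpo_met": data_loss_min <= rpo_min}

# Fast restore from a stale backup: RTO met, RPO missed.
print(evaluate_recovery(downtime_min=20, data_loss_min=240, rto_min=60, rpo_min=15))
# Up-to-date data state after a long outage: RPO met, RTO missed.
print(evaluate_recovery(downtime_min=300, data_loss_min=0, rto_min=60, rpo_min=15))
```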

Organizations typically set RTO and RPO targets simultaneously, based on risk tolerance, budget, and operational needs.

For many business-critical applications, both low RTO and low RPO (i.e., fast recovery and minimal data loss) are required. This can drive adoption of high-availability architectures and continuous data protection solutions.


RPO and RTO in Disaster Recovery and Business Continuity

Disaster Recovery (DR) focuses on the technical aspects of restoring IT systems after a disruption, while Business Continuity (BC) considers the wider organizational response to maintain essential functions. RPO and RTO are cornerstones in both realms:

  1. Defining Tolerable Disruption

    • RTO sets the maximum downtime for critical operations before severe damage occurs (e.g., lost revenue, regulatory penalties).

    • RPO sets the maximum data loss your organization can handle without damaging credibility or violating compliance rules.

  2. Influencing Technology Choices

    • If your RPO is near-zero, you might opt for real-time replication and distributed databases that write data to multiple nodes.

    • If your RTO is extremely short, you may implement auto-failover, hot standbys, or container-based microservices that can redeploy quickly.

  3. Supporting Compliance

    • Regulatory standards in sectors like finance and healthcare often mandate certain recovery capabilities.

    • Internal SLAs or external contracts can specify RTO and RPO targets as measurable obligations.

  4. Driving Investment

    • The lower (stricter) your RTO and RPO, the more you typically invest in infrastructure, replication, offsite backups, and skilled personnel.

    • Some businesses opt for multi-region or multi-cloud setups to ensure resilience. For instance, CockroachDB’s multi-region capabilities allow data to be replicated closer to users and automatically fail over.

Real-World Example: E-commerce Platform

  • RTO: 30 minutes. Beyond this point, lost sales and frustrated customers become unacceptable.

  • RPO: 5 minutes. Because orders, payments, and customer data are vital, losing more than five minutes of transactional data could cause confusion over inventory, financial records, and shipping addresses.

To achieve these goals, the company might employ a distributed SQL database for near-real-time data replication and maintain hot standby servers that can take over almost instantly. The investment in servers, networking, and automation is offset by the reduction in lost sales if a primary server goes down.



Best Practices for Optimizing RPO and RTO

Achieving minimal data loss and fast recovery requires a combination of smart planning, technical solutions, and ongoing maintenance. Below are proven best practices to help your business optimize RPO and RTO.

1. Conduct a Business Impact Analysis (BIA)

A BIA identifies critical business processes, quantifies potential losses, and determines acceptable downtime. This helps you:

  • Prioritize Systems: Not all applications are equally critical.

  • Set Realistic Targets: Define RTO and RPO values based on the real-world impact of downtime or data loss.

2. Tier Your Applications

Categorize applications into tiers. For example:

  • Tier 1: Mission-critical systems (e.g., customer-facing payments).

  • Tier 2: Important systems (e.g., internal reporting, HR).

  • Tier 3: Non-critical or infrequently used systems.

Assign stricter RTO and RPO to Tier 1 applications while allowing more relaxed targets for lower tiers. This approach ensures you allocate resources appropriately rather than over-investing across the board.
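
A tier policy like the one above can be captured as plain data. The targets here are hypothetical, echoing the example ranges discussed later in the FAQ:

```python
# Hypothetical tier policy mapping criticality to recovery targets.
TIER_OBJECTIVES = {
    1: {"rto_minutes": 15,   "rpo_minutes": 0},    # mission-critical: near-zero loss
    2: {"rto_minutes": 240,  "rpo_minutes": 60},   # important internal systems
    3: {"rto_minutes": 1440, "rpo_minutes": 720},  # non-critical workloads
}

def objectives_for(app_tier: int) -> dict:
    """Look up the recovery targets an application must be engineered to meet."""
    return TIER_OBJECTIVES[app_tier]

print(objectives_for(1)["rpo_minutes"])  # 0: Tier 1 tolerates no data loss
```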

3. Choose the Right Data Protection Strategy

Data backup and replication strategies must align with your RPO:

  • Frequent Snapshots or Continuous Backups: If you require a very low RPO, schedule backups frequently. 

  • Cloud and Offsite Storage: Keep backups offsite or in the cloud for protection against local disasters.

  • Encryption and Integrity Checks: Ensure backups are both secure and verified to prevent restoration failures.

4. Leverage High Availability (HA) Architectures

To minimize RTO, invest in resilient architectures that reduce or eliminate downtime:

  • Distributed Systems: A multi-region, distributed SQL database can automatically redirect queries if one node fails, often delivering near-continuous availability.

  • Clustering and Failover: Automated failover to a hot or warm standby can restore operations within seconds or minutes.

  • Containerization and Orchestration: Tools like Kubernetes simplify redeploying services on healthy nodes, cutting recovery times.

5. Automate and Document Recovery Procedures

Manual processes can slow you down during an emergency:

  • Automated Failover Scripts: Trigger an immediate failover to a standby instance.

  • Infrastructure-as-Code: Provision new servers quickly using scripted configurations.

  • Comprehensive Documentation: Keep runbooks current and store them in an accessible, fail-safe location.
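
As an illustration only, the shape of an automated failover monitor might look like the following sketch; the health probe and `promote_standby` are hypothetical callables your platform would supply, not a real API:

```python
import time

def is_healthy(check) -> bool:
    """`check` is any zero-argument callable that probes the primary
    (e.g. a TCP connect or an SQL ping); exceptions count as unhealthy."""
    try:
        return bool(check())
    except Exception:
        return False

def monitor_and_failover(check, promote_standby, retries=3, interval_s=1.0) -> str:
    """Retry the health probe a few times, then trigger failover exactly once."""
    for _ in range(retries):
        if is_healthy(check):
            return "primary-ok"
        time.sleep(interval_s)
    promote_standby()
    return "failed-over"

# Simulated outage: the probe always fails, so the standby is promoted.
events = []
print(monitor_and_failover(lambda: False, lambda: events.append("promoted"),
                           retries=2, interval_s=0.0))  # failed-over
```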

6. Test Regularly

Testing is an essential part of an enterprise disaster recovery plan to confirm you can meet your stated objectives:

  • Disaster Recovery Drills: Simulate real outages or data loss scenarios.

  • Restore from Backups: Verify backups are valid and that your recovery time meets RTO.

  • Measure and Refine: Compare actual recovery metrics with your targets, then adjust resources or processes as needed.
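
The "measure and refine" step can be made mechanical. This sketch assumes each drill yields a (recovery minutes, data loss minutes) pair, compared against targets we've chosen for illustration:

```python
def drill_report(results, rto_min: float, rpo_min: float) -> list[str]:
    """Summarize DR drill runs; a drill passes only if both objectives are met."""
    lines = []
    for i, (recovery, loss) in enumerate(results, 1):
        ok = recovery <= rto_min and loss <= rpo_min
        lines.append(f"drill {i}: recovery={recovery}m loss={loss}m -> "
                     f"{'PASS' if ok else 'FAIL'}")
    return lines

# Three drills against a 30-minute RTO and 5-minute RPO.
for line in drill_report([(25, 3), (45, 2), (20, 9)], rto_min=30, rpo_min=5):
    print(line)
```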

7. Continuously Review and Update

Business requirements, infrastructure, and threats can change:

  • Track System Growth: Larger data volumes may slow backup procedures.

  • Adopt New Technologies: For instance, if a new distributed storage service improves replication speeds, you might achieve tighter RPO.

  • Revisit BIA: Ensure your objectives still align with new business lines, regulations, or market demands.

By following these best practices, your organization can systematically reduce downtime (RTO) and limit data loss (RPO), giving you a resilient foundation that supports continuous operations—even under adverse conditions.


RTO vs. RPO FAQ

Below are common questions related to RTO, RPO, and their optimization.

How do we determine the right RTO and RPO for our business?

Start with a Business Impact Analysis (BIA) to understand how downtime or data loss affects each department. Quantify potential revenue impact and compliance risks. Engage both technical teams and business stakeholders to decide on maximum tolerable downtime (RTO) and data loss (RPO). Balance the cost of downtime/data loss against the cost of building more resilient systems.

Can we achieve zero RTO or zero RPO?

  • Zero RTO means no perceived downtime at all. In practice, you’d need highly available architectures with automatic failover to ensure continuous service.

  • Zero RPO means no data loss whatsoever. This typically requires synchronous replication to multiple locations so that every write is confirmed in real time. Modern distributed SQL databases (like CockroachDB) can approach near-zero RPO and RTO thanks to their replicated architecture. Most businesses aim for low or near-zero metrics where justified.

How can we reduce our RTO and RPO?

  • For RTO (faster recovery): Use automated failover, maintain standbys, and script your recovery processes. 

  • For RPO (less data loss): Increase backup frequency or adopt continuous replication to a standby. If your backups currently happen once a day, consider hourly or near-real-time snapshots.

Improving network bandwidth, verifying backup integrity, and ensuring offsite replication are additional ways to tighten both RTO and RPO.

Why is RTO testing and RPO testing important, and how often should we test?

RTO and RPO testing is crucial because theoretical objectives might not hold up under real conditions. By regularly testing:

  • You verify backups and failover processes work as intended.

  • You uncover hidden dependencies or bottlenecks.

  • Your team gains familiarity with emergency procedures, reducing mistakes during an actual outage.

Annual testing is common, though critical systems might warrant semi-annual or quarterly drills. Always reassess your results to see if they meet your RTO and RPO targets.

How do RTO and RPO relate to our SLA or compliance obligations?

Some regulations require specific recovery capabilities (e.g., within X hours for financial records). You may incorporate RTO and RPO into internal SLAs to set performance benchmarks for your IT department. Externally, customers might demand certain uptime or data protection guarantees. Documenting (and meeting) your RTO/RPO objectives helps maintain compliance, avoid fines, and build customer trust.

Should we tier our services for different RTO and RPO objectives?

Yes. It’s common to tier services based on criticality:

  • Tier 1: Minimal downtime and data loss (e.g., 15-minute RTO, near-zero RPO).

  • Tier 2: Moderate tolerance (e.g., 2–4 hours RTO, 1-hour RPO).

  • Tier 3: Extended downtime acceptable (e.g., 24 hours RTO, 12-hour RPO).

This ensures you invest heavily where it matters most. Testing each tier separately also helps refine your overall continuity plan.

Where should backups be stored to meet a good RPO?

Store backups offsite or in geographically separated cloud regions. On-premises backups alone may fail in regional disasters like floods or fires. Using a distributed backup service or multi-cloud approach further protects data. Regularly confirm that backups are valid and restorable, since corrupt or incomplete backups won’t meet any RPO goals.
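
One common way to confirm a backup is restorable is to record a checksum when the backup is taken and re-check it before relying on the copy. A minimal sketch using SHA-256:

```python
import hashlib

def checksum(path: str) -> str:
    """SHA-256 of a backup file, computed in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(path: str, recorded_digest: str) -> bool:
    """Compare against the digest recorded at backup time; a mismatch means
    the copy is corrupt and won't meet any RPO goal."""
    return checksum(path) == recorded_digest
```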

Can distributed databases help with both RTO and RPO?

Yes. Distributed SQL solutions—such as CockroachDB—replicate data across multiple nodes (and often multiple regions). This approach:

  • Minimizes downtime (improving RTO) by automatically failing over if one node goes offline.

  • Reduces data loss (improving RPO) by maintaining near-real-time replicas that keep data current.

Such architectures are particularly valuable for organizations with global user bases or strict uptime/data integrity requirements.

More about RTO vs RPO

Cloud Migration

What is cloud migration?

Cloud migration is the process of moving an organization's digital assets, services, databases, IT resources, and applications into the cloud. This process may involve transferring data and applications from an on-premises data center to a cloud-based infrastructure or shifting resources from one cloud environment to another. With Gartner predicting that “Worldwide end-user spending on public cloud services [will] total $723.4 billion in 2025, up from $595.7 billion in 2024,” every company should know where it stands in its cloud journey, because that is where modern business is heading. This article discusses the benefits and challenges of moving from fully on-prem to hybrid or cloud deployments, as well as the benefits of multi-cloud and managing cross-cloud migration.

Particularly as companies aim to stay competitive in today’s economic landscape, many are turning to modernization efforts, which often includes migrating data from purely on-prem data centers to a hybrid, multi-region, or multi-cloud deployment.

One of the more common strategies is called rehosting, also known as “lift and shift.” This involves moving applications, workloads, and data from on-premises or another cloud environment to the cloud with minimal or no changes. This strategy is often used for quick cloud adoption and is usually the fastest and simplest method of cloud migration. Other common strategies include replatforming – lifting and then tinkering to get the shift to work more seamlessly – refactoring (rearchitecting) applications for the cloud, repurchasing (transitioning products entirely), or retiring. 

Benefits of cloud migration

  1. Cost Savings: Cloud migration can help reduce IT costs by minimizing the need for physical hardware and maintenance while optimizing resource usage based on demand.

  2. Scalability and Flexibility: It allows organizations to scale resources up or down easily according to their needs, enhancing their ability to respond to changing market conditions and business demands.

  3. Performance and Reliability: Cloud providers offer high availability and disaster recovery options, ensuring better performance and reliability for applications and services.

  4. Access to Advanced Technologies: Migrating to the cloud enables organizations to leverage cutting-edge technologies like artificial intelligence, machine learning, and big data analytics, promoting innovation and competitive advantage.

  5. Global Reach: Cloud services support deployment across multiple geographic regions, enabling organizations to deliver services closer to their customers, thus reducing latency and improving user experience.

  6. Supporting Company Initiatives: Many companies are focused on modernization efforts, and today, that includes moving to the cloud.

Common challenges of cloud migration

While there are certainly many benefits to cloud migration, there also can be some challenges:

  1. Downtime and Service Disruption: Migrating services and applications can lead to downtime, which can cause service disruptions and impact business operations.

  2. Data Security and Compliance: Ensuring the security and compliance of data in transit and at rest during and after the migration is critical. Adhering to regulatory requirements and industry standards needs proper planning and execution.

  3. Cost Management: Unexpected costs can arise during and after migration due to poor planning, oversized or underutilized resources, and lack of cost optimization strategies.

  4. Technical Compatibility: Ensuring that existing applications and systems are compatible with the cloud infrastructure can be challenging. Some legacy applications may require significant modifications to migrate successfully.

  5. Performance Issues: Post-migration performance may suffer due to differences in cloud infrastructure, network latency, or misconfiguration, impacting the user experience.

  6. Data Loss and Integrity: Ensuring data accuracy and preventing data loss during migration can also be a key concern. Proper data backup and validation processes are necessary to mitigate these risks.

  7. Complexity of Cloud Architecture: Understanding and effectively managing the complexity of cloud architecture, such as hybrid or multi-cloud environments, can be challenging.

  8. Governance and Management: Establishing proper governance and management practices to monitor, control, and optimize cloud resources is crucial and can be complex.

  9. Staff Expertise and Training: Existing IT teams may lack the required skills and expertise to handle cloud environments and migration processes. Investing in training and hiring skilled professionals is essential.

Addressing these challenges requires careful planning, a well-thought-out migration strategy, strong project management, and collaboration between business and IT teams. To simplify cloud migrations, companies can turn to different tools provided by database companies, who focus on ensuring a smooth transition.

What is cross-cloud migration?

Cross-cloud migration, also known as multi-cloud migration, refers to the process of moving data, applications, and workloads between different cloud service providers or utilizing multiple cloud environments to run various parts of an organization's IT landscape. This can involve migrating from one public cloud to another (e.g., from AWS to Azure), leveraging both public and private cloud resources, or distributing workloads across several public cloud providers.

Organizations undertake cross-cloud migration for several reasons:

  1. Avoiding Vendor Lock-in: By using multiple cloud providers, organizations can avoid dependency on a single vendor. This flexibility allows them to switch providers more easily if necessary, without being restricted by contractual or technical limitations.

  2. Cost Optimization: Different cloud providers offer varied pricing models, services, and performance characteristics. Organizations can optimize costs by selecting the most cost-effective services for specific workloads or taking advantage of promotional pricing and discounts from multiple providers.

  3. Performance and Availability: Using multiple cloud environments can enhance disaster recovery and business continuity. Distributing workloads across various clouds can ensure higher availability and redundancy, reducing the risk of downtime and data loss.

  4. Geographic Distribution: Organizations with a global presence may use multiple cloud providers to ensure that their services are geographically distributed closer to their users, reducing latency and improving performance.

  5. Compliance and Data Sovereignty: Different cloud providers may meet specific regulatory and compliance requirements in various regions. Cross-cloud strategies allow organizations to store and process data in specific jurisdictions to comply with local laws and data sovereignty rules.

  6. Enhanced Innovation and Functionality: Each cloud provider offers unique tools, services, and innovations. By leveraging multiple providers, organizations can take advantage of a broader range of features and capabilities to better meet their business needs and drive innovation.

  7. Risk Mitigation: Relying on a single cloud provider can pose risks if that provider experiences outages, security breaches, or other issues. Cross-cloud strategies can mitigate risk by ensuring that the failure of one provider does not disrupt the entire operation.

Overall, cross-cloud migration enables organizations to build a more flexible, resilient, and cost-effective IT infrastructure that can adapt to changing business needs and market conditions. However, it also introduces additional complexity in managing and integrating multiple cloud environments, which requires careful planning and robust management practices.

Cross-cloud migration with CockroachDB

CockroachDB offers flexible replication controls that make it easy to run a single CockroachDB cluster across multiple cloud platforms and migrate data from one cloud to another without service interruption. This process involves starting nodes on different clouds, and using load balancers like HAProxy to ensure even distribution of client requests.

The migration process includes steps such as initializing the cluster, setting up load balancing, running sample workloads, and watching data balance across nodes. Eventually, you can migrate all data to a new cloud by adding constraints to ensure all replicas are on the desired cloud nodes, or simply decommissioning nodes in the source cloud. This is called a stretch migration.

What are cloud migration services?

Moving data, applications, and services to the cloud or between clouds can be a daunting task. As outlined above, there are many benefits, but also substantial challenges to consider when adopting a cloud migration plan. Cloud migration services help organizations move data, applications, and other business elements from an on-premises environment or one cloud provider to another cloud environment. The three major cloud service providers are Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

AWS Database Migration Service (DMS) is a popular tool used to migrate data from one database to another. Easy to use, AWS DMS offers minimal downtime, support for various database engines, flexibility, scalability, cost-effectiveness, automatic monitoring, security, integration with the AWS ecosystem, and overall reliability. Its ability to simplify and streamline the database migration process while providing robust performance and security makes it an attractive option for organizations looking to modernize their database infrastructure and take advantage of cloud benefits. 

For example, users can use AWS DMS to migrate from PostgreSQL, MySQL, Oracle, or Microsoft SQL Server to a distributed SQL database, like CockroachDB. The process involves setting up a replication instance, configuring source and target endpoints, and creating a database migration task.

Distributed SQL databases are uniquely powerful in that they provide the much-needed consistency and familiar structure of traditional, relational databases along with the scalability, survivability, and performance of NoSQL databases, necessary for modern businesses.

The MOLT (Migrate Off Legacy Technology) Suite

MOLT stands for "Migrate Off Legacy Technology." It is a suite of tools designed by Cockroach Labs to facilitate the database migration process to CockroachDB (self-hosted or in the cloud). There are three tools to support users throughout the migration process.

  • MOLT SCT (Schema Conversion Tool): Converts database schemas from PostgreSQL, MySQL, Oracle, and SQL Server to a CockroachDB-compatible schema.

  • MOLT Fetch: Moves data from a source database (PostgreSQL, MySQL, or CockroachDB) to CockroachDB. It supports initial data loads and continuous replication.

  • MOLT Verify: Checks for data discrepancies between the source and target databases to ensure data integrity during migration.
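
The kind of discrepancy check MOLT Verify performs can be approximated in principle. This toy sketch compares in-memory rows keyed by a primary key, not live databases, and is purely illustrative:

```python
def find_discrepancies(source_rows, target_rows, key="id") -> dict:
    """Report rows missing from the target and rows whose values differ,
    keyed by a primary-key column."""
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    missing = sorted(set(src) - set(tgt))
    mismatched = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return {"missing": missing, "mismatched": mismatched}

source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}]
print(find_discrepancies(source, target))  # {'missing': [3], 'mismatched': [2]}
```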

These tools aim to simplify and streamline the migration process, enabling organizations to modernize their infrastructure and move away from legacy systems.

We hope you've enjoyed this overview of cloud migration! Happy modernizing!

Cloud Migration FAQ

What is cloud migration?

Cloud migration is the process of moving an organization's digital assets, services, databases, IT resources, and applications into the cloud. This may involve transferring data and applications from an on-premises data center to a cloud-based infrastructure or shifting resources from one cloud environment to another.

What are the top cloud service providers?

Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

What are the benefits of cloud migration?

Cloud migration reduces IT costs and provides scalability, flexibility, and high reliability for applications and services while enabling the use of advanced technologies like AI and big data analytics. It supports global reach, enhancing user experience and modernization efforts.

What are the common challenges of cloud migration?

Without careful planning, cloud migration can cause downtime and service disruptions. A clear understanding of the nuances of cloud migration is needed to ensure data security, technical compatibility, and proper cost management. Navigating the complexity of cloud architecture and ensuring staff expertise also pose significant challenges. There is no one-size-fits-all solution, so attention must be paid to the specifics of each ecosystem.

What is cross-cloud migration?

Cross-cloud migration, also known as multi-cloud migration, refers to the process of moving data, applications, and workloads between different cloud service providers or utilizing multiple cloud environments to run various parts of an organization's IT landscape.

What are cloud migration services?

Cloud migration services involve moving data, applications, and other business elements from an on-premises environment or one cloud provider to another cloud environment.

What are common cloud migration services?

AWS DMS (Database Migration Service) is a popular tool for organizations to utilize, as it provides access to the AWS ecosystem. Database vendors also often offer their own migration services. Cockroach Labs provides a set of tools to support every step of the data migration process (for both self-hosted and managed Cloud offerings): MOLT SCT, MOLT Fetch, and MOLT Verify.

How do I get started with cloud migration?

You can check out comprehensive resources online from database providers, such as The Architect's Guide to SQL Database Modernization: Your Step-by-Step Roadmap, which provides guidelines on how to navigate the entire modernization journey, from on-prem to hybrid to multi-region and multi-cloud.

More about Cloud Migration

DSQL

What is DSQL?

Published on March 14, 2025

DSQL can refer to either the proprietary AWS Aurora DSQL database or to distributed SQL databases – a class of databases designed to provide horizontal scalability, high availability, and strong consistency while maintaining the relational database capabilities that developers expect. Unlike traditional SQL databases, which often require complex sharding or replication strategies to scale, distributed SQL databases natively distribute data across multiple nodes and regions, allowing applications to handle increasing workloads without sacrificing performance.

What is Amazon Aurora DSQL?

Amazon Aurora DSQL is a serverless distributed SQL database introduced by AWS in late 2024. It is designed to offer scalability, strong consistency, and high availability while eliminating infrastructure management. Aurora DSQL extends the Amazon Aurora family by providing an active-active multi-region architecture with automatic sharding, replication, and failover capabilities.

Key Features of Aurora DSQL

  • Serverless & Fully Managed – No need for manual provisioning, scaling, or maintenance.

  • Multi-Region Active-Active Availability – Supports synchronous replication across multiple AWS regions with high availability guarantees

  • Optimistic Concurrency Control (OCC) – Handles transactions using OCC and snapshot isolation, which may lead to higher transaction abort rates under high contention workloads

  • Automatic Sharding & Scale-Out Performance – Aurora DSQL automatically distributes data across nodes, scaling for both reads and writes
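
Under OCC, the burden of handling commit-time conflicts shifts to the client, which is expected to retry aborted transactions. The sketch below is a toy in-memory illustration of the validate-at-commit pattern — `Store`, `ConflictError`, and `transfer` are hypothetical names for illustration, not part of any AWS API:

```python
class ConflictError(Exception):
    """Raised when commit-time validation detects a conflicting write."""

class Store:
    """Toy versioned key-value store with optimistic commit validation."""
    def __init__(self, data):
        self._data = dict(data)
        self._versions = {k: 0 for k in data}

    def get(self, key):
        return self._data[key]

    def version(self, key):
        return self._versions[key]

    def commit(self, read_versions, writes):
        # Validation: every key read must be unchanged since it was read.
        for key, ver in read_versions.items():
            if self._versions[key] != ver:
                raise ConflictError(key)
        for key, value in writes.items():
            self._data[key] = value
            self._versions[key] += 1

def transfer(store, src, dst, amount):
    """One optimistic attempt: read, compute, then validate at commit time."""
    read_versions = {k: store.version(k) for k in (src, dst)}
    new_src = store.get(src) - amount
    new_dst = store.get(dst) + amount
    store.commit(read_versions, {src: new_src, dst: new_dst})

def run_with_retries(txn, attempts=5):
    """OCC clients retry aborted transactions, ideally with backoff."""
    for _ in range(attempts):
        try:
            return txn()
        except ConflictError:
            continue  # another writer won; re-read and try again
    raise RuntimeError("transaction aborted after repeated conflicts")

store = Store({"a": 100, "b": 0})
run_with_retries(lambda: transfer(store, "a", "b", 30))
print(store.get("a"), store.get("b"))  # 70 30
```

The practical takeaway is the retry loop: under high contention, more attempts abort at validation, which is the trade-off the OCC bullet above describes.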

Aurora DSQL belongs to the distributed SQL class of databases, and it can be compared to offerings like Google Cloud Spanner, Apache Cassandra, and CockroachDB. All of these systems address the challenge of scaling databases horizontally while maintaining consistency, but they differ in design decisions, consistency models, performance characteristics, and pricing.

Aurora DSQL: Key Limitations

AWS announced the public preview of Aurora DSQL at AWS re:Invent 2024, and as of March 2025, Aurora DSQL is still in preview. In the preview, Aurora DSQL is available in limited regions (e.g., three US regions) and is free to try. AWS’ investment in distributed SQL is particularly exciting to companies like Cockroach Labs, which offered the first commercially available distributed SQL database a decade ago.

Because Aurora DSQL is in preview, there are still a number of feature gaps, including incomplete PostgreSQL compatibility and availability in only a few regions. In addition, the service is not yet battle-tested at the level of the original Aurora or other mature databases. AWS is likely to continue improving Aurora DSQL’s compatibility and expanding its regional availability throughout 2025. If it follows a trajectory similar to past Aurora features, general availability (GA) could arrive by late 2025, possibly with more enterprise features and compliance certifications (at the time of writing it is preview-only, so not for production use yet).

Overall, Aurora DSQL is an exciting development in cloud databases – bringing the scalability of NoSQL and the power of SQL together. Positioning itself against technologies like Google Cloud Spanner and CockroachDB, DSQL is another option for customers looking for a serverless operation, multi-region design, and aspirational PostgreSQL compatibility. For now (early 2025), it’s a technology to watch and experiment with, especially for those already in the AWS ecosystem or those hitting the limits of single-node databases and considering a distributed SQL alternative.

Aurora DSQL vs. Google Cloud Spanner

Google Cloud Spanner is a distributed SQL database (first described by Google in 2012) that offers global-scale transactions with strong consistency. Both Spanner and Aurora DSQL emphasize horizontal scalability, fault-tolerance, and ACID consistency across regions, but there are notable differences:

  • Architecture & Consistency: Spanner relies on a proprietary TrueTime API (atomic clocks & GPS) to achieve external consistency (linearizable global order of transactions). Aurora DSQL uses AWS’s Time Sync Service for tightly synchronized clocks to support its snapshot isolation model. In practice, both deliver synchronous replication across regions with no data loss, but Spanner’s approach guarantees a single global serialization order for transactions. Aurora DSQL’s approach aims for minimal latency impact and resilience (it requires 3 regions—two active and one witness—for quorum, similar to Spanner’s Paxos-based replication).

  • Performance: AWS has claimed significant performance advantages for Aurora DSQL over Spanner. In fact, Amazon’s CEO announced that Aurora DSQL achieved reads and writes four times faster than Spanner in internal benchmarks. It will be interesting to see the performance comparison play out as DSQL gains more users.

  • Scalability & Use Cases: Both databases can scale horizontally to large sizes. Aurora DSQL can also operate across multiple regions, but gives you more control to link specific regions (it’s currently only in a few AWS regions during preview). Spanner has been used for multi-region consistent deployments, while Aurora DSQL is yet to be truly tested on an enterprise scale.

  • Platform & Compatibility: Spanner is available only on Google Cloud Platform, whereas Aurora DSQL is an AWS service (and currently in limited preview regions on AWS). Aurora DSQL’s PostgreSQL compatibility means it can work with many existing tools and ORMs, whereas Spanner uses its own SQL dialect (though JDBC drivers exist, and Google provides a PostgreSQL interface for Spanner in a separate product variant). Neither offers compelling options for those seeking multi-cloud deployments in the future.

Aurora DSQL vs. Apache Cassandra

Apache Cassandra is a popular open-source NoSQL database known for its extreme scalability and always-on design. It takes a very different approach from Aurora DSQL, trading some consistency guarantees for partition tolerance and speed. Comparing Cassandra to Aurora DSQL highlights the differences between NoSQL (AP) systems and distributed SQL (CP/ACID) systems:

  • Data Model and Features: Cassandra is a NoSQL wide-column store, whereas Aurora DSQL is a relational SQL database. Aurora DSQL supports SQL queries, joins, secondary indexes, and ACID multi-row transactions. Cassandra has long offered secondary indexes (with storage-attached indexes arriving in version 5.0) and lightweight compare-and-set transactions, but general-purpose ACID transactions remain on its roadmap rather than broadly available.

  • Consistency Model: Cassandra clients can tune consistency per query (e.g., ONE, QUORUM, or ALL for reads and writes). Aurora DSQL is designed for strong consistency – a committed transaction’s changes are visible immediately to all subsequent reads cluster-wide. Essentially, Aurora DSQL claims immediate consistency and full ACID semantics, whereas Cassandra follows an “eventually consistent” model that sacrifices some correctness for availability and performance.

  • Performance and Scalability: Both systems scale horizontally, but Cassandra’s scaling is “shared-nothing” and proven in some of the largest data workloads in the world. Aurora DSQL’s scaling is impressive in that it handles complex SQL operations while scaling, but being newer, real-world benchmarks will further reveal its performance at extreme scale. Cassandra can ingest massive write loads and scale out to petabytes of data across dozens or hundreds of nodes. Because it forgoes synchronous cross-node coordination on writes (unless configured otherwise), it can offer lower write latency and higher throughput in some scenarios.

  • Use Cases & Flexibility: Use cases that suit Apache Cassandra include scenarios where you need to handle huge volumes of data with simple queries, and can tolerate eventual consistency – for example, IoT data ingestion, analytics where slight delays are okay, messaging logs, or content feed storage. If you need to perform complex queries (multi-table joins, aggregations, etc.) and rely on transactional consistency, a distributed SQL system like Aurora DSQL or CockroachDB could be more appropriate.

Aurora DSQL vs. CockroachDB

CockroachDB was the first commercially available distributed SQL database, known for its performance, high availability, scalability, and resilience. CockroachDB and Aurora DSQL share goals of combining SQL features with horizontal scale, but they differ in maturity and deployment models:

  • Deployment and Flexibility: CockroachDB can be deployed anywhere – on-premises, on any cloud, or via Cockroach Labs’ managed service. It supports hybrid and multi-cloud deployments, giving companies cloud flexibility and even the ability to run across cloud providers. Aurora DSQL, on the other hand, is a proprietary AWS service only, offered as a serverless managed database on AWS.

  • Consistency and Transactions: Both systems ensure strong consistency for transactions. A key difference is the transaction isolation level: CockroachDB runs all transactions with serializability by default, which is the strictest isolation level, ensuring no anomalies or write skew issues occur. Aurora DSQL implements repeatable-read snapshot isolation (optimistic concurrency control), which allows for high concurrency and performance, but under higher contention may result in transaction rollbacks for conflicts. For workloads with lots of contention or hot spots, CockroachDB’s approach avoids anomalies and it naturally supports features like foreign keys and integrity constraints across the cluster. Aurora DSQL in preview does not yet support certain PostgreSQL features like foreign keys, triggers, or views, whereas CockroachDB has long supported foreign keys, JSON data, and many Postgres-compatible features. This means that today CockroachDB will handle complex relational schemas more completely, while Aurora DSQL’s focus has been on core transactional performance and will likely add additional features over time.

  • Performance and Scalability: Both databases are built to scale out on commodity hardware and handle heavy loads. CockroachDB has proven performance in production deployments, executing massive numbers of transactions with low latency and no manual sharding – it automatically balances data across nodes. It maintains indexes and constraints cluster-wide, so applications always see consistent, up-to-date data. One area of distinction is latency and locality: CockroachDB allows configuring data locality, and it automatically routes queries to nearest data copies. As of its preview, Aurora DSQL does not yet offer granular data locality controls to, say, pin a table to one region – it treats the entire cluster as a single logical database that is replicated. This could change in the future, but currently CockroachDB provides more control and has been successfully adopted in enterprises around the world.
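
The “write skew” anomaly mentioned above is the classic gap between snapshot isolation and serializability: two transactions each read an invariant-relevant value, write disjoint rows, and both commit because neither overwrote what the other wrote. The following toy Python simulation of that schedule (using the textbook on-call doctors example, not real database code) shows the invariant breaking:

```python
# Invariant we want to preserve: at least one doctor must stay on call.
on_call = {"alice": True, "bob": True}

# Under snapshot isolation, both transactions read from a snapshot taken
# before either one writes.
snapshot_t1 = dict(on_call)
snapshot_t2 = dict(on_call)

# T1: Alice goes off call because, per her snapshot, someone else is on call.
if sum(snapshot_t1.values()) > 1:
    on_call["alice"] = False

# T2: Bob does the same, reading from his own (now stale) snapshot.
if sum(snapshot_t2.values()) > 1:
    on_call["bob"] = False

# The writes touch disjoint rows, so there is no write-write conflict and
# both transactions commit -- yet the invariant is silently violated.
# A serializable database would abort one of the two instead.
print(sum(on_call.values()))  # 0 -- nobody is left on call
```

This is the class of anomaly a serializable-by-default system prevents at the cost of more transaction retries, and an OCC/snapshot-isolation system trades away for concurrency.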


RELATED: Watch our webinar on-demand, “CockroachDB vs. Amazon Aurora: Comparing Distributed Relational Databases.”


If an organization values total cost of ownership and freedom from vendor lock-in, they might lean toward CockroachDB; if they value hands-off management and deep AWS integration, Aurora DSQL is attractive. It’s also worth noting that CockroachDB, with multi-cloud capability, can help avoid cloud concentration risk, whereas Aurora DSQL ties your data layer to AWS.

Battle-Tested vs. Beta: Aurora DSQL vs. CockroachDB

While both Aurora DSQL and CockroachDB are distributed SQL databases, CockroachDB offers superior transactional consistency, PostgreSQL compatibility, operational flexibility, multi-region, and multi-cloud capabilities. Since CockroachDB has been commercially available for nearly a decade, the team has been innovating, pioneering, and grappling with distributed SQL technology for significantly longer. The maturity of CockroachDB is a testament to the engineers behind the technology. The table below highlights key differences:

When to Choose CockroachDB Over Aurora DSQL

  • For business-critical workloads: CockroachDB offers stronger consistency (Serializable isolation), ensuring better reliability under concurrent transactions.

  • For multi-cloud or hybrid deployments: Aurora DSQL is AWS-exclusive, while CockroachDB runs across all cloud providers and on-prem.

  • For PostgreSQL compatibility: CockroachDB supports features like foreign keys, triggers, stored procedures, and sequences, while Aurora DSQL currently does not.

  • For predictable pricing and control: CockroachDB offers flexible pricing models and deployment options (serverless, dedicated, self-hosted, and pay-as-you-go), whereas Aurora DSQL, while currently in a free preview, provides no clarity as to how the pricing could change over time.

Aurora DSQL and CockroachDB both provide scalable, distributed SQL solutions, but CockroachDB stands out for enterprise-grade consistency, PostgreSQL compatibility, and deployment flexibility. Additionally, CockroachDB is battle-tested. Organizations looking for a multi-cloud, high-performance transactional database with rich SQL support will find CockroachDB to be a superior choice over Aurora DSQL.

Get started with CockroachDB Cloud today. We’re offering $400 of free credits to help kickstart your CockroachDB journey, or get in touch today to learn more.

DSQL FAQ

What is DSQL (Distributed SQL)?

DSQL can refer to either the proprietary database, AWS Aurora DSQL, or to a class of databases designed to provide horizontal scalability, high availability, and strong consistency while maintaining the relational database capabilities that developers expect.

What is distributed SQL?

Distributed SQL databases are a class of databases designed to provide horizontal scalability, high availability, and strong consistency while maintaining the relational database capabilities that developers expect.

What is Aurora DSQL?

Announced at AWS re:Invent 2024, Aurora DSQL belongs to the class of distributed SQL databases, and it can be compared to offerings like Google Cloud Spanner, Apache Cassandra, and CockroachDB. All these systems address the challenge of scaling databases horizontally while maintaining consistency, but they differ in design decisions, consistency models, performance characteristics, and pricing.

What are the key features of DSQL?

At the time of writing, DSQL is serverless and fully managed, and offers multi-region active-active availability, Optimistic Concurrency Control (OCC), and automatic sharding. Since Aurora DSQL is in preview, we anticipate changes to be forthcoming.

What are the most commonly used distributed SQL databases?

Today, the most commonly used distributed SQL databases include Google Spanner and CockroachDB. CockroachDB was the first commercially available distributed SQL database.

How does CockroachDB compare with Aurora DSQL?

While both Aurora DSQL and CockroachDB are distributed SQL databases, CockroachDB offers superior transactional consistency, PostgreSQL compatibility, operational flexibility, and multi-cloud capabilities. Since CockroachDB has been commercially available for nearly a decade, the team has been innovating, pioneering, and grappling with the distributed SQL technology for significantly longer.

More about DSQL

Disaster Recovery

What is Disaster Recovery? 

Disaster recovery (also known as DR) is a critical aspect of database management, ensuring that systems can quickly recover from unexpected events that disrupt normal operations, with minimal to no data loss. 

Disaster recovery refers to the strategies and processes that an organization puts in place to restore normal operations after a disruptive event. These events can range from hardware failures and data corruption to security breaches and natural disasters. A strong disaster recovery plan helps to minimize downtime and data loss, which maintains business continuity.

What is a Disaster Recovery Plan?

A disaster recovery plan is a documented, structured approach with instructions for responding to unplanned incidents. It includes:

  • Definition and significance: A disaster recovery plan outlines the steps to recover from various types of disasters, ensuring minimal downtime and data loss.

  • Key strategies and goals: These include minimizing downtime, ensuring data integrity, and maintaining business continuity.

  • Roles and responsibilities: Clearly defined roles for IT staff and service providers are crucial for effective disaster recovery.

  • Critical systems involved: This includes databases, applications, and network infrastructure.

Steps in Disaster Recovery

Recovering from disasters requires detailed steps, depending on the type of failure. A few examples of possible failures include: 

Hardware and network failures 

  • Disk failures: Hard drive crashes, SSD failures, or RAID controller issues

  • Data center outages: Power outages or network partitioning impacting access

Natural disasters

  • Earthquakes, floods, fires: Physical damage to database servers in on-premises data centers

  • Power failures: Unexpected shutdowns leading to incomplete transactions or corruption

Human errors

  • Accidental deletion: Unintentional DROP TABLE, DELETE, or UPDATE commands

Security breaches

  • Malware or ransomware attacks: Encrypting or corrupting database files, making data inaccessible

After determining the kinds of failure that your system could encounter, it’s important to identify and formalize the basis of your disaster recovery plan. For example: What specific actions are available for short-term and long-term remediation? What tools and support are at your disposal? What are the potential risks and their impacts? Then you can build out your recovery objectives and codify your plan for regular testing.

Below, we have further explained each of the above areas.

Specific actions for short-term and long-term remediation 

In disaster recovery, short-term remediation refers to getting all systems operational and then keeping applications running, whereas long-term remediation refers to improving the overall recovery plan. 

Immediate short-term actions might include recovering from your latest backup or failing over to a standby, while long-term actions involve analyzing the cause of the disaster and improving the recovery plan.

In the case of disaster recovery for a database, the distributed SQL database CockroachDB, for example, provides tools like automated backups, monitoring dashboards, and support services to assist in disaster recovery.

Identifying potential risks and their impact

The first step in constructing a reliable DR plan is to conduct a comprehensive risk assessment. Preparing this assessment involves the cataloging of all potential threats that could disrupt normal operations — these may include:

  • Hardware failures 

  • Cyberattacks 

  • Data corruption 

  • Natural disasters (such as floods or earthquakes) 

For distributed SQL databases, this also means identifying risks associated with network instability, regional outages, and replication delays. 

By understanding each threat’s likelihood and severity, organizations can gauge the potential impact on their data, applications, and overall productivity. The result of this exercise should be a prioritized list of vulnerabilities, enabling teams to focus their attention on mitigating the most critical risks and formulating tailored recovery strategies. 

Establishing Recovery Objectives 

Recovery objectives serve as the guiding metrics for measuring the effectiveness of a DR plan. Two primary parameters stand out: 

  1. Recovery Time Objective (RTO) – the maximum acceptable amount of time to restore normal operations after a disaster

  2. Recovery Point Objective (RPO) – the maximum acceptable amount of data loss measured in time

In other words, the RTO defines how quickly systems must be restored after a disruption, while the RPO specifies how much data loss is tolerable to an organization. For example, an enterprise might have an RTO of two hours and an RPO of 15 minutes, meaning that it can tolerate up to two hours of downtime and 15 minutes of data loss. By contrast, a financial services firm might require near-zero data loss and mere minutes of downtime, whereas a smaller e-commerce platform could be more flexible. 

Determining these objectives ensures that teams can choose technologies and architectures — such as CockroachDB optimized for high availability and geo-replication — that align with business needs. CockroachDB's architecture, for example, supports low RTO and RPO through features like automated backups, multi-region replication, and high availability.

Specifying RTO and RPO benchmarks helps organizations set realistic recovery goals and choose appropriate recovery strategies. These benchmarks also help to set budgetary constraints, guiding investments in infrastructure, automation, and redundancy.
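
As a back-of-the-envelope check, the worst-case data loss under a periodic backup schedule is roughly the backup interval plus any replication or upload lag, and that figure (along with measured restore time) can be compared against the stated objectives. A small illustrative Python sketch, with all numbers hypothetical:

```python
def meets_objectives(backup_interval_min, upload_lag_min, restore_time_min,
                     rpo_min, rto_min):
    """Worst case: the disaster hits just before the next backup completes."""
    worst_case_data_loss = backup_interval_min + upload_lag_min
    return worst_case_data_loss <= rpo_min and restore_time_min <= rto_min

# The enterprise from the example above: RTO = 120 min, RPO = 15 min.
print(meets_objectives(backup_interval_min=10, upload_lag_min=2,
                       restore_time_min=90, rpo_min=15, rto_min=120))  # True

# Hourly backups blow past a 15-minute RPO even though the RTO is fine.
print(meets_objectives(backup_interval_min=60, upload_lag_min=2,
                       restore_time_min=90, rpo_min=15, rto_min=120))  # False
```

The second case is why tight RPOs typically push organizations from periodic backups toward continuous replication.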

Implementing Backup and Recovery Solutions

Once the risk landscape and objectives are clear, the next step is to deploy suitable backup and restoration mechanisms. In a distributed SQL environment, data is often replicated across multiple nodes or regions, providing built-in resilience. However, true disaster recovery requires a multi-layered approach. This may include:

  • Snapshot-based backups 

  • Point-in-time recovery (PITR) capabilities 

  • Encrypted offsite or cloud-based backups

Automation is key to this component: scheduling regular backups and verifying data integrity ensures a smooth and rapid failover process. Some organizations also maintain “warm” or “hot” standby systems, which stand ready to instantly take over operations if the primary environment fails. By combining these tools and technologies, teams can ensure that their data and applications remain accessible, even under extremely challenging circumstances.
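
“Verifying data integrity” can be as simple as recording a checksum when a backup is written and re-checking it before relying on that backup in a restore. A minimal sketch of that check — the file paths and helper names here are illustrative, not tied to any particular backup tool:

```python
import hashlib

def checksum(path, chunk_size=1 << 20):
    """SHA-256 of a backup file, read in chunks to handle large archives."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(path, recorded_checksum):
    """Compare the checksum stored at backup time against a fresh one."""
    return checksum(path) == recorded_checksum
```

A scheduled job that runs `verify_backup` against every archive turns a silent corruption into an alert long before the backup is needed.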

Regularly Testing and Updating the Plan

A disaster recovery plan is not a static document — it must evolve with an enterprise’s changing technologies, growth in data volume, and emerging threats. Regularly conducting DR drills and simulations helps validate the effectiveness of each strategy, reveals gaps in processes, and uncovers any hidden dependencies. Teams should test failover procedures, confirm backup integrity, and measure recovery performance against established objectives. 

Equally important, IT teams must keep the disaster recovery plan up-to-date to reflect infrastructure changes, software upgrades, and new compliance mandates. Continuous testing and refinement ensure that when a true disaster strikes, the organization is fully prepared to restore operations quickly, minimize loss, and maintain customer trust.

What is Cloud Disaster Recovery?

Cloud disaster recovery involves using cloud-based resources and services to back up data and applications, ensuring that they can be restored in the event that a disaster takes place. Key aspects include:

Definition and significance: Cloud disaster recovery leverages the scalability and flexibility of cloud services to provide a cost-effective and efficient recovery solution.

Cloud Disaster Recovery Plan: The cloud-native distributed SQL database CockroachDB, for example, offers its users comprehensive disaster recovery capabilities that include automated backups, multi-region deployments, and high availability features.

What is Disaster Recovery as a Service (DRaaS)? 

Disaster Recovery as a Service (DRaaS) leverages cloud-based solutions to provide organizations with an outsourced, turnkey approach to recovering their IT environments after a disruptive event. Instead of relying solely on internal infrastructure and expertise, businesses leveraging DRaaS partner with a third-party provider which maintains the necessary hardware, software, and processes to restore operations quickly and efficiently.

DRaaS is particularly appealing for organizations that prefer to avoid the complexity and cost of building and managing their own secondary data centers. The DRaaS provider ensures that critical systems, applications, and data are continuously replicated in a secure, offsite environment. In the event of a disaster — whether it’s a cyberattack, hardware failure, or natural calamity — teams can fail over to the DRaaS provider’s resources and continue operating with minimal downtime.

Another important advantage of DRaaS is its scalability and flexibility. As businesses grow or their data protection needs evolve, they can easily adjust their DRaaS solution without significant capital investments. DRaaS vendors also often offer a range of recovery options, which helps organizations to fine-tune their RTOs and RPOs based on their unique requirements and compliance mandates.

With DRaaS, enterprises have a partner that is continuously monitoring their systems, verifying backups, and testing restoration procedures. It’s a proactive approach to ensure that a robust, well-orchestrated response is a failover command away when disaster strikes, significantly reducing financial losses, operational disruptions, and reputational damage.

What is Business Continuity Planning (BCP)?

Business Continuity Planning (BCP) is another aspect of disaster recovery. It extends beyond the immediate recovery of IT systems, taking a holistic approach to maintaining organizational operations during and after disruptive events. 

While disaster recovery focuses on restoring specific infrastructure and data, BCP ensures that the business as a whole can continue functioning — serving customers, meeting regulatory requirements, and protecting brand reputation — regardless of the interruption’s scale or nature.

A comprehensive BCP begins with a detailed analysis of:

  • Critical business functions 

  • Supply chain dependencies 

  • Communication strategies 

This process involves identifying the technology required for seamless operations, as well as the workforce, facilities, and external partners essential for day-to-day activities. For example, BCP might entail contingency plans for shifting workforces to alternate locations, rerouting logistics, or enabling remote operations if key offices become inaccessible.

Regular training and simulation exercises are fundamental to effective BCP. These drills verify that stakeholders understand their roles, communication lines remain open, and everyone can adapt quickly if a crisis occurs. Continuous refinement is equally important: The plan must be revisited and updated as the business evolves, with steps like integrating new software, diversifying suppliers, or introducing hybrid work models.

Ultimately, BCP ensures organizations are ready to respond swiftly and cohesively to any disruption – safeguarding both their IT systems and their overall resilience. By focusing on maintaining core business functions rather than just technology recovery, an organization’s BCP can ensure that operations remain stable, customers are supported, and reputations stay intact, no matter what challenges they face.

What is IT Disaster Recovery?

IT Disaster Recovery (ITDR) is focused on restoring the technical core of an organization’s infrastructure after a catastrophic event. While BCP and DRaaS provide broader frameworks, ITDR concentrates on specific tasks: rebuilding servers, retrieving critical data, reinstating network connectivity, and ensuring that key applications return to their pre-disaster performance levels.

A solid ITDR strategy often starts with a detailed inventory of IT assets — including servers, storage arrays, network devices, and software platforms — coupled with an understanding of their criticality. By prioritizing systems according to their importance, teams can allocate resources wisely and focus their recovery efforts where they matter most. Techniques like virtualization, multi-region replication, and snapshot-based backups are commonly employed to ensure rapid, reliable restoration.

Consistent testing is central to ITDR. Regular drills, failover simulations, and integrity checks help validate the effectiveness of backup processes, highlight gaps in procedures, and confirm that recovery steps can be performed smoothly under pressure. As technology evolves, ITDR plans must be updated to include new tools, platforms, and security protocols, ensuring ongoing alignment with an organization’s environment and risk profile.

ITDR’s primary goal is to minimize downtime and data loss at the technical layer. By maintaining robust backups, establishing clear restoration workflows, and continuously refining ITDR processes, organizations can ensure that their technology stack remains reliable, resilient, and ready to recover when the unexpected strikes. This lays the groundwork for overall organizational stability and long-term success.

The Importance of a Disaster Recovery Plan

Having a robust disaster recovery plan is essential for maintaining business continuity and minimizing the impact of unexpected events. When organizations regularly update and test their plan, this ensures that it remains effective and can be relied upon when needed. Implementing a comprehensive disaster recovery strategy with a distributed SQL database like CockroachDB can help organizations to achieve their recovery objectives and maintain high availability.

Explore CockroachDB for Disaster Recovery

CockroachDB offers robust disaster recovery capabilities essential for modern database management. With features like Raft replication and fault tolerance, CockroachDB ensures high availability and automatic recovery from disruptions, including hardware failures and data corruption. Its built-in resilience minimizes downtime and data loss, maintaining business continuity.

Get started with CockroachDB Cloud for free today!


RELATED

Architect your zero downtime strategy with CockroachDB: The Definitive Guide.


Disaster Recovery FAQ

What is a disaster recovery plan?

A disaster recovery plan is a documented, structured approach with instructions for responding to unplanned incidents. It outlines the procedures to follow to recover and protect a business’s IT infrastructure in the event of a disaster. This plan includes strategies for handling various types of disruptions, such as hardware failures, data corruption, and security breaches, ensuring that the organization can quickly resume normal operations.

What are the benefits of a disaster recovery plan?

A disaster recovery plan provides several benefits for organizations. Firstly, it ensures business continuity by minimizing downtime and data loss during catastrophic events such as hardware failures, data corruption, or security breaches. By having a structured approach with predefined roles and responsibilities, organizations can quickly restore operations and maintain data integrity. Regular testing and validation of the disaster recovery plan also help identify potential weaknesses and improve overall resilience, which ensures that the organization can effectively respond to – and recover from – unexpected incidents.

What is the difference between disaster recovery and business continuity?

Disaster recovery focuses specifically on restoring IT systems and data access after a disruption. It involves technical solutions like backups, data replication, and failover procedures. Business continuity, on the other hand, encompasses a broader scope, ensuring that all critical business functions can continue during and after a disaster. This includes not only IT systems but also other aspects like personnel, facilities, and communication strategies.

How does CockroachDB aid in disaster recovery?

CockroachDB aids in disaster recovery by providing built-in high availability and fault tolerance features. It uses Raft replication to ensure data is consistently replicated across multiple nodes, which helps maintain data integrity and availability even during failures. CockroachDB also supports point-in-time backups and Physical Cluster Replication (PCR). These features allow for quick recovery from data loss or corruption, which minimizes downtime and helps to ensure business continuity. By integrating these DR strategies, CockroachDB helps organizations meet their RTO and RPO requirements effectively.

What is the difference between RTO and RPO? 

RTO (Recovery Time Objective) and RPO (Recovery Point Objective) are key metrics in disaster recovery planning. RTO refers to the maximum acceptable amount of time that a system can be offline after a failure, indicating how quickly services must be restored. Meanwhile, RPO defines the maximum acceptable amount of data loss measured in time – it represents the point in time to which data must be recovered. Put another way, RTO focuses on downtime, while RPO focuses on data loss.

What is disaster recovery in a database? 

Disaster recovery in a database refers to the strategies and processes that are set up to restore database operations after a catastrophic event. This could include hardware failures, data corruption, or security breaches. The goal is to minimize downtime and data loss. CockroachDB, for example, is designed to be fault-tolerant and recover automatically, but having a disaster recovery plan ensures quick recovery and limits the impact of such events. This plan typically involves regular backups and possibly PCR to maintain data integrity and availability.

What is backup and restore?

Backup and restore in a database context involves creating copies of data (backups) at specific points in time and using these copies to recover data (restore) in case of data loss or corruption. This process is crucial for disaster recovery, ensuring data integrity and availability. 

CockroachDB’s Physical Cluster Replication (PCR), on the other hand, continuously replicates data at the byte level from a primary cluster to a standby cluster. PCR provides lower RTO and RPO compared to backup and restore, as it allows for quicker failover and minimal data loss.
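A rough way to see why continuous replication improves RPO over periodic backups is to compare worst-case data loss under each strategy. The sketch below uses illustrative assumed values (a 4-hour backup interval, 10 seconds of replication lag); these are not CockroachDB guarantees:

```python
from datetime import timedelta

# Illustrative assumptions, not measured values.
backup_interval = timedelta(hours=4)     # periodic backup schedule
replication_lag = timedelta(seconds=10)  # lag of continuous replication

# With periodic backups, a failure just before the next backup loses
# everything written since the last one.
worst_case_rpo_backup = backup_interval

# With continuous replication (e.g. PCR), worst-case loss is roughly
# whatever has not yet reached the standby cluster.
worst_case_rpo_replication = replication_lag

print("Backup worst-case RPO:     ", worst_case_rpo_backup)
print("Replication worst-case RPO:", worst_case_rpo_replication)
```

The comparison is why backup/restore is typically paired with replication for systems with tight RPO requirements rather than used alone.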

More about Disaster Recovery

pgvector

What is pgvector?

pgvector is an open-source extension for PostgreSQL that provides efficient storage, retrieval, and similarity search of vector data. It is particularly useful for applications involving semantic search, recommendation systems, and natural language processing, which are critical for AI-driven applications, including Large Language Models (LLMs).

CockroachDB introduced vector search capabilities in the 24.2 release. This implementation uses the same interface as pgvector and aims to be compatible with pgvector’s API. This compatibility facilitates seamless integration with tools like Langchain and Hugging Face, making it easier to incorporate data into AI models.

The integration of vector search in CockroachDB combines the strengths of a vector database and an operational database into a single, horizontally scalable solution. This approach simplifies the architecture and eliminates the operational and financial costs associated with maintaining a dedicated vector database. Additionally, CockroachDB’s distributed SQL engine allows for complex SQL operations on vector data, enhancing the performance and efficiency of AI and machine learning workloads.

pgvector adds support for vector data types and vector similarity search to the PostgreSQL database. It is particularly useful for applications involving machine learning, natural language processing, and other domains where vector representations of data (such as embeddings) are commonly used.

Key features of pgvector include:

  • Vector Data Type: It introduces a new data type for storing vectors (arrays of floating-point numbers) directly in PostgreSQL tables.

  • Similarity Search: It provides functions for performing similarity searches on vector data, such as finding the nearest neighbors based on various distance metrics (e.g., Euclidean distance, cosine similarity).

  • Indexing: It supports indexing mechanisms to speed up similarity searches, making it efficient to query large datasets.
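The distance metrics mentioned above reduce to simple arithmetic. The following pure-Python sketch shows Euclidean distance and cosine similarity on toy vectors; pgvector computes these natively via SQL operators, so this is just the underlying math for illustration:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two vectors: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Angle-based similarity in [-1, 1]: 1.0 means "same direction",
    # regardless of vector magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0]
doc = [2.0, 0.0]    # same direction as the query, different magnitude
other = [0.0, 1.0]  # orthogonal to the query

print(cosine_similarity(query, doc))    # 1.0
print(cosine_similarity(query, other))  # 0.0
```

Cosine similarity ignores magnitude, which is why it is a common choice for comparing embeddings whose lengths carry no meaning.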

What is vector search?

Vector search refers to the process of finding vectors in a dataset that are similar to a given query vector. Vector search is a powerful tool for handling high-dimensional data and is widely used in modern AI and data-driven applications to provide relevant and accurate results based on the inherent similarities in the data. 

Common applications of vector search

Vector search is essential in applications where data is represented as high-dimensional vectors, such as:

  • Machine Learning: Finding similar data points in embedding spaces, such as word embeddings, image embeddings, or user embeddings.

  • Natural Language Processing (NLP): Searching for semantically similar text documents, sentences, or words based on their vector representations.

  • Recommendation Systems: Identifying items (e.g., products, movies) that are similar to a user's preferences, represented as vectors.

  • Computer Vision: Finding images that are visually similar to a query image based on their feature vectors.

Check out this video showcasing how to use CockroachDB to build a product recommendation engine for ecommerce with AI:

How does vector search work?

Vector search typically involves the following steps:

  • Vector Representation: Data is converted into vector representations using techniques like word embeddings (e.g., Word2Vec, GloVe), sentence embeddings (e.g., BERT), or image embeddings (e.g., convolutional neural networks).

  • Similarity Metric: A similarity metric (e.g., Euclidean distance, cosine similarity) is chosen to measure the closeness between vectors.

  • Indexing: Efficient indexing structures (e.g., KD-trees, ball trees, HNSW) are used to speed up the search process, especially for large datasets.

  • Querying: The query vector is compared against the indexed vectors to find the most similar ones based on the chosen similarity metric.
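The steps above can be sketched as a brute-force nearest-neighbor search. This toy Python example uses made-up three-dimensional "embeddings" (real embeddings have hundreds or thousands of dimensions) and skips the indexing step, scanning every vector, which is exactly what index structures like HNSW exist to avoid at scale:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def nearest_neighbors(query, vectors, k=2):
    # Brute-force scan: compare the query against every stored vector,
    # then keep the k closest.
    scored = sorted(vectors.items(), key=lambda kv: cosine_distance(query, kv[1]))
    return [name for name, _ in scored[:k]]

# Toy embeddings: "cat" and "dog" point in similar directions, "car" does not.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}

print(nearest_neighbors([0.85, 0.15, 0.05], embeddings))  # ['cat', 'dog']
```

An index replaces the `sorted(...)` full scan with an approximate search over a precomputed structure, trading a small amount of recall for much faster queries.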

Why is pgvector commonly used?

pgvector has become one of the most popular extensions for working with vectors in part because it allows easy integration with PostgreSQL. Users can thus leverage the robustness, scalability, and familiarity of PostgreSQL, making it easier for developers to integrate vector search capabilities into their existing databases without needing a separate specialized system. PostgreSQL has a large and active community, and pgvector benefits from this ecosystem. Users can leverage existing PostgreSQL tools, extensions, and best practices while working with vector data.

By extending PostgreSQL, pgvector allows users to work with vector data using standard SQL queries. This reduces the learning curve and simplifies the development process. pgvector also supports efficient indexing and querying, and can handle large-scale vector data and perform similarity searches quickly, which is crucial for real-time applications. pgvector also allows users to choose the best approach for their specific use case with various distance metrics and indexing methods.

pgvector-compatible vector search in CockroachDB

CockroachDB is deeply invested in providing robust database offerings so that organizations can run their business-critical applications at scale. To that end, the Cockroach Labs team continues to focus on expanding generative AI capabilities, leading to the release of vector search features in v24.2. 

Our vector search features make it easier for users to deploy AI-driven applications with CockroachDB. CockroachDB's vector search functionality is compatible with the pgvector extension for PostgreSQL, allowing users to store and manipulate vectors within the database. 

Although CockroachDB does not currently support vector indexing, it allows for the storage of billions of vectors and supports various vector comparison operators, such as Euclidean distance, inner product, and cosine distance, to facilitate similarity searches. This capability enables developers to build fault-tolerant AI applications that leverage vectors stored in CockroachDB, benefiting from its horizontal data distribution and multi-region abstractions.

pgvector and Vector Search FAQ

What is pgvector?

pgvector is an open-source PostgreSQL extension for efficient storage, retrieval, and similarity search of vector data. It's useful for semantic search, recommendation systems, and natural language processing, critical for AI applications like Large Language Models (LLMs).

What are the key features of pgvector?

The key features of pgvector include storing vectors in PostgreSQL tables, similarity search, and vector indexing.

What is vector search?

Vector search finds vectors in a dataset similar to a query vector. It's used in AI and data-driven applications to provide relevant results based on data similarities.

What are common applications of vector search?

Common applications of vector search include natural language processing, where it finds semantically similar documents, sentences, or words, and recommendation systems, where it identifies similar items such as products, movies, books, documents, or videos.

How does vector search work?

Generally, vector search works by first converting data into vectors using embeddings, then measuring the closeness between vectors with a similarity metric. Indexing can be used to speed up searches; the query vector is then compared against the indexed vectors to find the most similar ones.

Why is pgvector commonly used?

pgvector is popular because it's open-source, integrates with PostgreSQL, and supports efficient indexing and querying. It handles large-scale vector data and performs quick similarity searches, crucial for real-time applications.

What is vector search in CockroachDB?

Starting in v24.2, vector search entered preview in CockroachDB. Vector search in CockroachDB finds data points similar to a query using vectors. Compatible with pgvector, it stores and manipulates vectors, useful for AI applications like LLMs, and supports various vector comparison operators for similarity searches.

More about pgvector

DBaaS (Database-as-a-Service)

What is Database-as-a-Service (DBaaS)?

A Database-as-a-Service (DBaaS) is a cloud-based service model that provides users with access to a database without the need for setting up physical hardware, installing software, or managing the database infrastructure. This service allows organizations to focus on their core business activities while the DBaaS provider handles the database management tasks such as backups, updates, and scaling.

What are the benefits of DBaaS?

There are many potential benefits to using a DBaaS in your organization. DBaaS simplifies the architecture and reduces the operational and financial costs associated with maintaining a dedicated database infrastructure. This service is particularly valuable for organizations looking to modernize their database solutions and reduce the operational headaches of managing existing databases. Benefits include:

  • Operational Simplicity: A DBaaS can offer zero downtime for planned routine maintenance, upgrades, patching, cluster settings changes, and scaling. By letting a company that specializes in databases handle the details of deploying your database solutions, your team can trust the experts to ensure data consistency, regulatory compliance, no data loss, and continued innovation in the database space. In turn, your team can focus on creating business impact rather than operating a database.

  • Scalability: At larger organizations, where multiple teams need different kinds of related data and support different services, platforms, applications, or products for your clients, a dedicated DBaaS can help standardize operations across the organization and allow a smoother rollout of new offerings.

  • Enterprise Integration: You can typically integrate your database solutions with enterprise security features such as single sign-on and role-based access controls. A DBaaS may also support observability tools for exporting metrics and logs, or help manage compliance, audit, and data loss protection.

DBaaS vs. Cloud Database

The primary difference between database-as-a-service (DBaaS) and a cloud database lies in the level of management and operational responsibilities handled by the service provider versus the user. For example, a DBaaS provider might manage the database infrastructure, including setup, maintenance, backups, updates, and scaling. This allows organizations to focus on their core business activities without worrying about the underlying database management tasks. On the other hand, a cloud database might not offer these “concierge” services. While a cloud database provider might still host the database on their cloud infrastructure, the user is typically responsible for managing the database, including the tasks listed above, increasing the complexity and resource requirements for managing the database.

In summary, DBaaS provides a more managed and simplified approach to database management, reducing the operational burden on the user.

Key considerations when choosing a DBaaS provider

It is a big decision to entrust a provider with managing your database needs. Here are a few key considerations if you choose to find a third-party database provider:

Performance and Latency

Performance and latency are critical considerations when using Database-as-a-Service (DBaaS). Since DBaaS relies on cloud infrastructure, the physical distance between the data center and the end-users can impact the speed at which data is accessed and processed. High latency can lead to slower application performance, which can be particularly problematic for real-time applications and services that require quick data retrieval and updates. Providers often offer various performance tiers and configurations, allowing businesses to select the appropriate level of resources to meet their specific needs.

Compliance and Regulatory Issues

Compliance and regulatory issues are significant challenges when adopting DBaaS, especially for industries with stringent data protection and privacy requirements, such as healthcare, finance, and government sectors. Organizations must ensure that their DBaaS provider complies with relevant regulations, such as GDPR, HIPAA, and PCI-DSS. This includes ensuring that data is stored, processed, and transmitted securely, with appropriate encryption and access controls in place. Additionally, organizations should verify that the provider offers features that support compliance, such as audit logs, data residency options, and regular security assessments. By carefully evaluating the compliance capabilities of a DBaaS provider, businesses can mitigate the risk of regulatory breaches and ensure that their data management practices align with legal requirements.

Data Migration

Data migration to a DBaaS platform can be a complex and resource-intensive process. Migrating existing databases may involve transferring large volumes of data, reconfiguring applications, and ensuring data integrity throughout the process. This can lead to potential downtime and disruptions to business operations if not managed effectively. To facilitate a smooth migration, organizations should leverage the tools and services offered by DBaaS providers, such as automated migration tools, detailed documentation, and professional support. It is also crucial to plan the migration meticulously, conduct thorough testing, and implement a phased approach to minimize risks. By taking these steps, businesses can ensure a seamless transition to a DBaaS environment with minimal impact on their operations.

Check out the video below to learn how Rightmove, the UK’s #1 property portal – and a publicly traded, multi-billion dollar company that has been in business since 2000 – migrated from Oracle to CockroachDB to support their 24 years’ worth of industry data:

Downtime

Downtime is a critical concern for any database system, and DBaaS is no exception. Relying on a third-party provider means that any service outages or disruptions on their end can directly impact the availability of your database. This can lead to significant business interruptions, loss of revenue, and damage to reputation. To mitigate the risk of downtime, it is essential to choose a DBaaS provider with a strong track record of reliability and robust Service Level Agreements (SLAs) that guarantee high availability. Additionally, implementing multi-region deployments and failover mechanisms can enhance resilience and ensure that your database remains accessible even in the event of localized outages.

Data Sovereignty

Data sovereignty refers to the concept that data is subject to the laws and regulations of the country or jurisdiction where it is stored. This is a critical consideration for organizations using DBaaS, as storing data in the cloud often involves data centers located in different jurisdictions. Compliance with data sovereignty requirements is essential to avoid legal and regulatory complications. Organizations must ensure that their DBaaS provider offers data residency options that allow them to store data in specific geographic locations that comply with local laws. Additionally, understanding the data protection regulations of the countries where data is stored and processed is crucial for maintaining compliance and protecting sensitive information. By addressing data sovereignty concerns, businesses can confidently leverage DBaaS while adhering to legal and regulatory obligations.

CockroachDB as a DBaaS provider

CockroachDB as a DBaaS simplifies the architecture and reduces the operational and financial costs associated with maintaining a dedicated database infrastructure. This service is particularly valuable for organizations looking to modernize their database solutions and reduce the operational headaches of managing existing databases. CockroachDB is PostgreSQL compatible, so it feels like PostgreSQL and scales like NoSQL, making it easier to adopt across your whole organization.

The above video is a talk by Netflix engineers on how they provide CockroachDB-as-a-Service to their customers – Netflix developers – who have a variety of use cases, from a reliable device management platform and supporting a ML orchestration platform, to Netflix’s new gaming platform. By deploying CockroachDB-as-a-Service, Netflix engineers can focus on expanding the scope of their business and rely on CockroachDB to ensure that all their applications are always running smoothly with consistent data.

DBaaS FAQ

What is Database-as-a-Service (DBaaS)?

Database-as-a-Service (DBaaS) is a cloud computing service that provides users with access to a database without the need for physical hardware, software installation, or database management. It allows businesses to focus on using the database rather than maintaining it, as the service provider handles all backend tasks such as updates, backups, and scaling.

How does DBaaS work?

DBaaS works by hosting databases on cloud infrastructure. Users can access and manage their databases through a web interface or API. The service provider takes care of the underlying infrastructure, including servers, storage, and network resources, ensuring high availability, security, and performance.

What are the benefits of using DBaaS?

Benefits of using DBaaS include cost efficiency, scalability, accessibility, and performance. In addition, a major benefit is that the DBaaS provider can handle updates, backups, and security, freeing up your resources to focus on building and maintaining business-critical applications.

What types of databases are available as DBaaS?

DBaaS offerings include a variety of database types such as relational databases like CockroachDB or NoSQL databases like MongoDB.

Is DBaaS secure?

Yes, DBaaS providers implement robust security measures including encryption, access controls, and regular security audits. However, it is essential for users to follow best practices such as strong password policies and regular monitoring to ensure data security.

How do I choose the right DBaaS provider?

Consider the following factors when choosing a DBaaS provider:

  • Database Compatibility: Ensure the provider supports the database type you need.

  • Performance and Scalability: Check the provider's performance benchmarks and scalability options.

  • Security Features: Look for comprehensive security measures and compliance certifications.

  • Cost: Compare pricing models and ensure they fit your budget.

  • Support and Reliability: Evaluate the provider's customer support and service level agreements (SLAs).

Can I migrate my existing database to a DBaaS?

Yes, most DBaaS providers offer tools and services to help migrate existing databases to their platform. The migration process typically involves exporting your current database, transferring the data to the cloud, and configuring the new environment. For example, you can check out CockroachDB’s MOLT (Migrate Off Legacy Technology) tools.

What is the difference between DBaaS and traditional database hosting?

Traditional database hosting requires users to manage the hardware, software, and maintenance of the database environment. In contrast, DBaaS offloads these responsibilities to the service provider, allowing users to focus on database usage and application development.

Can DBaaS handle large-scale applications?

Yes, DBaaS is designed to handle applications of all sizes, from small projects to large-scale enterprise applications. Providers offer various scaling options to accommodate growing data and user demands.

More about DBaaS (Database-as-a-Service)

Black Swan Event

A black swan event is an event that has the following three attributes:

  1. It was unexpected.

  2. It had significant, wide-ranging consequences.

  3. After it happens, people will suggest that it was predictable, despite the fact that it was not widely predicted before it happened.

This definition comes from mathematical statistician Nassim Taleb, who coined and popularized the term in his books Fooled by Randomness and The Black Swan.

However, in everyday language when people talk about a “black swan event,” they’re generally thinking just about unexpected events with wide-ranging consequences. The third criterion – that people will rationalize the event as predictable after the fact – isn’t typically discussed.

Officially the term doesn’t have a positive or negative connotation. Black swan events can theoretically be good or bad or neutral. In the real world, however, the term is often used to describe events with negative impacts, such as financial crashes, widespread service outages, and even natural disasters and terrorism.

Long story short: while the official definition of a black swan event is quite a bit more nuanced, in everyday life the term is often used to mean something pretty simple: an unexpected event with significant negative consequences.

(The name, in case you’re wondering, comes from the once-widespread perception that all swans were white, and black swans either didn’t exist or were incredibly rare. In reality, black swans do exist. But they’re only native to Australia, so they’re quite rare everywhere else – including in Europe, where the idea of a “black swan” as a symbol for something unpredictable, unexpected, or unlikely first came to be used.)

Black swan events in technology

In the tech industry, most discussion of black swan events is typically related to infrastructure, and infrastructure failures leading to service outages.

On a global scale, a black swan event in technology could be something like a coronal mass ejection from the sun causing a kind of natural EMP that knocks out electronic systems over a large area. (Whether this would be a true black swan event is debatable, considering that it has happened before, but it happens only rarely and is considered unlikely).

More commonly, though, discussions of black swan events in technology are company-specific, meaning that a “black swan event” is an event that significantly and negatively impacts the company’s services, generally due to some kind of infrastructure failure or outage.

Examples of potential company-level black swan events in technology include:

  • Cloud provider outages.

  • Power outages.

  • System failures.

  • Human mistakes.

Real-world example of a black swan event in tech

More about Black Swan Event

Distributed SQL

What is Distributed SQL?

Distributed SQL represents a significant evolution in database technology, designed to meet the needs of modern cloud applications. Traditional SQL databases were built for data consistency, vertical scalability, and tight integration, which worked well for monolithic applications on single-server environments. However, as the paradigm shifted to distributed applications in the cloud, these traditional databases began to show limitations.

SQL vs. NoSQL

SQL (Structured Query Language) and NoSQL (Not Only SQL) databases serve different purposes and have distinct characteristics. SQL databases, such as CockroachDB, MySQL, and PostgreSQL, are relational databases that use structured query language for defining and manipulating data. They are known for their ACID (Atomicity, Consistency, Isolation, Durability) properties, which ensure reliable transactions and data integrity. SQL databases are ideal for complex queries and transactions, supporting structured data with predefined schemas and relationships.

On the other hand, NoSQL databases, like MongoDB and Cassandra, are designed to handle unstructured or semi-structured data. They may offer flexible schemas, allowing for rapid development and iteration. NoSQL databases are often used for large-scale data storage and real-time web applications due to their ability to scale horizontally and handle high volumes of read and write operations. They typically provide eventual consistency rather than the strong consistency guaranteed by SQL databases.

The Need for Distributed SQL

Modern applications require horizontal scalability, elasticity, and support for microservices. Traditional single-node relational databases, with their fixed schemas and lack of support for distributed data models, are not suited for these needs. Distributed SQL databases address these challenges by combining the consistency and structure of early relational databases with the scalability, survivability, and performance first pioneered in NoSQL databases.

Core Capabilities of Distributed SQL Databases

Horizontally scalable: Distributed SQL databases can seamlessly scale to mirror the capabilities of cloud environments without introducing operational complexity. They distribute data across multiple nodes, ensuring efficient resource utilization.

Consistency: These databases deliver high levels of isolation in distributed environments, mediating contention and ensuring transactional consistency across multiple operations.

Resilience: Distributed SQL databases provide inherent resilience, reducing recovery times to near zero and replicating data naturally without external configuration.

Geo-replication: They allow for the distribution of data across geographically dispersed environments, ensuring low latency access and compliance with data sovereignty requirements.

In addition to the unique capabilities of distributed SQL, these databases must also meet foundational requirements such as:

  • Operational efficiency: Easy installation, configuration, and control of the database environment.

  • Optimization: Advanced features like cost-based optimizers for query performance.

  • Security: Key capabilities for authentication, authorization, and accountability.

  • Integration: Compatibility with existing applications, ORMs, ETL tools, and more.

How does distributed SQL work?

Distributed SQL databases combine the consistency and structure of relational databases with the scalability, survivability, and performance first pioneered in NoSQL databases. They distribute data evenly across multiple nodes, ensuring efficient resource utilization and high availability. These databases deliver high levels of isolation in distributed environments, mediating contention and ensuring transactional consistency across multiple operations. Additionally, they provide inherent resilience, reducing recovery times to near zero and replicating data naturally without external configuration.

CockroachDB, for example, is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally and is resilient against many kinds of outages, including disk, machine, rack, and even datacenter failures, with minimal latency disruption and no manual intervention. CockroachDB supports strongly-consistent ACID transactions and provides a familiar SQL API for structuring, manipulating, and querying data. It guarantees serializable SQL transactions, the highest isolation level defined by the SQL standard, using the Raft consensus algorithm for writes and a custom time-based synchronization algorithm for reads.
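The fault-tolerance arithmetic behind Raft-style majority replication is straightforward: a write commits once a majority of replicas acknowledge it, so a cluster stays available as long as a majority survives. The sketch below illustrates that arithmetic only; it is not CockroachDB's actual implementation:

```python
def quorum(replicas: int) -> int:
    # A Raft-style write commits once a strict majority of replicas
    # acknowledge it.
    return replicas // 2 + 1

def tolerated_failures(replicas: int) -> int:
    # The cluster can keep committing writes as long as a quorum of
    # replicas remains reachable.
    return replicas - quorum(replicas)

for n in (3, 5, 7):
    print(f"{n} replicas: quorum={quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

This is why replication factors are odd: going from 3 to 4 replicas raises the quorum from 2 to 3 without tolerating any additional failures.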

Examples of Distributed SQL Databases

Several databases meet the requirements of distributed SQL, including Google Spanner, Amazon Aurora, YugabyteDB, FaunaDB, and CockroachDB. These databases offer varying levels of support for the core capabilities mentioned above, making each more appropriate for different use cases.

  • Amazon Aurora: Amazon Aurora is a cloud-based relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. Aurora is often described as a distributed database, but it does not scale for writes, and therefore is not truly distributed. Aurora is designed to provide high availability and durability, with automatic failover and replication across multiple Availability Zones. It supports MySQL and PostgreSQL and offers features like automatic scaling and serverless options.

  • CockroachDB: CockroachDB is a distributed SQL database designed for global, cloud-native applications. CockroachDB is PostgreSQL compatible, so that most applications built on PostgreSQL can be migrated without changing the application code. CockroachDB provides strong consistency, horizontal scalability, and high availability. CockroachDB supports ACID transactions and a familiar SQL interface, making it easy to use for developers. It survives various types of failures, including mechanical, data center, region, and even cloud failures, with minimal latency disruption.

  • FaunaDB: FaunaDB is a distributed, multi-model database that provides strong consistency, ACID transactions, and a flexible data model. It is designed for serverless applications and offers a globally distributed architecture that ensures low-latency access to data. FaunaDB supports GraphQL and FQL (Fauna Query Language) for querying data.

  • Google Spanner: Google Spanner is a globally distributed database service that provides strong consistency and horizontal scalability. It uses a combination of hardware (atomic clocks) and software to achieve global consistency and high availability. Spanner supports SQL queries and is designed to handle large-scale, mission-critical applications.

  • YugabyteDB: YugabyteDB is an open-source, distributed SQL database. It supports both SQL and NoSQL workloads, making it versatile for various use cases.

Evaluating Distributed SQL Databases

When evaluating a distributed SQL database, it is essential to consider several core requirements to ensure it meets the needs of modern applications. First, assess the database's scalability. A distributed SQL database should be able to scale horizontally, distributing data evenly across multiple nodes to handle increased loads without compromising performance. This capability is crucial for applications with variable or growing workloads, and it allows the database to support businesses at any stage of growth by scaling with demand. Additionally, the database should support strong consistency, ensuring that all nodes reflect the same data state, which is vital for maintaining data integrity across distributed environments.

Another critical factor is high availability and resilience. The database should be designed to handle failures gracefully, whether they occur at the disk, machine, rack, or even datacenter level, with minimal disruption to operations. This includes features like automatic failover, data replication, and quick recovery times.


RELATED

To learn more about inherently resilient systems, check out this webinar hosted by Cockroach Labs’ CTO and CPO, Peter Mattis and Technical Evangelist, Rob Reid: “The Always-On Dilemma: Disaster Recovery vs. Inherent Resilience.”


Geo-replication is also important, allowing data to be distributed across multiple geographic locations to reduce latency and comply with data sovereignty requirements. 

Finally, consider operational efficiency, security, and integration capabilities. The database should be easy to install, configure, and manage, offer robust security features for authentication and authorization, and integrate seamlessly with existing applications and tools.

A mature distributed SQL database should meet all these requirements, ensuring it is suitable for business-critical applications.

Distributed SQL Use-Cases

Distributed SQL databases are utilized across various verticals to address specific industry challenges and requirements. 

For example, check out this presentation by Netflix engineers, who provide CockroachDB as a service (DBaaS) to internal Netflix developers for a variety of use cases:

Here are some examples of vertical use cases for distributed SQL databases:

Distributed SQL for Financial Services

In the financial sector, distributed SQL databases are crucial for applications such as payment processing, trading platforms, and identity management. These applications require strong consistency, high availability, and the ability to handle high transaction volumes. For instance, payment systems must ensure accurate and timely transactions, while trading platforms need to process large volumes of trades with minimal latency. Identity management systems benefit from distributed SQL databases by providing secure and consistent access to user data across multiple regions.

Distributed SQL for Retail and eCommerce

Retail and eCommerce companies leverage distributed SQL databases for order and inventory management. These systems must handle large spikes in traffic, such as during Black Friday or Cyber Monday, without compromising performance (or overselling products). Distributed SQL databases provide horizontal scalability, ensuring that the system can manage increased demand. They also offer strong consistency, which is essential for maintaining accurate inventory levels and processing transactions reliably. Additionally, these databases support multi-region deployments, allowing retailers to provide low-latency access to customers worldwide.

Distributed SQL for Gaming

The gaming industry uses distributed SQL databases to manage user accounts, in-game transactions, and real-time data processing. Gaming platforms often experience high concurrency with thousands of players interacting simultaneously. Distributed SQL databases ensure that user data is consistent and available, even during peak usage times. They also support the scalability needed to accommodate growing user bases and the resilience required to maintain uptime during unexpected failures.

Distributed SQL for Logistics and Supply Chain

In logistics and supply chain management, distributed SQL databases are used for scheduling, workflow management, and tracking systems. These applications require precise coordination and data integrity to ensure timely deliveries and efficient operations. Distributed SQL databases provide the high availability and fault tolerance needed to prevent disruptions in logistics workflows. They also support geo-replication, which helps in maintaining data consistency across different geographic locations.

These examples illustrate how distributed SQL databases can be tailored to meet the specific needs of various industries, providing the scalability, consistency, and resilience required for modern applications.

Try a Distributed SQL Database

Distributed SQL is the future of database management in the cloud, offering the scalability, consistency, and resilience needed for modern applications. CockroachDB, among others, exemplifies these capabilities, making it a strong candidate for organizations looking to transition to cloud-native distributed SQL databases.

Get started with CockroachDB Cloud for free, today!

Distributed SQL FAQ

What is distributed SQL?

Distributed SQL refers to a class of relational databases that distribute data across multiple nodes to ensure high availability, fault tolerance, and scalability. These databases maintain SQL capabilities while leveraging a distributed architecture to handle large-scale, geographically dispersed data.

How do distributed SQL databases differ from traditional SQL databases?

Traditional SQL databases are typically monolithic, meaning they run on a single server. Distributed SQL databases, on the other hand, spread data across multiple servers or nodes. This distribution allows for better performance, scalability, and resilience against failures.

What are the benefits of using distributed SQL?

Distributed SQL databases offer several benefits, including scalability, high availability, and resilience against mechanical failures. Distributed SQL databases combine the consistency and structure of early relational databases with the scalability, survivability, and performance first pioneered in NoSQL databases.

How does CockroachDB implement distributed SQL?

CockroachDB is a distributed SQL database that uses a transactional and strongly-consistent key-value store. It scales horizontally, survives various types of failures, including mechanical, data center, region, and even cloud failures with minimal latency disruption and no manual intervention. CockroachDB supports strongly-consistent ACID transactions and provides a familiar SQL API for structuring, manipulating, and querying data.

How does CockroachDB ensure data consistency in a distributed environment?

CockroachDB guarantees serializable SQL transactions, the highest isolation level defined by the SQL standard. It uses the Raft consensus algorithm for writes and a custom time-based synchronization algorithm for reads to ensure consistency between replicas.

What are some common use cases for distributed SQL?

Distributed SQL is ideal for many use cases: serving a geographically distributed customer base (you can bring the data closer to the customer), handling high transaction volumes, and absorbing spiky workloads. Examples include financial institutions, retailers and eCommerce businesses, and gaming platforms.

How does CockroachDB compare to other distributed databases?

CockroachDB supports SQL syntax and scales easily without the manual complexity of sharding, rebalances and repairs itself automatically, and distributes transactions seamlessly across your cluster. It provides strong consistency and supports distributed transactions, unlike many other distributed databases.

More about Distributed SQL

CockroachDB

What is CockroachDB?

At the highest level, CockroachDB is software for storing data. More specifically, CockroachDB is a distributed SQL database technology that enables users to enjoy many of the benefits of the traditional relational database (such as the familiar SQL language, reliable schema, etc.) while also offering the key features required for a modern, cloud-native database, such as high availability, bulletproof resilience, elastic scale, and geographic scale.

Learn more about CockroachDB.

More about CockroachDB

Cost-Based Optimizer

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is a cost-based optimizer?

The Cost-Based Optimizer is a feature of CockroachDB that looks at all possible ways in which a query can be executed and assigns each a “cost” that indicates how efficiently the query can be run. Then the optimizer chooses the way that has the lowest cost, and is therefore most efficient. This feature only works with databases that speak SQL, so it’s an added benefit obtained from having a SQL layer.

More about Cost-Based Optimizer

Encoding

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is encoding?

Encoding is the process by which CockroachDB converts SQL statements into bytes (because the lower layers of CockroachDB deal with bytes).

More about Encoding

Gateway Node

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is a Gateway Node?

When a request comes into CockroachDB, the load balancer routes it to the node it thinks can best handle the request at that time. This node is called the Gateway Node. The Gateway Node identifies which nodes in the cluster are the leaseholders for the ranges involved in the request, forwards the work to those nodes, and then returns the response to the client.

More about Gateway Node

Key Value (KV) Layer

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is the Key Value (KV) Layer?

The key value layer is a figurative layer of CockroachDB. One helpful way to think about the architecture of CockroachDB is as a SQL system built on top of a key value store. In the upper layers, data follows the rules of a SQL system and is structured in table format. Deeper in the database, the table format no longer works, due to the distributed nature of the database. So instead of being stored in tables, the data is stored in a different way: in key-value pairs. It’s important to note that this combination of a SQL upper layer with a key-value store underneath is a relatively unusual setup, because translating structured table data into key-value pairs is a difficult task.

Key Value (KV) Pair

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is a Key Value (KV) Pair?

A key-value pair is a way of storing data. In CockroachDB, individual rows from the tables are mapped into key-value pairs. One column becomes the index, meaning that each piece of data in that column is the “key” in its own key-value pair. One or more other columns become the “value.”

For example, in a table called “Customer Data” the first column might be “Name,” the second “Hometown” and the third “Email Address.” You might decide that the “Name” should be the key because ultimately you want all your data to be sorted by name. Then you might decide that the “Hometown” and “Email Address” columns should be the values.

This information gets mapped, row-by-row, into key-value pairs, and the ultimate format of a single row might end up reading as a string: “Customer Data Table/John Smith/New York City/Johnsmith@gmail.com.” Then, once all the data in the table is translated into the monolithic sorted key-value map, these pieces of data are all sorted by the name value.
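The row-to-key-value mapping above can be sketched in a few lines. This is a toy illustration with invented names — CockroachDB's real key encoding is more involved:

```python
# Toy illustration of mapping table rows to key-value pairs.
# CockroachDB's actual encoding is more involved; names here are invented.

rows = [
    {"Name": "John Smith", "Hometown": "New York City", "Email": "Johnsmith@gmail.com"},
    {"Name": "Ada Park", "Hometown": "Chicago", "Email": "ada@example.com"},
]

def to_kv(table, row, key_column):
    """Use one column as the key; the remaining columns become the value."""
    key = f"{table}/{row[key_column]}"
    value = {c: v for c, v in row.items() if c != key_column}
    return key, value

# Build the sorted key-value map: sorting the keys sorts the data by Name.
kv_map = dict(sorted(to_kv("Customer Data Table", r, "Name") for r in rows))
print(kv_map["Customer Data Table/John Smith"])
# {'Hometown': 'New York City', 'Email': 'Johnsmith@gmail.com'}
```

Because the map is sorted by key, scanning it returns rows in "Name" order, which is exactly the behavior the example above describes.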

More about Key Value (KV) Layer

Load Balance

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is the Load Balancer?

The Load Balancer is a piece of software added to the front of CockroachDB that helps balance the requests coming into the database. All the nodes in CockroachDB can process requests (they’re all symmetrical), so it makes the most sense to spread the work around between nodes so that the requests can be completed more quickly. The load balancer accomplishes that.

More about Load Balance

Meta Range

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is a Meta Range?

A meta range is information stored in the cluster that tells the database where to find ranges – i.e., which nodes have certain ranges – so that the database can access the correct range. Sometimes also referred to as an index, but a different definition of an index than above.

More about Meta Range

ACID or A.C.I.D

What is ACID?

ACID stands for Atomicity, Consistency, Isolation, and Durability. These are a set of characteristics that are desirable for database transactions. Together, they guarantee that transactions, and all data stored in the database, remain valid even in the event of errors or power failures. These characteristics are extremely important for any OLTP database.

Specifically:

Atomicity: Each transaction must be treated as a single, indivisible unit (even if processing the transaction requires multiple steps). In other words, every step of the transaction must complete successfully or the entire transaction will fail and no change will be written to the database. This is desirable because without atomicity, a transaction that’s interrupted while processing may only write a portion of the changes it’s supposed to make, which could leave the database in an inconsistent state.

Consistency: No transaction can violate the integrity of the database – all transactions must leave the database in a valid state.

Isolation: Any concurrently-processed transactions (i.e. transactions happening at the same time) leave the database in the same state as if they were executed sequentially (i.e. one after another).

Durability: Once committed, the transaction remains committed even in the case of hardware failure, power outage, etc.
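Atomicity in particular is easy to see in action. The sketch below uses Python's built-in sqlite3 module as a stand-in for any ACID database: when one statement in a transaction fails, the whole transaction rolls back:

```python
import sqlite3

# Atomicity demo with SQLite (a stand-in for any ACID database):
# either every statement in the transaction commits, or none do.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction; rolls back automatically on error
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE name = 'alice'")
        # This statement violates the NOT NULL constraint, so the whole
        # transaction (including Alice's debit above) is rolled back.
        conn.execute("UPDATE accounts SET balance = NULL WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50} — unchanged
```

Without atomicity, Alice's debit could persist even though Bob's credit failed, leaving the database inconsistent.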

More about ACID or A.C.I.D

Unstructured Data

What is Unstructured Data?

Unstructured Data is essentially the opposite of structured data: data that is not arranged with any kind of data model or schema and that can’t be easily adapted to a table format.

For example, datasets consisting of video or audio files are good examples of unstructured data. A traditional table structure would work for storing metadata about videos (such as title, description, Youtube link, etc.), but it is not a good fit for storing the videos themselves.

CockroachDB does support some types of unstructured data via its JSON support, but it was designed with primarily structured data in mind.

More about Unstructured Data

Transaction Layer

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is the transaction layer?

The Transaction Layer is the layer of CockroachDB that receives requests from the SQL Layer and coordinates concurrent operations.

More about Transaction Layer

TPC-C

What is TPC-C?

TPC-C, short for Transaction Processing Performance Council Benchmark C, is a benchmark used to measure how well a database holds up when it’s trying to run certain workloads. TPC-C is specifically an OLTP benchmark, and it’s widely recognized and standardized. TPC-C simulates an environment where a lot of users are making requests of a database, and then it measures how well the database holds up – i.e., how fast it can complete the transactions, what the cost of completing the transactions is, and so on.

More about TPC-C

Structured Data

What is Structured Data?

Structured Data (also called relational data) is data that lives best in a structured format, i.e., the kind of data you would enter into a table in a spreadsheet. The best way to organize it is in columns and rows.

Two examples of structured data are an inventory of products for an eCommerce site, or a list of customers and their information. CockroachDB was designed primarily to support this kind of data (although it does also include support for unstructured data).

More about Structured Data

Storage Layer

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is the Storage Layer?

The storage Layer is the layer of CockroachDB that writes data to the disk (the Physical Storage component), and reads data from the disk. It’s still part of the software, but it’s the piece that communicates with the hardware.

More about Storage Layer

SQL Layer

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is the SQL Layer?

The SQL Layer is the layer of CockroachDB that speaks to the application or client using the programming language SQL, adhering to the PostgreSQL Wire Protocol. After performing various tasks, the SQL layer sends the relevant requests to the Transaction Layer.

More about SQL Layer

SQL API

What is a SQL API?

A SQL API is an API for communicating with a database. CockroachDB offers a SQL API to users and apps: they send SQL commands to the API and receive the results of their queries in return. The commands must follow the PostgreSQL Wire Protocol, because CockroachDB adheres to it.

More about SQL API

Serializable Isolation

What is Serializable Isolation?

Serializable Isolation is the highest level of isolation possible under the guidelines provided by the “ANSI SQL Standard” (an official standard of best SQL practices). It means that transactions committed to a database appear as if they were executed one after another, even if they were processed in parallel. Most distributed databases only achieve a lower level of isolation, called “snapshot isolation,” but CockroachDB is able to achieve serializable isolation.

More about Serializable Isolation

Secondary Index

What is a Secondary Index?

A Secondary Index is a secondary column by which you can sort data. CockroachDB supports both primary and secondary indexes. This just adds another level of organization to data sorting. For example, in a table of data about employees, you might first sort alphabetically by “name,” (the primary index) and then within that, sort alphabetically by “hometown” (the secondary index).
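The employee example above can be mimicked with a composite sort key — a toy illustration of the resulting ordering, not how a database actually stores indexes:

```python
# Sorting rows by a primary index (name), then a secondary index (hometown).
employees = [
    {"name": "Lee", "hometown": "Boston"},
    {"name": "Avery", "hometown": "Denver"},
    {"name": "Avery", "hometown": "Austin"},
]

# Rows with the same name are further ordered by hometown.
ordered = sorted(employees, key=lambda r: (r["name"], r["hometown"]))
print([(r["name"], r["hometown"]) for r in ordered])
# [('Avery', 'Austin'), ('Avery', 'Denver'), ('Lee', 'Boston')]
```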

More about Secondary Index

RTO (Recovery Time Objective)

What is RTO (Recovery Time Objective)?

RTO (Recovery Time Objective) is the maximum acceptable amount of time a system can remain unavailable after a failure before it must return to service. The goal is an RTO of zero, and CockroachDB achieves this through its multi-active availability model.

More about RTO (Recovery Time Objective)

RocksDB

What is RocksDB?

RocksDB is a piece of software that was embedded in CockroachDB to store key-value data. CockroachDB used RocksDB to communicate with the disk in order to actually store data. CockroachDB now uses Pebble, a RocksDB-inspired and compatible key-value store that is specifically designed for distributed SQL databases. Many other tech companies still use RocksDB.

More about RocksDB

Replication Layer

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is the Replication Layer?

The Replication Layer is the layer of distributed database software that copies data between nodes and ensures consistency between these copies. In CockroachDB, this is accomplished by implementing the Raft consensus protocol.

More about Replication Layer

Region

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is a Region?

A region is a specific geographical location where you can host your resources; public cloud providers typically let you choose the region or regions you’d like to deploy to. Each region has one or more zones (most regions have three or more). For example, US-West is the West Coast of the US, and US-East is the East Coast.

More about Region

Range / Shard

What is a Range / Shard?

A range in CockroachDB (called a “shard” in other databases) is a chunk of data, stored as key-value pairs. The Distribution Layer breaks tables apart into these chunks so the data can be distributed across different nodes.

In CockroachDB, a range is 512 MiB or smaller. This default range size is a sweet spot – small enough to move quickly between nodes, but large enough to store a meaningfully contiguous set of data whose keys are more likely to be accessed together. Once a range grows beyond 512 MiB, it’s split into two smaller ranges.
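The split rule can be sketched like this. The helper names are invented, and real splitting also takes factors such as load and table boundaries into account:

```python
# Toy sketch of the split rule: when a range of sorted keys exceeds the
# maximum size, split it in half at the middle key.
MAX_RANGE_BYTES = 512 * 1024 * 1024  # 512 MiB default

def maybe_split(keys, sizes):
    """keys: sorted keys in the range; sizes: bytes stored under each key.
    Returns the original range, or two halves if it is over the limit."""
    if sum(sizes) <= MAX_RANGE_BYTES:
        return [keys]
    mid = len(keys) // 2
    return [keys[:mid], keys[mid:]]

# A range holding two 300 MiB chunks totals 600 MiB, so it splits in two.
ranges = maybe_split(["a", "m"], [300 * 1024**2, 300 * 1024**2])
print(len(ranges))  # 2
```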

More about Range / Shard

Range Replication

What is Range Replication?

Range replication is the duplication of ranges on multiple nodes so that if one node fails, the range’s data is still accessible via another node. In CockroachDB, each range is replicated three times by default. This replication allows CockroachDB to be highly available and remain online even when a node goes down, because the data lives in two other places, and two of the three replicas are enough to achieve quorum. When a failure happens, CockroachDB automatically redistributes data to the surviving nodes.

More about Range Replication

Raft Consensus Protocol

What is the Raft Consensus Protocol?

The Raft Consensus Protocol is the set of quorum guidelines that CockroachDB follows to ensure each range is consistent. It’s an algorithm that makes sure all copies of a range agree on changes to the data. A “leader” is elected for each range to coordinate changes for that range (in CockroachDB, the Raft leader is typically also the range’s leaseholder), and the other replicas are called “followers.” Changes are entered in the leader’s log, then the leader sends the changes to the followers, which enter them into their own logs. Once a majority of replicas have the change in their logs, the change is committed.
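A heavily simplified sketch of that flow, with no elections, terms, heartbeats, or log repair (all names here are invented):

```python
# Minimal sketch of Raft-style log replication for one range.
class Replica:
    def __init__(self, name, up=True):
        self.name, self.up = name, up
        self.log, self.committed = [], []

def replicate(leader, followers, entry):
    """Leader appends and forwards the entry; commit once a quorum has it."""
    leader.log.append(entry)
    acks = 1  # the leader's own copy counts toward quorum
    for f in followers:
        if f.up:
            f.log.append(entry)
            acks += 1
    quorum = (1 + len(followers)) // 2 + 1  # majority of all replicas
    if acks >= quorum:
        for r in [leader, *followers]:
            if r.up:
                r.committed.append(entry)
        return True
    return False

n1, n2, n3 = Replica("n1"), Replica("n2"), Replica("n3", up=False)
print(replicate(n1, [n2, n3], "SET x = 1"))  # True: 2 of 3 replicas is a quorum
```

Even with one replica down, the write commits, because two of three replicas still form a majority; the real protocol later repairs the lagging replica's log.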

More about Raft Consensus Protocol

Quorum

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is Quorum?

Quorum is the consensus required to commit changes in a distributed database such as CockroachDB. Different types of distributed databases may use different systems for quorum. CockroachDB uses the Raft Consensus Protocol, in which a majority of replicas – for example, two nodes in a three-node system – is required to achieve consensus (i.e. quorum).
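The arithmetic behind quorum is just "a majority of n replicas":

```python
# Quorum is a majority of the replicas: floor(n / 2) + 1.
def quorum(n):
    return n // 2 + 1

for n in (3, 5, 7):
    # replicas, quorum size, failures tolerated
    print(n, quorum(n), n - quorum(n))
# 3 2 1
# 5 3 2
# 7 4 3
```

This is why replication factors are odd: going from three replicas to four raises the quorum from two to three without tolerating any additional failures.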

More about Quorum

Public Cloud

What is a Public Cloud?

Public Cloud is the term for the most common kind of cloud computing vendor. Data lives in “public,” and hardware, storage, and network devices are shared with other organizations or cloud “tenants.” The services are accessed and managed via a web browser.

More about Public Cloud

Private Cloud

What is a Private Cloud?

In a private cloud, a company’s data is stored in a dedicated cloud environment, rather than being stored on shared machines. The services and infrastructure are always maintained on a private network and the hardware and software are dedicated solely to that single organization. Benefits include increased security and more customization. Vendors include Amazon Virtual Private Cloud (VPC), Dell Cloud Solutions, and Microsoft Private Cloud.

More about Private Cloud

Primary Index

What is a Primary Index?

An index is a column whose data is designated as the “key” in a key-value pair. Think of sorting data in a spreadsheet; the column by which you’re sorting the data is called the index. A primary index is the main column by which the data is sorted.

More about Primary Index

PostgreSQL Wire Protocol

What is the PostgreSQL Wire Protocol?

The PostgreSQL Wire Protocol is the network protocol that the PostgreSQL database and its clients use to communicate. Many people have used PostgreSQL before and are familiar with its SQL dialect, and many apps, drivers, and ORMs already speak its wire protocol. The fact that CockroachDB adheres to the PostgreSQL Wire Protocol (and supports much of PostgreSQL’s SQL dialect) makes it easy for customers to plug into the database and use it.

More about PostgreSQL Wire Protocol

Physical Storage

What is Physical Storage?

Physical Storage is the hardware on which the data is stored. CockroachDB’s Storage Layer (software) communicates with this hardware to physically write data onto the disk and read data from the disk.

More about Physical Storage

ORM (Object-Relational Mapper)

What is an ORM (Object-Relational Mapper)?

An ORM is a software intermediary between the application and the database. It allows developers to speak to a SQL database using a language other than SQL. Some developers may not be experienced with SQL, or simply prefer to write in languages like Python, C++, Javascript etc. When writing the parts of their application that communicate with a database, they can use an ORM as a go-between, to translate their code into SQL and send it to the database to make requests.
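A toy sketch of the translation an ORM performs — a plain Python object becomes a SQL statement. Real ORMs (SQLAlchemy, Django ORM, etc.) do far more; the helper below is invented for illustration:

```python
# Toy sketch of what an ORM does under the hood: translate a plain
# Python object into a parameterized SQL statement.
from dataclasses import astuple, dataclass, fields

@dataclass
class Customer:
    name: str
    hometown: str

def to_insert_sql(obj):
    """Build an INSERT statement and its parameters from a dataclass."""
    table = type(obj).__name__.lower()
    cols = ", ".join(f.name for f in fields(obj))
    placeholders = ", ".join("%s" for _ in fields(obj))
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", astuple(obj)

sql, params = to_insert_sql(Customer("John Smith", "New York City"))
print(sql)     # INSERT INTO customer (name, hometown) VALUES (%s, %s)
print(params)  # ('John Smith', 'New York City')
```

The developer works only with the `Customer` object; the SQL generation and parameter binding happen behind the scenes, which is the convenience an ORM provides.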

More about ORM (Object-Relational Mapper)

On-Prem (On-Premises)

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is On-Prem (On-Premises)?

In the context of databases, on-prem describes a database deployment in which a company’s data is stored on a machine that it owns, rather than stored with a public cloud provider such as GCP or AWS. Companies often choose to store proprietary or high-security data on-prem because it gives them more control over its security.

More about On-Prem (On-Premises)

OLTP (OnLine Transaction Processing)

What is OLTP (OnLine Transaction Processing)?

OLTP (Online Transaction Processing) describes the kind of data processing that deals with heavy transactional workloads. In OLTP workloads, in other words, there are many relatively simple, transactional operations (many reads and writes) constantly coming into the database. The data typically relates to standard business tasks like keeping track of inventory, hotel reservations, or online banking functions.

The other main kind of data processing is OLAP (Online Analytics Processing). Compared to OLTP workloads, the workloads run on OLAP databases are usually much more complicated but much less frequent.

Often, database technologies are designed with one or the other in mind. OLTP databases such as CockroachDB are focused on transactional use cases such as payment processing, logistics, metadata management – basically any use case that involves frequent reads and writes. OLAP databases, in contrast, are typically used for data analytics.

Often a company may use both, with the OLTP database handling the transactions coming from the application, and with relevant data periodically offloaded to an OLAP database for analysis. This approach ensures that even very complex analytics work will not interfere with the performance of the OLTP database.

More about OLTP (OnLine Transaction Processing)

Node

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is a Node?

A node is a single instance of a distributed database such as CockroachDB, or one individual machine of many that are running the same distributed database. Many nodes join together to create the full cluster.

More about Node

MVCC (Multiversion Concurrency Control)

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is MVCC (Multiversion Concurrency Control)?

MVCC (Multiversion Concurrency Control) is a protocol that CockroachDB follows to ensure isolation of transactions when concurrent transactions are happening. Without MVCC, if a database is being used in multiple ways at the same time, then someone might see half-written or inconsistent data. MVCC keeps multiple copies of data, so each user sees a snapshot of the database at a particular instant in time, and they won’t see changes until the transaction has been committed.
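A minimal sketch of the idea: each write keeps the old version, and a reader sees the newest version at or before its snapshot timestamp. The class below is invented for illustration, not CockroachDB's implementation:

```python
# Toy multiversion store: writes append new versions instead of
# overwriting, and reads observe a consistent snapshot in time.
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (timestamp, value), in write order

    def write(self, key, value, ts):
        self.versions.setdefault(key, []).append((ts, value))

    def read(self, key, snapshot_ts):
        """Return the newest value written at or before snapshot_ts."""
        best = None
        for ts, value in self.versions.get(key, []):
            if ts <= snapshot_ts:
                best = value
        return best

store = MVCCStore()
store.write("balance", 100, ts=1)
store.write("balance", 40, ts=5)   # a later transaction's update
print(store.read("balance", snapshot_ts=3))  # 100: snapshot predates the update
print(store.read("balance", snapshot_ts=9))  # 40
```

A transaction reading at timestamp 3 never sees the half-finished state of the later write, which is the isolation guarantee MVCC provides.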

More about MVCC (Multiversion Concurrency Control)

Multi-Cloud

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is Multi-Cloud?

Multi-Cloud refers to a strategy where multiple cloud providers are being used, rather than just one. Typically, these are multiple public cloud providers (e.g. AWS and GCP). CockroachDB is one of the few distributed SQL databases that supports multi-cloud deployments.

More about Multi-Cloud

Multi-Active Availability

What is Multi-Active Availability?

Multi-Active Availability is CockroachDB’s high availability model. All replicas in the cluster can handle traffic, including both reads and writes, and Raft consensus is used to ensure the data remains consistent. This model also keeps RTO at effectively zero if a node goes down.

More about Multi-Active Availability

Monolithic Sorted Key Value Map

Note: This term is specific to CockroachDB, a Distributed SQL database. In other contexts, it may be used differently.

What is a Monolithic Sorted Key Value Map?

When all of the data from all of the tables is translated into key-value pairs in CockroachDB, it’s called a monolithic sorted key-value map. This just means a giant list of key-value pairs that correspond to rows in tables, organized in a way that allows you to easily insert and find data.

More about Monolithic Sorted Key Value Map

Mainframe

What is a mainframe?

A mainframe is a gigantic machine (typically made by IBM) that has large storage capacity and high computing power. Mainframes are almost exclusively on-prem and privately owned by the businesses that use them. However, IBM has recently released a version of its mainframe for use in private cloud settings.

More about Mainframe

Machine / Server

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is a machine or server?

A machine is just a computer. It can live in a data center or on-prem, and machines vary in computing power and storage capacity. Typically, the machines that live in data centers for cloud use are much smaller and less powerful than mainframes.

More about Machine / Server

JSON

Note: Here, we are defining JSON in the context of a distributed database.

What is JSON?

JSON, or JavaScript Object Notation, is a way of formatting “semi-structured” data, and CockroachDB supports it by default. In a typical CockroachDB deployment, JSON makes up a small portion of the stored data.

To understand what JSON is, follow this example: Imagine a table that stores information about blog posts. Much of the data is structured; every blog post needs data on title, author, and number of words (and these become the columns in the table). But there might be additional data that’s not applicable to every post, such as if a user comments on a post or likes a post. Instead of having to make a separate column for each of these data pieces (which would be inefficient since many of the cells would be empty), you can make a single column that supports JSON – and then you format the unique data pieces in JSON and put any item at all into that column, without having a predefined label attached to it.
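In application code, the pattern looks something like this, using Python's standard json module (column and field names are invented for the example):

```python
import json

# Structured columns plus one JSON column for post-specific extras,
# mirroring the blog-post example above.
post = {
    "title": "Hello, World",
    "author": "A. Blogger",
    "word_count": 312,
    # Semi-structured extras that don't warrant their own columns:
    "extras": json.dumps({"likes": 42, "comments": ["Nice post!", "+1"]}),
}

# Reading the JSON column back gives structured access to the extras.
extras = json.loads(post["extras"])
print(extras["likes"])  # 42
```

Each post can carry a different set of extras without any schema change, which is exactly what the single JSON column buys you.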

More about JSON

Isolation

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is Isolation?

Isolation describes a desirable database characteristic in which concurrently-processed transactions (i.e. transactions happening at the same time) leave the database in the same state as if they were executed sequentially (i.e. one after another).

It is one of the four ACID properties that are desirable for databases dealing with transactional workloads.

More about Isolation

Hybrid Logical Clock (HLC) Timestamps

What are Hybrid Logical Clock (HLC) Timestamps?

Accurate time is very important in distributed systems, because events often occur at different nodes at the same time, and these events need to be ordered accurately. Google Spanner uses atomic clocks to accomplish this, but because CockroachDB is open source and can run on any public or private cloud, or on-prem, it can’t rely on atomic clocks. Instead, CockroachDB uses HLCs, a clock method that has both logical and physical components. CockroachDB applies HLC timestamps to all transactions so the system knows when they occurred and can order them correctly.
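A minimal sketch of the idea, covering local events only — a full HLC also merges timestamps received from other nodes, and real implementations carefully bound clock skew:

```python
# Minimal hybrid logical clock sketch: a physical component (wall time)
# plus a logical counter that orders events sharing the same wall time.
class HLC:
    def __init__(self):
        self.wall, self.logical = 0, 0

    def now(self, physical_time):
        if physical_time > self.wall:
            # Wall clock moved forward: adopt it and reset the counter.
            self.wall, self.logical = physical_time, 0
        else:
            # Same (or older) wall time: bump the logical counter so
            # successive events still get strictly increasing timestamps.
            self.logical += 1
        return (self.wall, self.logical)

clock = HLC()
print(clock.now(100))  # (100, 0)
print(clock.now(100))  # (100, 1) — same wall time, ordered by logical part
print(clock.now(105))  # (105, 0)
```

Comparing the (wall, logical) pairs lexicographically yields a total order over events, even when the physical clock alone cannot distinguish them.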

More about Hybrid Logical Clock (HLC) Timestamps

Hybrid Cloud

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is Hybrid Cloud?

Hybrid Cloud refers to a strategy where cloud storage and on-prem storage are both being used. A typical example of this would be a single company storing sensitive data on-prem and less sensitive data in the cloud.

More about Hybrid Cloud

High availability

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is high availability?

High Availability is a desirable characteristic for databases; it describes the ability of a database to survive (and thus remain available) even when parts of the system fail. For example, a CockroachDB database with five nodes could survive and remain available even if a node failed (or several, depending on the replication factor).

Different databases use different models to achieve high availability. The two most common high availability models are Active-Passive and Active-Active. Meanwhile, CockroachDB uses a Multi-Active Availability model.

More about High availability

Gossip Protocol

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is a gossip protocol?

In the context of a distributed database, the gossip protocol is the form of communication used between nodes, to allow each node to locate data across the cluster.
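
A toy simulation of the idea in Python: one node starts out knowing where a key range lives, and in each round every node exchanges what it knows with a random peer, so the information spreads across the cluster without any central coordinator (the node count and key name are invented for the sketch):

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

NUM_NODES = 8
# Each node's "knowledge": which node holds which key range.
knowledge = [dict() for _ in range(NUM_NODES)]
knowledge[0]["range-42"] = "node-0"  # only node 0 knows this at first

rounds = 0
while not all("range-42" in k for k in knowledge):
    rounds += 1
    for i in range(NUM_NODES):
        peer = random.randrange(NUM_NODES)
        # A bidirectional exchange, like a real gossip "conversation":
        knowledge[i].update(knowledge[peer])
        knowledge[peer].update(knowledge[i])

print(f"all {NUM_NODES} nodes learned the location in {rounds} round(s)")
```

Because each exchange can roughly double the number of informed nodes, the whole cluster typically learns new information in a number of rounds that grows only logarithmically with cluster size.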

More about Gossip Protocol

Durability

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is durability?

In the context of a database, durability is a desirable property that describes a system where, once data has been committed to the database, it will remain even in the event of machine failures, power outages, etc.

It is one of the four ACID properties that are desirable for databases dealing with transactional workloads.

More about Durability

Driver

Note: This term has other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is a driver?

In the context of databases, a driver is a piece of code that you add to your app to help it speak the language necessary for communicating with the database, such as SQL. For example, if you’re building a Python app, you might use the psycopg2 driver to enable it to communicate with CockroachDB using SQL.
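
The pattern can be sketched with Python's built-in sqlite3 driver (used here only because it needs no running server; with CockroachDB you would swap in a PostgreSQL-compatible driver such as psycopg2 and point it at your cluster):

```python
import sqlite3

# The driver exposes a Python API (connect / cursor / execute) and handles
# translating these calls into the protocol the database understands.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
cur.execute("SELECT name FROM users WHERE id = 1")
name = cur.fetchone()[0]
print(name)  # Ada
conn.commit()
conn.close()
```

The app never constructs wire-protocol messages itself; it just calls the driver's API with SQL strings.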

More about Driver

Distribution Layer

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is the distribution layer?

In a distributed database such as CockroachDB, the distribution layer takes data from the SQL layer and breaks it down into chunks called ranges to be stored in a distributed way (i.e., it is stored in multiple locations instead of just a single location).
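
A toy sketch of the idea in plain Python (the range size, key format, and node names are all made up; real systems split ranges by data size, not row count):

```python
# Take a sorted key space, break it into contiguous chunks ("ranges"),
# and assign each range to a node in the cluster.
RANGE_SIZE = 16
NODES = ["node-1", "node-2", "node-3"]

keys = [f"user/{i:04d}" for i in range(64)]  # already sorted

ranges = [keys[i:i + RANGE_SIZE] for i in range(0, len(keys), RANGE_SIZE)]
placement = {
    (rng[0], rng[-1]): NODES[idx % len(NODES)]  # round-robin placement
    for idx, rng in enumerate(ranges)
}

for (start, end), node in placement.items():
    print(f"[{start} .. {end}] -> {node}")
```

Because each range covers a contiguous span of the sorted key space, a lookup only needs to find which range a key falls into, then ask the node holding that range.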

More about Distribution Layer

Data Warehouse / Datacenter

What is a data warehouse or datacenter?

A data warehouse or datacenter is a giant warehouse where thousands of machines live, arranged in racks. Cloud providers own many of these warehouses, and when you run or store something on the cloud, it lives in one of these warehouses.

More about Data Warehouse / Datacenter

CPU

What is a CPU?

A CPU (Central Processing Unit) is a chip that sits on a computer's motherboard and carries out instructions from the software. Usually CPUs are multi-core, meaning that there is more than one processing core on a single chip. In the context of databases, the power of the CPUs on the machines running your database will impact the performance of the database.

More about CPU

Core

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is a core?

A core is a component of a CPU that carries out instructions. A multi-core CPU has multiple CPUs together on a single chip, increasing the chip’s overall computing power.

More about Core

Consistency

What is database consistency?

Most database systems have rules for what kinds of data can and cannot be stored (among other rules). Consistency is the term used to describe a database in which those rules are always enforced. A database is said to have consistency when no transaction can violate the integrity of the database – all transactions must leave the database in a valid state.

It is one of the four ACID properties that are desirable for databases dealing with transactional workloads.
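
A small illustration using Python's built-in sqlite3 module: a CHECK constraint is one way a database enforces such a rule, rejecting any change that would leave the data invalid (the table and rule here are invented for the example):

```python
import sqlite3

# Rule: an account balance can never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts "
    "(id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts (balance) VALUES (100)")

try:
    # This update would violate the rule, so the database rejects it.
    conn.execute("UPDATE accounts SET balance = -50 WHERE id = 1")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]
print(balance)  # still 100: the invalid change never took effect
conn.close()
```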

More about Consistency

Cluster

Note: This term may have other meanings in other contexts. Here, we are defining it in the context of a distributed database.

What is a cluster?

A cluster is the full collection of nodes associated with a distributed database. For example, a company might spin up a CockroachDB cluster with five nodes for its database.

More about Cluster

Cloud

What is the cloud (in the context of databases)?

In the context of databases, “cloud” refers to storing your data on machines that belong to a third-party cloud provider. This is more cost-efficient than on-prem storage because it eliminates the need for a company to maintain its own machines, and it typically increases the availability and scalability of the company’s services. This is because with cloud storage, data is stored across many smaller computers within and across data warehouses, whereas on-prem data is usually stored on a single massive computer or mainframe.

More about Cloud

Byte

What is a byte?

In the context of data storage, a byte is the most basic unit for encoding a character on a computer. A byte is a group of eight 0s and 1s whose specific order represents a character. For example, the letter “A,” when translated into a byte, reads as: “01000001”
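
This can be checked in a couple of lines of Python:

```python
# In ASCII/UTF-8, the letter "A" is the number 65,
# i.e. the bit pattern 01000001.
encoded = "A".encode("utf-8")     # b'A', a single byte
bits = format(encoded[0], "08b")  # that byte written as eight 0s and 1s
print(bits)  # 01000001
```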

More about Byte

AZ (Availability Zone, or Zone)

What is an AZ?

An AZ, availability zone, or just “zone”, typically refers to a single data warehouse within a region. Multiple warehouses/zones make up a single region. Sometimes, an availability zone isn’t at the warehouse level, but instead at the rack level - i.e., a single rack within a warehouse.

More about AZ (Availability Zone, or Zone)

Atomicity

What is atomicity?

Atomicity is a desirable characteristic for database transactions. The name comes from the idea of an indivisible “atomic unit”, and it describes a method for processing transactions that treats each transaction as a single, indivisible unit (even if processing the transaction requires multiple steps). In other words, every step of the transaction must complete successfully or the entire transaction will fail and no change will be written to the database.

This is desirable because without atomicity, a transaction that’s interrupted while processing may only write a portion of the changes it’s supposed to make, which could leave the database in an inconsistent state.

It is one of the four ACID properties that are desirable for databases dealing with transactional workloads.
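
A minimal sketch using Python's built-in sqlite3 module: a two-step transfer is interrupted midway and rolled back, so no partial change survives (the table, names, and amounts are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    # A transfer takes two steps; atomicity means both happen or neither does.
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    raise RuntimeError("crash between the two steps")  # simulated failure
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # undo the partial write

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50} -- no half-finished transfer
conn.close()
```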

More about Atomicity

Application / Client

What is an application or client?

An application or client is a software program. In the context of a database, the client is what sends data to a database, and/or gets data from it. An example of an application is a phone app – a piece of software that runs on users’ phones and may request and send data to and from a database as the user takes different actions in the app.

More about Application / Client

API

What is an API?

API stands for Application Programming Interface. Simply put, an API is a way for developers, apps, and clients to communicate with an application. It’s an interface that allows developers to send requests to an application and receive responses from it.

More about API

Active-Passive Availability

What is Active-Passive Availability?

Active-passive availability is one way to configure a distributed database to offer high availability.

In an active-passive availability setup, all traffic is routed to a single active replica, and changes are copied to a backup passive replica. If the active replica fails, the passive one takes over. However, the active replica might fail before the passive one has copied over all of the data, leading to data loss (a nonzero RPO). And because the passive replica can take a while to come online, there is also some downtime (a nonzero RTO).

Other configurations include active-active availability and multi-active availability.

More about Active-Passive Availability

Active-Active Availability

What is Active-Active Availability?

Active-active availability is one way to configure a distributed database to offer high availability.

In an active-active availability setup, multiple replicas run identical services and traffic is routed to all of them. If any replica fails, the others handle the extra traffic. This model runs into consistency issues in database contexts because multiple replicas might be trying to edit the same data at the same time.

Other configurations include active-passive availability and multi-active availability.

More about Active-Active Availability

Edge Computing

What is edge computing?

Edge computing is a somewhat overloaded term to describe locality-sensitive distributed computing architecture. Wikipedia defines it as “a distributed computing paradigm that brings computation and data storage closer to the sources of data.” We can think of the “sources of data” being users or even sensors making requests to our system.

The main aim of edge computing or multi-access edge computing is to reduce location-related latency in applications to enable high-performance, real-time use cases in widespread geographies. Edge computing systems are faster when computation and data are closer to the devices.

Developers today are beginning to realize that to get computation and data closer to devices, there are better choices than centralized databases, or even distributed databases limited to a single region.

Operating a database in a single region leads to high latency for users or edge devices in areas outside of the database region. Even if you distribute your application across multiple regions, users or devices outside the database region may experience unacceptable response times. And unexpectedly high latency can translate into dissatisfied users.

Edge computing uses data that has life cycles or life spans. In the same application, you can have ephemeral data, in-memory data, short-term persistent data, and long-term persistent (LTP) data. Typically, long-term persistent data is stored in databases. Unfortunately, when LTP data is far from the edge, access to it is slow, and this slow-data effect tends to give databases a lousy reputation.

Choosing the right database solution can improve your edge computing architecture and user experience.

More about Edge Computing

Bit

What is a bit?

In the context of data storage, a bit is a single 0 or 1 (also called a binary digit). There are 8 bits in a byte.

More about Bit