Blog
testing
Lessons learned from 2+ years of nightly Jepsen tests
Since the pre-1.0 betas of CockroachDB, we've been using Jepsen to test the correctness of the database in the presence of failures. We have re-run these tests every night as a part of our nightly test suite. Last fall, these tests found their first post-release bug. This blog post is a more digestible walkthrough of that discovery (many of the links here point to specific comments in that issue's thread to highlight the most important moments).
Ben Darnell
February 21, 2019
System
Introducing the High Availability Architecture Guide (CockroachDB vs. Oracle)
Which is worse? One of your users goes to check her bank balance in your app and the service is down, or one of your users goes to check her bank balance in your app and there's a data inconsistency. Engineers are frequently faced with this false tradeoff: do you place a higher premium on data correctness or on high availability? This problem only becomes more complicated when you begin dealing with users distributed across broad geographies. When IT experts consider high availability infrastructure for mission-critical services, their minds often leap to Oracle as the preeminent service provider. But Oracle's database was designed in a pre-cloud world, and the means by which it achieves high availability on geo-distributed workloads are complex. Oracle requires a staggering number of technologies that must be implemented, and still, their solutions can allow potentially costly anomalies into your data. As a cloud native database, CockroachDB introduces a new way of providing always-on availability, strong data consistency, and distributed performance. Today, we're releasing a side-by-side comparison of CockroachDB and Oracle to help you get a better understanding of the architecture (and cost) of setting up a highly available distributed service.
Charlotte Dillon
February 12, 2019
Performance
Reproduction steps now available for the 2018 Cloud Report
CockroachDB is a cloud-neutral database, which means it eliminates dependencies on a particular cloud environment and gives you the flexibility and choice to run it anywhere you like. We are committed to this principle, and to deliver on this promise, we systematically deploy and test CockroachDB clusters on the three leading US cloud providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
Andy Woods
February 7, 2019
System
40x faster hash joiner with vectorized execution
For the past four months, I've been working with the incredible SQL Execution team at Cockroach Labs as a backend engineering intern to develop the first prototype of a batched, column-at-a-time execution engine. During this time, I implemented a column-at-a-time hash join operator that outperformed CockroachDB's existing row-at-a-time hash join by 40x. In this blog post, I'll be going over the philosophy, challenges, and motivation behind implementing a column-at-a-time SQL operator in general, as well as some specifics about hash join itself.
Angela Chang
January 31, 2019
System
Why we built CockroachDB on top of RocksDB
In September of 2020 we introduced our own homecooked replacement for RocksDB, a storage engine called Pebble. You can read about our reasons for the switch in this blog. We have tremendous respect and appreciation for RocksDB and continue to recommend it. … If, on a final exam in a database class, you asked students whether to build a database on a log-structured merge tree (LSM) or a BTree-based storage engine, 90% of your students would probably respond that the decision hinges on your workload. “LSMs are for write-heavy workloads and BTrees are for read-heavy workloads”, the conscientious ones would write. If you surveyed NewSQL (or distributed SQL) databases today, you would find that most of them are built on top of an LSM, namely RocksDB. You might thus conclude that this is because modern applications have shifted to more write-heavy workloads. You would be incorrect.
Arjun Narayan
January 17, 2019
System
How Pipelining consensus writes speeds up distributed SQL transactions
CockroachDB supports ACID transactions across arbitrary data in a distributed database. A discussion on how this works was first published on our blog three years ago. Since then, a lot has changed. Perhaps most notably, CockroachDB has transitioned from a key-value store to a full SQL database that can be plugged in as a scalable, highly-available replacement for PostgreSQL. It did so by introducing a SQL execution engine which maps SQL tables onto its distributed key-value architecture. However, over this period of time, the fundamentals of the distributed, atomic transaction protocol at the core of CockroachDB have remained untouched.
Nathan VanBenschoten
January 10, 2019
Gotchas and solutions running a distributed system across Kubernetes clusters
I recently gave a talk at KubeCon North America, “Experience Report: Running a Distributed System Across Kubernetes Clusters”. Below is a blog based on that talk for those who prefer to read rather than listen. For anyone interested in viewing the talk, it is available here. If you have run Kubernetes clusters recently, you've probably found that it's a nice way to run your distributed applications. It makes it easy to run even pretty complicated applications like a distributed system. And importantly, it's been drastically improving over the years. New features like dynamic volume provisioning, StatefulSets, and multi-zone clusters have made it much easier to run reliable stateful services. Community innovations like Helm charts have been great for people like me who want to make it easy for other people to run an application they develop on Kubernetes. And for end users, the increasing number of managed Kubernetes services means that you don't have to run your own cluster. However, the situation hasn't really improved if you want your service to span multiple regions or multiple Kubernetes clusters. There have been early efforts, such as the Ubernetes project, and the recent Federation v2 project is still ongoing, but nothing has yet solved the problem of running a distributed system that spans multiple clusters. It's still a very hard experience that isn't really documented.
Alex Robinson
December 20, 2018
Performance
AWS outperforms GCP in the 2018 Cloud Report
Our customers rely on us to help them navigate the complexities of the increasingly competitive cloud wars. Should they use Amazon Web Services (AWS)? Google Cloud Platform (GCP)? Microsoft Azure? How should they tune their workload for different offerings? Which is more reliable?
Masha Schneider
December 13, 2018
Performance
CockroachDB 2.1 is now 50x more scalable than Amazon Aurora
[For CockroachDB's most up-to-date performance benchmarks, please read our Performance Overview page] Correctness, stability, and performance are the foundations of CockroachDB. Today, we will demonstrate our rapid progress in performance and scalability with CockroachDB 2.1. CockroachDB is now 50x more scalable than Amazon Aurora at less than 2% of the price per tpmC. And unlike Aurora and other databases that selectively degrade isolation levels for performance, CockroachDB can achieve massive scale while maintaining serializable isolation, protecting your data from fraud and data loss. Read on to see benchmarked metrics that demonstrate that CockroachDB can provide customers an ultra-resilient and highly available database at massive scale.
Andy Woods
November 28, 2018