Blog
System
40x faster hash joiner with vectorized execution
For the past four months, I've been working with the incredible SQL Execution team at Cockroach Labs as a backend engineering intern to develop the first prototype of a batched, column-at-a-time execution engine. During this time, I implemented a column-at-a-time hash join operator that outperformed CockroachDB's existing row-at-a-time hash join by 40x. In this blog post, I'll be going over the philosophy, challenges, and motivation behind implementing a column-at-a-time SQL operator in general, as well as some specifics about hash join itself.
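To make the column-at-a-time idea in this post a little more concrete, here is a minimal, hypothetical Go sketch of a hash-join build and probe where the probe side is processed a whole batch (column) of keys at a time instead of one row at a time. This is not CockroachDB's actual operator; the function names, types, and the idea of returning matched index pairs are illustrative assumptions only.

```go
package main

import "fmt"

// buildHashTable indexes the build-side join keys by value.
// Each key maps to the row indices where it appears.
func buildHashTable(buildKeys []int64) map[int64][]int {
	ht := make(map[int64][]int, len(buildKeys))
	for i, k := range buildKeys {
		ht[k] = append(ht[k], i)
	}
	return ht
}

// probeBatch probes one batch (column) of probe-side keys against the hash
// table and returns matched (probe index, build index) pairs. Working over a
// whole column at a time keeps the inner loop tight, which is the essence of
// the column-at-a-time approach.
func probeBatch(ht map[int64][]int, probeKeys []int64) (probeIdxs, buildIdxs []int) {
	for i, k := range probeKeys {
		for _, b := range ht[k] {
			probeIdxs = append(probeIdxs, i)
			buildIdxs = append(buildIdxs, b)
		}
	}
	return probeIdxs, buildIdxs
}

func main() {
	build := []int64{1, 2, 3, 2}
	probe := []int64{2, 4, 1}
	ht := buildHashTable(build)
	p, b := probeBatch(ht, probe)
	fmt.Println(p, b) // [0 0 2] [1 3 0]
}
```

The broad intuition behind batching is that per-row interpretation overhead (type dispatch, function-call indirection) is paid once per batch rather than once per row; the post itself goes into the specifics of where CockroachDB's gains actually come from.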
Angela Chang
January 31, 2019
System
Why we built CockroachDB on top of RocksDB
In September 2020 we introduced our own homegrown replacement for RocksDB: a storage engine called Pebble. You can read about our reasons for the switch in this blog. We have tremendous respect and appreciation for RocksDB and continue to recommend it. … If, on a final exam in a database class, you asked students whether to build a database on a log-structured merge tree (LSM) or a BTree-based storage engine, 90% of them would probably respond that the decision hinges on your workload. “LSMs are for write-heavy workloads and BTrees are for read-heavy workloads”, the conscientious ones would write. Yet if you surveyed the NewSQL (or distributed SQL) databases available today, you would find that most of them are built on top of an LSM, namely RocksDB. You might thus conclude that this is because modern applications have shifted toward more write-heavy workloads. You would be incorrect.
Arjun Narayan
January 17, 2019
System
How Pipelining consensus writes speeds up distributed SQL transactions
CockroachDB supports ACID transactions across arbitrary data in a distributed database. A discussion of how this works was first published on our blog three years ago. Since then, a lot has changed. Perhaps most notably, CockroachDB has transitioned from a key-value store to a full SQL database that can be plugged in as a scalable, highly available replacement for PostgreSQL. It did so by introducing a SQL execution engine that maps SQL tables onto its distributed key-value architecture. However, over this period of time, the fundamentals of the distributed, atomic transaction protocol at the core of CockroachDB have remained untouched.
Nathan VanBenschoten
January 10, 2019
Gotchas and solutions running a distributed system across Kubernetes clusters
I recently gave a talk at KubeCon North America, “Experience Report: Running a Distributed System Across Kubernetes Clusters”. Below is a blog post based on that talk for those who prefer to read rather than listen. For anyone interested in viewing the talk, it is available here.

If you have run Kubernetes clusters recently, you've probably found that it's a nice way to run your distributed applications. It makes it easy to run even fairly complicated applications like a distributed system, and importantly, it has been improving dramatically over the years. New features like dynamic volume provisioning, StatefulSets, and multi-zone clusters have made it much easier to run reliable stateful services. Community innovations like Helm charts have been great for people like me who want to make it easy for others to run an application they develop on Kubernetes. And for end users, the growing number of managed Kubernetes services means you don't have to run your own cluster at all.

However, the situation hasn't really improved if you want your service to span multiple regions or multiple Kubernetes clusters. There have been early efforts, such as the Ubernetes project, and the more recent Federation v2 project is still ongoing, but nothing has yet solved the problem of running a distributed system that spans multiple clusters. It's still a very hard experience, and one that isn't well documented.
Alex Robinson
December 20, 2018
Performance
AWS outperforms GCP in the 2018 Cloud Report
Our customers rely on us to help them navigate the complexities of the increasingly competitive cloud wars. Should they use Amazon Web Services (AWS)? Google Cloud Platform (GCP)? Microsoft Azure? How should they tune their workload for different offerings? Which is more reliable?
Masha Schneider
December 13, 2018
Performance
CockroachDB 2.1 is now 50x more scalable than Amazon Aurora
[For CockroachDB's most up-to-date performance benchmarks, please read our Performance Overview page] Correctness, stability, and performance are the foundations of CockroachDB. Today, we will demonstrate our rapid progress in performance and scalability with CockroachDB 2.1. CockroachDB is now 50x more scalable than Amazon Aurora at less than 2% of the price per tpmC. And unlike Aurora and other databases that selectively degrade isolation levels for performance, CockroachDB can achieve massive scale while maintaining serializable isolation, protecting your data from fraud and data loss. Read on to see benchmarked metrics that demonstrate that CockroachDB can provide customers an ultra-resilient and highly available database at massive scale.
Andy Woods
November 28, 2018
System
How to do a one-step MySQL import in CockroachDB
Update 3/4/19: This feature is out of beta! Want to learn more? Check out our webinar on migrating MySQL data to CockroachDB. We want to make it easy for users of existing database systems to get started with CockroachDB. To that end, we’re proud to announce that CockroachDB now has beta-level support for importing MySQL database dump files: you can now import your data from the most popular open-source database to our modern, globally-distributed SQL database with a single command.
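As a rough illustration of what “a single command” looks like in practice, here is a small Go sketch that connects to a local CockroachDB cluster over the PostgreSQL wire protocol and issues an IMPORT MYSQLDUMP statement. The connection string and dump URL are placeholders, and statement options can vary between releases, so treat this as a sketch under those assumptions rather than canonical syntax; the CockroachDB documentation has the authoritative form.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol
)

func main() {
	// Placeholder connection string for a local, insecure cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Import an entire mysqldump file in one statement. The dump must be
	// reachable by the CockroachDB nodes (here, a hypothetical HTTP URL).
	if _, err := db.Exec(
		`IMPORT MYSQLDUMP 'https://example.com/employees.sql'`,
	); err != nil {
		log.Fatal(err)
	}
	log.Println("import complete")
}
```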
Roland Crosby
November 15, 2018
System
How we built a cost-based SQL optimizer
Here at Cockroach Labs, we’ve had a continual focus on improving performance and scalability. To that end, our 2.1 release includes a brand-new, built-from-scratch, cost-based SQL optimizer. Besides enabling SQL features like correlated subqueries for the first time, it provides us with a flexible optimization framework that will yield significant performance improvements in upcoming releases, especially in more complex reporting queries. If you have queries that you think should be faster, send them our way! We’re building up libraries of queries that we use to tune the performance of the optimizer and prioritize future work.
Andy Kimball
November 8, 2018
Product
CockroachDB 2.1: Easier migrations and a 5x scalability improvement
CockroachDB was built to help teams scale their applications across the globe without sacrificing SQL’s convenience, power, and data-integrity guarantees. In CockroachDB 2.1, we’ve made it easier than ever to migrate from MySQL and Postgres, improved our scalability on transactional workloads by 5x, and launched a managed offering to help teams deploy low-latency, multi-region clusters with minimal operator overhead.
Nate Stewart
November 1, 2018