Blog
Engineering
What's new in CockroachDB’s cost-based query optimizer
In 2018, CockroachDB implemented a cost-based query optimizer from scratch, which has been steadily improved in each release. The query optimizer is the part of the system that understands the semantics of SQL queries and decides how to execute them; execution plans can vary wildly in terms of execution time, so choosing a good plan is important. In this post we go over some of the optimizer-related improvements in CockroachDB v20.1.
Radu Berinde
July 16, 2020
Engineering
Improving unordered distinct efficiency in the vectorized SQL engine
For the past four months, I’ve worked as a backend engineering intern with the incredibly talented engineers of the Cockroach Labs SQL Execution team to unlock the full potential of the vectorized engine ahead of the CockroachDB 20.1 release. During this time, I focused on improving the algorithm behind some of CockroachDB’s SQL operators, to reduce memory usage and speed up query execution. In this blog post, I will go over the improvements that I made to the algorithm behind the unordered distinct operator, as well as new algorithms that address some existing inefficiencies.
Archer Zhang
July 9, 2020
Engineering
Disk spilling in a vectorized execution engine
Late last year, we shipped v1 of our vectorized execution engine. It enables column-based query execution and speeds up complex joins and aggregations, improving analytical capabilities in CockroachDB (which is first and foremost optimized for OLTP workloads). v1 of the engine didn’t support disk spilling, which meant it couldn’t execute certain memory-intensive queries if there was not enough memory available. Starting in CockroachDB v20.1, these queries fall back to disk (also known as “spilling” to disk).
Alfonso Subiotto Marques
June 30, 2020
Engineering
When and why to use SELECT FOR UPDATE in CockroachDB
We didn’t implement SELECT FOR UPDATE to ensure consistency. Unlike Amazon Aurora, CockroachDB already guarantees serializable isolation, the highest isolation level provided by the ANSI SQL standard. Contention happens. And application developers shouldn’t need to risk the integrity of their data when transactions contend. To combat this, CockroachDB must occasionally return errors prompting applications to retry transactions that would risk anomalies, such as write skews. This means that just like with PostgreSQL in serializable isolation, developers need to implement retry-loops for contended transactions. For developers accustomed to relational databases with lower isolation levels, this can be an unfamiliar pattern.
John Kendall
June 22, 2020
Engineering
Nested transactions in CockroachDB 20.1
CockroachDB 20.1 introduces support for nested transactions, a SQL feature which simplifies the work of programmers of certain client applications. In this post, we'll explore when and where to use nested transactions, when not to use them, and how they work in a distributed environment.
Raphael Kena Poss
June 15, 2020
Engineering
SIGMOD 2020: Cockroach Labs publishes research paper on CockroachDB
Over the past few months, a team of our engineers, technical writers, product managers, and sales engineers codified the research and learnings of CockroachDB and are now contributing this knowledge back into the very system from which we have benefited with the hope of further advancing distributed systems research and design. The research paper, "CockroachDB: The Resilient Geo-Distributed SQL Database", is a labor of love that we are honored to have published by SIGMOD, the Association for Computing Machinery's (ACM) Special Interest Group on Management of Data, which specializes in large-scale data management problems.
Jessica Edwards
June 10, 2020
Engineering
How online primary key changes work in CockroachDB
As of our 20.1 release, CockroachDB supports online primary key changes. This means that it is now possible to change the primary key of a table while performing read and write operations on the table. Online primary key changes make it easier to transition an application from being single-region to multi-region, or to iterate on the schema of an application with no down time. Let’s dive into the technical challenges behind this feature, and our approach to solving them. As part of the deep dive, let’s review how data is stored in CockroachDB, and how online schema changes work in CockroachDB.
Rohan Yadav
May 21, 2020
Engineering
Tutorial: How to build a low-latency Flask app that runs across multiple regions
If your company has a global customer base, you’ll likely be building an application for users in different areas across the world. Deploying the application in just a single region can make high latencies a serious problem for users located far from the application’s deployment region. Latency can dramatically affect user experience, and it can also lead to more serious problems with data integrity, like transaction contention. As a result, limiting latency is among the top priorities for global application developers. In this blog, we walk you through a low-latency multi-region application that we built and deployed for MovR, a fictional vehicle-sharing company with a growing user base. The MovR application is a Flask web application, connected to a CockroachDB cluster with SQLAlchemy. We deployed the application in multiple regions across the US and Europe, using Google Kubernetes Engine, with some additional Google Cloud Services for load balancing, ingress, and domain hosting. For the database, we used a multi-region CockroachDB Dedicated deployment on GCP. The application source code is available on GitHub, and an end-to-end tutorial for developing and deploying the application is available on our documentation website.
Eric Harmeling
April 2, 2020
Engineering
How to run chaos tests in a multi-cloud environment
This year, as every year, Black Friday and Cyber Monday stressed e-commerce systems to their breaking points. Major companies like H&M, Nordstrom Rack, and other retailers experienced the kinds of costly outages that keep SREs up at night. Multi-cloud infrastructure is sometimes offered as a panacea to these kinds of outages. But multi-cloud deployments are not a band-aid. In fact, they often introduce new complexities into the system that need to be sniffed out. But sniffing out bugs in multi-cloud environments is, by nature, complicated. Ana Medina, a chaos engineer from Gremlin, spoke at ESCAPE/19 about how to do it, including a detailed list of the kinds of errors to search for and checklists of questions to ask.
Dan Kelly
December 9, 2019