[CASE STUDY]
Before adopting CockroachDB, City Storage Systems built applications on GCP Cloud SQL for PostgreSQL, which requires maintenance windows for many standard operations. City Storage Systems did not want frequent downtime for maintenance and was looking for a zero-downtime solution for its Tier 0 applications.
They wanted to build a storage-as-a-service platform that would let app developers move fast. That meant delivering a SQL interface that is easy to understand, strongly consistent, and easy to scale. What City Storage Systems didn't want was developers getting bogged down in eventual consistency, especially when scaling across regions. They also wanted to avoid being locked into a single cloud provider, while keeping the option to deploy their infrastructure across multiple clouds in the future.
In summary, their requirements for a new database included:
Ability to tolerate the outage of an entire region
Cloud-native foundation, with no vendor lock-in
Flexibility to scale up and down on demand
Portability and multi-cloud from day one
Fast development, with strongly consistent storage and no downtime for maintenance
After they evaluated several available database solutions, the City Storage Systems team felt that CockroachDB was the most mature product that met all of their requirements.
It’s a design goal of ours to make sure that we have the option of running a multi-cloud deployment, or at the very least portability across clouds. It’s great to be able to say, ‘We are just going to take this stuff and move it due to regional availability.’ Not many organizations have that capability.
-Rasmus Bach Krabbe, Engineering Manager, City Storage Systems
Once City Storage Systems selected CockroachDB to support their Tier 0 workloads, they started to migrate applications over from PostgreSQL. They still run PostgreSQL for smaller use cases and backend apps that don’t require the advanced capabilities of CockroachDB.
For example, they migrated their order management system and user authentication platform to CockroachDB. They wanted these applications to run across multiple regions and be highly available at all times.
These systems also make up the bulk of their online transaction processing (OLTP) load. Their order management system, for instance, sees volume spikes during periods when more people are placing orders. When that happens, the Infrastructure Storage Team can add nodes to the cluster to scale up, and when demand levels out, they can scale back down.
The migration itself took real work, especially because City Storage Systems wanted to do it without downtime. At a high level, they built a system that supported dual writes, synchronized the data across the two databases, and then switched the source of truth over to CockroachDB. For any additional use cases that require zero downtime, the team can now use CockroachDB's suite of migration tools, called MOLT, to make the process much easier.
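The dual-write approach described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not City Storage Systems' actual migration system: Store is an in-memory stand-in for a database (not a real PostgreSQL or CockroachDB driver), and DualWriter, backfill, and cut_over are names invented here. It shows the three phases: write to both stores, copy rows that predate dual writes, and switch the source of truth only once both copies match.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Store:
    """Hypothetical in-memory stand-in for one database (not a real driver)."""
    name: str
    rows: Dict[str, dict] = field(default_factory=dict)

    def write(self, key: str, value: dict) -> None:
        self.rows[key] = dict(value)

    def read(self, key: str) -> Optional[dict]:
        return self.rows.get(key)


class DualWriter:
    """Hypothetical migration helper: dual writes, backfill, then cutover."""

    def __init__(self, legacy: Store, target: Store) -> None:
        self.legacy = legacy
        self.target = target
        self.source_of_truth = legacy  # reads are served by the legacy store at first

    def _ordered(self) -> Tuple[Store, Store]:
        # The current source of truth is always written first.
        if self.source_of_truth is self.legacy:
            return self.legacy, self.target
        return self.target, self.legacy

    def write(self, key: str, value: dict) -> None:
        primary, secondary = self._ordered()
        primary.write(key, value)
        secondary.write(key, value)

    def read(self, key: str) -> Optional[dict]:
        return self.source_of_truth.read(key)

    def backfill(self) -> None:
        # Copy rows that existed in the legacy store before dual writes began.
        for key, value in self.legacy.rows.items():
            if key not in self.target.rows:
                self.target.write(key, value)

    def in_sync(self) -> bool:
        return self.legacy.rows == self.target.rows

    def cut_over(self) -> None:
        # Only switch the source of truth once both copies match.
        if not self.in_sync():
            raise RuntimeError("stores have diverged; not switching")
        self.source_of_truth = self.target


legacy = Store("postgres")
legacy.write("order-1", {"status": "placed"})  # row written before the migration
target = Store("cockroachdb")
writer = DualWriter(legacy, target)
writer.write("order-2", {"status": "placed"})  # lands in both stores
writer.backfill()
writer.cut_over()
print(writer.read("order-1"), writer.source_of_truth.name)
```

A production version would also have to handle a write that succeeds on one database and fails on the other, concurrent updates during backfill, and a rollback path; the sketch deliberately leaves those out to keep the sequence of phases visible.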
The ability to have zero downtime restarts with CockroachDB is huge. We can do actual maintenance of the nodes without disturbing our customers and having them know that we are even doing it. Running a multi-active data center gives us the type of survivability we are looking for.
-Frederik Stenum Mogensen, Software Engineer, City Storage Systems
City Storage Systems builds their applications on Kubernetes for some of the same reasons they build on CockroachDB. Kubernetes…
Is programmable, so they don't need a standard ops organization to run many clusters
Facilitates high availability by automating application deployments and restarts
Is portable and doesn’t lock them into a specific cloud vendor
Can speed up development because many devs are already familiar with it
In fact, CockroachDB is built to work on Kubernetes and that compatibility helped make the development process smoother than it might otherwise have been. However, that doesn’t mean achieving an ideal setup with minimal operational upkeep was easy.
After many trials and tribulations, the City Storage Systems team decided to build their own custom Kubernetes operator. The operator simplifies and automates a variety of tasks, which freed up significant time for the team. The diagram below shows the architecture of the operator on the left and a sample YAML file defining resources and hierarchies on the right.
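At the core of most Kubernetes operators is a reconcile step: compare the desired state declared in a resource (such as the YAML file mentioned above) with what is actually running, and emit only the difference. The sketch below is a hypothetical illustration of that idea, not the team's operator. ClusterSpec, Node, reconcile, and the action strings are invented for this example, and a real operator would watch custom resources through a Kubernetes client library rather than take plain Python lists.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClusterSpec:
    """Desired state, as a hypothetical custom resource might declare it."""
    name: str
    regions: Tuple[str, ...]
    nodes_per_region: int


@dataclass(frozen=True)
class Node:
    """A node the operator currently observes in a region."""
    region: str
    index: int


def reconcile(spec: ClusterSpec, observed: List[Node]) -> List[str]:
    """Return the actions that move the observed state toward the spec.

    An operator runs this comparison repeatedly; when nothing differs it
    returns an empty list, so running it again is harmless.
    """
    actions: List[str] = []
    wanted = set(range(spec.nodes_per_region))
    for region in spec.regions:
        present = {n.index for n in observed if n.region == region}
        for i in sorted(wanted - present):
            actions.append(f"create {spec.name}-{region}-{i}")
        # Surplus nodes get a "decommission" action rather than a plain delete,
        # highest index first, so data can be moved off before removal.
        for i in sorted(present - wanted, reverse=True):
            actions.append(f"decommission {spec.name}-{region}-{i}")
    # Nodes in regions that are no longer part of the spec are also retired.
    for n in observed:
        if n.region not in spec.regions:
            actions.append(f"decommission {spec.name}-{n.region}-{n.index}")
    return actions


spec = ClusterSpec("orders", ("region-a", "region-b"), nodes_per_region=3)
observed = [Node("region-a", i) for i in range(4)] + [Node("region-b", 0)]
for action in reconcile(spec, observed):
    print(action)
```

Because the function only describes the gap between desired and observed state, the same code handles scaling up, scaling down, and recovering from a pod that Kubernetes restarted, which is what makes this pattern a good fit for "configurable by code" clusters.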
The team runs multiple in-house orchestration tools for the CockroachDB clusters so that they are configured through code rather than ClickOps (manual changes in a web console). This keeps the clusters reliable, consistent across environments, and tolerant of the disruptions that are inherent to Kubernetes.
For the majority of their Tier 0 applications, they run CockroachDB on Kubernetes across three U.S. regions (with a single availability zone per region) for high availability. For some smaller, non-mission-critical applications, they run CockroachDB in a single region, which meets those applications' availability requirements.
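Why three regions? CockroachDB replicates each range of data using the Raft consensus protocol, and a range stays available only while a majority of its replicas are reachable. The arithmetic below is a sketch under stated assumptions: the region names are hypothetical, and the layouts are illustrative rather than City Storage Systems' actual replication configuration. It checks whether losing any one region still leaves a majority.

```python
from typing import Dict


def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def survives_region_loss(replicas_per_region: Dict[str, int]) -> bool:
    """True if losing any single region still leaves a majority of replicas."""
    total = sum(replicas_per_region.values())
    needed = majority(total)
    return all(total - lost >= needed for lost in replicas_per_region.values())


# With one AZ per region, an AZ outage takes that region's replicas with it.
three_regions = {"region-a": 1, "region-b": 1, "region-c": 1}
lopsided = {"region-a": 2, "region-b": 1}
single_region = {"region-a": 3}
five_replicas = {"region-a": 2, "region-b": 2, "region-c": 1}

for layout in (three_regions, lopsided, single_region, five_replicas):
    print(layout, survives_region_loss(layout))
```

One replica in each of three regions survives the loss of any region (two of three remain). Two regions cannot do this, because one of them must hold a majority. A single-region cluster, like the one used for smaller applications, gives up region survival by design.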
After an initial successful deployment on another cloud provider, City Storage Systems decided in 2022 to move CockroachDB to Microsoft Azure. The move aligned with the company's broader investment in Azure and let it take advantage of operational efficiencies.
Additionally, the company was keen to establish a partnership with Microsoft, similar to its relationship with Cockroach Labs, to help evolve the City Storage Systems (CSS) stack and Azure together. CockroachDB's flexible, cloud-native architecture, which enables seamless migration between cloud platforms, readily supported these business objectives.
The team reports that two major components made the cloud migration possible: CockroachDB itself and the Kubernetes control plane.
As the City Storage Systems team removed CockroachDB nodes from the original cloud provider, they were able to drain data off those nodes and control data placement using CockroachDB node attributes. They would then bring up replacement nodes in Azure and move the data onto them. The Kubernetes control plane gave them a single-pane-of-glass experience across the two cloud providers that abstracted away many of the details.
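A placement-driven move like this can be planned as two steps: constrain replicas to nodes carrying the target attribute, then decommission the old nodes. The sketch below only builds the statements as strings and executes nothing. It assumes each node was started with the --attrs flag set to a label for its cloud (the labels "legacy" and "azure" are hypothetical), that the Azure nodes have already joined the cluster, and a replication factor of three. Connection and certificate flags for the cockroach CLI are omitted. It is an illustration, not the team's actual runbook.

```python
from typing import Dict, List


def migration_plan(node_attrs: Dict[int, str], target_attr: str) -> List[str]:
    """Build an ordered list of statements for moving data onto target nodes.

    node_attrs maps a CockroachDB node ID to the attribute it was started
    with via --attrs. Attribute values here are hypothetical labels.
    """
    targets = sorted(n for n, a in node_attrs.items() if a == target_attr)
    leaving = sorted(n for n, a in node_attrs.items() if a != target_attr)
    if len(targets) < 3:
        # Assumes three-way replication: each range needs three eligible nodes.
        raise ValueError("need at least 3 target nodes for 3-way replication")
    plan = [
        # Restrict replicas of ranges using the default zone configuration to
        # nodes with the target attribute; CockroachDB then rebalances them.
        f"ALTER RANGE default CONFIGURE ZONE USING constraints = '[+{target_attr}]';",
    ]
    if leaving:
        # Decommissioning moves any remaining replicas off the old nodes.
        ids = " ".join(str(n) for n in leaving)
        plan.append(f"cockroach node decommission {ids}")
    return plan


nodes = {1: "legacy", 2: "legacy", 3: "legacy", 4: "azure", 5: "azure", 6: "azure"}
for step in migration_plan(nodes, "azure"):
    print(step)
```

ALTER RANGE default changes only the default zone configuration. Databases or tables that carry their own zone configurations would need equivalent constraints applied to them as well.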
The cloud migration we performed was actually very smooth for our use case. We now run CockroachDB on Azure across multiple regions and are able to tolerate the loss of an entire region.
-Rasmus Bach Krabbe, Engineering Manager, City Storage Systems
CockroachDB helped City Storage Systems address modern application challenges so that they could deliver on their business requirements. First, CockroachDB enables them to build a resilient foundation that can tolerate region-wide outages, ensuring their critical Tier 0 applications remain operational.
Additionally, CockroachDB allows City Storage Systems to scale their database to handle an increase in demand for applications like their order management system. CockroachDB’s familiar SQL interface and strong consistency model have simplified application development and freed developers from concerns about eventual consistency.
Not only have they been able to increase developer productivity, but they’ve also been able to simplify operations. CockroachDB features like zero downtime upgrades and online schema changes have reduced the operational overhead for City Storage Systems, allowing them to perform maintenance without disrupting their customers. They were also able to seamlessly migrate to Azure from another cloud provider since CockroachDB is cloud-native.
Today, CockroachDB, in tandem with a custom Kubernetes operator running on Azure, is delivering what City Storage Systems needs: a highly available, strongly consistent database with multi-region and multi-cloud support that remains performant at scale.
Looking ahead, like many startups, City Storage Systems plans to introduce automation where possible, adopt new technologies to improve performance, and push GenAI further into their use cases. They have already built an advanced food prep prediction model that uses unique data to predict food prep time 42% more accurately than the baseline. To learn more about what the team is building, subscribe to their tech blog.
If you’d like to run CockroachDB on Azure, visit the Azure Marketplace or learn more about our partnership here.