Production deployments are a world apart from development and testing environments. They come with their own best practices and recommendations, usually customized for each piece of your software stack. In this post, we’ll examine some of the more critical decisions to be made when deploying CockroachDB in production.
Security
CockroachDB provides two diametrically opposed security modes, governed by the --insecure flag. Let’s see what happens when we use it:
$ cockroach start --insecure
*
* WARNING: RUNNING IN INSECURE MODE!
*
* - Your cluster is open for any client that can access <all your IP addresses>.
* - Any user, even root, can log in without providing a password.
* - Any user, connecting as root, can read or write any data in your cluster.
* - There is no network encryption nor authentication, and thus no confidentiality.
*
* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/stable/secure-a-cluster
*
This warning is meant to scare you, and I hope it worked as intended. It also highlights the two differences between secure and insecure mode:
Encryption
Enabling secure mode moves all network communication onto TLS: traffic between CockroachDB nodes, as well as client-server communication, is encrypted. You may start off thinking you have no need for in-flight encryption because your datacenter is secure and you strictly control which clients can reach your CockroachDB nodes. But before long you find yourself adding firewall rules for ad-hoc services running outside your carefully controlled environment, or you decide to expand your CockroachDB cluster to another datacenter, communicating over untrusted network links. Without encryption, you are now sending plaintext data over networks you do not fully control. In this day and age we probably don’t need to elaborate on the need for encrypted communication.
Authentication
Authentication is the act of verifying the identity of the other party in a communication. In a secure CockroachDB deployment, we perform authentication in a few places:
CockroachDB nodes have the right client certificate
All nodes should have a client certificate for the special user node. This restricts access to the node-to-node protocol.
Node addresses are in the server certificate
The address (IP address or DNS name) used to reach a node, either directly or through a load balancer, should be listed in the server certificate presented by the node. This is needed to make sure we are indeed talking to a CockroachDB node, and not a man-in-the-middle.
Client is who it says it is
We need to verify that a client does have the right to act as the requested SQL user. This is done either through certificates (by checking the certificate's Common Name against the requested user), or through password authentication.
These checks allow CockroachDB to restrict access to authorized clients and make sure that nodes cannot be impersonated. Again, you may be in a firewalled and strictly controlled environment and think you do not need authentication: you trust your users to connect as the correct SQL user, and you are not concerned about node impersonation. However, you will soon find that your (trusted and honest) internal users start connecting as root on an ad-hoc basis. From there, it’s only a matter of time until an unfortunate typo drops one of your more critical tables. Without authentication, you are at the mercy of any attacker who gets into your controlled environment, and of carelessness on the part of your own users.
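To make the last two checks concrete from the client's side, here is a minimal Python sketch that builds a PostgreSQL-style connection URL asking the driver to verify the server and to present a client certificate. The sslmode, sslrootcert, sslcert, and sslkey parameters are standard libpq connection options. Everything else is an assumption made for illustration: the helper name secure_connection_url, the host db.example.com, the user maxroach, the database bank, the port, and the layout of the certs directory. Adjust them to match your own deployment.

```python
from urllib.parse import urlencode


def secure_connection_url(user, host, database, certs_dir="certs", port=26257):
    """Build a connection URL that requests full TLS verification.

    Hypothetical helper: the file names under `certs_dir` and the default
    port are assumptions, not something this post prescribes.
    """
    params = {
        # verify-full: validate the server certificate against the CA and
        # require that `host` is listed in that certificate.
        "sslmode": "verify-full",
        "sslrootcert": f"{certs_dir}/ca.crt",
        # Client certificate and key used to prove we may act as `user`.
        "sslcert": f"{certs_dir}/client.{user}.crt",
        "sslkey": f"{certs_dir}/client.{user}.key",
    }
    # safe="/" keeps the file paths readable instead of percent-encoding them.
    query = urlencode(params, safe="/")
    return f"postgresql://{user}@{host}:{port}/{database}?{query}"


url = secure_connection_url("maxroach", "db.example.com", "bank")
print(url)
```

With sslmode=verify-full, a driver refuses to connect unless the host name in the URL is listed in the certificate the node presents. That is the client-side half of the man-in-the-middle protection described above.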
Monitoring & Alerting
Another critical aspect of running anything in production (be it your database, monolithic server, or microservice) is monitoring. You should not be waiting for user complaints to know that your system is experiencing problems. CockroachDB comes with a built-in Admin UI showing you high-level metrics about various aspects of the system. While these are helpful in examining the current workload on your cluster, there is one thing the Admin UI cannot do: alerting. The reason is quite simple: the worst case is when most or all of the cluster is down, and because the Admin UI is built into CockroachDB, it could not possibly alert you in that situation.
To solve this problem, we provide metrics in a format understood by Prometheus. Besides recording all metrics, Prometheus allows you to write alerting rules whose alerts are sent to the AlertManager. In turn, AlertManager can notify you in any number of ways (email, Slack, PagerDuty, etc.) so that you hear about CockroachDB issues promptly.
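As a rough illustration of what such alerting rules evaluate, the Python sketch below parses a few lines of the Prometheus text format and reports which conditions would fire. It is not a replacement for Prometheus, which does this continuously and from outside the cluster. The sample text is hand-written rather than captured from a real node, the metric names and thresholds (including the expectation of three live nodes) are assumptions for illustration, and the parser is deliberately simplified: it ignores optional timestamps and treats the label set as an opaque string.

```python
# Hand-written sample in the Prometheus text format (not real node output).
SAMPLE_METRICS = """\
# TYPE ranges_unavailable gauge
ranges_unavailable{store="1"} 0
# TYPE ranges_underreplicated gauge
ranges_underreplicated{store="1"} 3
# TYPE liveness_livenodes gauge
liveness_livenodes 2
"""

# Hypothetical alert conditions: metric name -> predicate on its value.
RULES = {
    "ranges_unavailable": lambda value: value > 0,
    "ranges_underreplicated": lambda value: value > 0,
    "liveness_livenodes": lambda value: value < 3,  # assumes a 3-node cluster
}


def parse_metrics(text):
    """Return {(metric_name, label_string): value} for each sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)  # value is the last field
        name, _, labels = series.partition("{")
        samples[(name, labels.rstrip("}"))] = float(value)
    return samples


def firing_alerts(samples, rules=RULES):
    """Return the sorted names of metrics whose rule matches any sample."""
    return sorted(
        {name for (name, _labels), value in samples.items()
         if name in rules and rules[name](value)}
    )


alerts = firing_alerts(parse_metrics(SAMPLE_METRICS))
print(alerts)
```

In a real deployment the equivalent conditions would live in Prometheus rule files, so that they keep being evaluated even when every CockroachDB node is unreachable.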
We’ve hopefully convinced you of the need to run CockroachDB in secure mode and to set up monitoring. While a lot more goes into a good production deployment (provisioning, tooling, backups, etc.), security and monitoring are two of the early decisions you should not skip.
Illustration by Jared Oriel