CockroachDB is pretty easy to deploy. We’ve done our best to avoid the need for configuration files, mandatory environment variables, and copious command line flags, and it shows; we’ve already had testimonials from folks who were able to deploy a 20-node cluster in just half a day. That’s something to be proud of!
However, there is still one wrinkle in the fabric, and that’s our use of network ports. As of this writing, CockroachDB requires two ports, but why, and can we do better?
CockroachDB started out as a distributed key-value store (read: NoSQL). In those days, all internode communication used Go’s standard net/rpc package. This worked quite nicely thanks to net/rpc.(*Server) implementing http.Handler; that implementation meant that we could multiplex our admin UI with our RPC traffic on a single port:
package main

import (
    "log"
    "net/http"
    "net/rpc"
)

type Args struct {
    A, B int
}

type Arith struct{}

func (*Arith) Multiply(args *Args, reply *int) error {
    *reply = args.A * args.B
    return nil
}

// adminUIHandler stands in for the real admin UI.
var adminUIHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("admin UI\n"))
})

func main() {
    s := rpc.NewServer()
    if err := s.Register(&Arith{}); err != nil {
        log.Fatal(err)
    }

    // One mux, one port: "/" serves the admin UI, while the RPC server is
    // mounted at rpc.DefaultRPCPath thanks to its http.Handler implementation.
    mux := http.NewServeMux()
    mux.Handle("/", adminUIHandler)
    mux.Handle(rpc.DefaultRPCPath, s)
    log.Fatal(http.ListenAndServe(":0", mux)) // ":0" picks an arbitrary free port
}
Great, we’re done…right? Almost. As we’ve previously discussed on this blog, CockroachDB is now a relational SQL database, and because we’re not masochists, we decided to implement PostgreSQL’s wire protocol (PGWire) rather than write and maintain client libraries for every language. This presents a problem: we need another port. But why? net/rpc can share, so why can’t we? It turns out the answer is “by definition”; we can’t piggyback on net/http because PGWire isn’t HTTP. The net/rpc protocol is specially designed to support this use case – its handshake occurs over HTTP, after which the connection is hijacked (via http.Hijacker) and only then handed to the net/rpc.(*Server), which proceeds over plain TCP. Since PGWire wasn’t designed with these considerations in mind, it is implemented as plain TCP end-to-end. Bummer.
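To make that handshake concrete, here is a minimal client for the toy server above. This is a sketch, not CockroachDB code; the address is an assumption on our part, since the server above listens on ":0" (an arbitrary free port), so substitute whatever address it actually bound to.

package main

import (
    "fmt"
    "log"
    "net/rpc"
)

// Args mirrors the server's argument type; net/rpc encodes it with gob, which
// matches fields by name.
type Args struct {
    A, B int
}

func main() {
    // DialHTTP performs the handshake described above: it sends an HTTP CONNECT
    // request to rpc.DefaultRPCPath, and once the server hijacks the connection,
    // the plain-TCP RPC protocol takes over.
    client, err := rpc.DialHTTP("tcp", "localhost:8080") // assumed address
    if err != nil {
        log.Fatal(err)
    }
    var product int
    if err := client.Call("Arith.Multiply", &Args{A: 6, B: 7}, &product); err != nil {
        log.Fatal(err)
    }
    fmt.Println(product) // 42
}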
Astute readers will note that HTTP and PGWire are different protocols! In other words, it should be possible to read a few bytes from an incoming connection, figure out which protocol is being used, and delegate to the appropriate handler. This is exactly what we did, using a small library called cmux. cmux provides a custom net.Listener implementation which supports lookahead on its connections, along with the notion of “matchers” – boolean functions used to produce “child” listeners which yield only those connections for which the matcher returns true.
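Here is a toy illustration of that model – not CockroachDB’s actual code; the matcher and handlers below are made up for the example. A matcher is just a func(io.Reader) bool that peeks at the first bytes of a connection, and each Match call returns a child net.Listener that only ever yields matching connections.

package main

import (
    "io"
    "log"
    "net"
    "net/http"

    "github.com/soheilhy/cmux"
)

// looksLikeHTTP1 is a toy matcher: it reports whether the connection starts
// with an uppercase ASCII letter, as HTTP/1 request lines do ("GET ", ...).
// cmux buffers whatever a matcher reads and replays it for the real handler.
func looksLikeHTTP1(r io.Reader) bool {
    var b [1]byte
    if _, err := io.ReadFull(r, b[:]); err != nil {
        return false
    }
    return b[0] >= 'A' && b[0] <= 'Z'
}

func main() {
    l, err := net.Listen("tcp", "localhost:8080")
    if err != nil {
        log.Fatal(err)
    }

    // Matchers are tried in registration order; each Match call returns a
    // child listener yielding only the connections its matcher accepted.
    m := cmux.New(l)
    httpL := m.Match(looksLikeHTTP1) // HTTP/1 traffic goes here...
    otherL := m.Match(cmux.Any())    // ...everything else goes here.

    go http.Serve(httpL, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        io.WriteString(w, "hello over HTTP\n")
    }))
    go echo(otherL) // a stand-in for "some other protocol"

    log.Fatal(m.Serve()) // accepts connections and feeds them to the matchers
}

// echo handles the non-HTTP connections by echoing whatever it receives.
func echo(l net.Listener) {
    for {
        c, err := l.Accept()
        if err != nil {
            return
        }
        go func(c net.Conn) {
            defer c.Close()
            io.Copy(c, c)
        }(c)
    }
}

With this running, curl http://localhost:8080/ lands on the HTTP branch, while something like nc localhost:8080 falls through to the echo branch.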
CockroachDB supports TLS in both HTTP/RPC and PGWire, but the two protocols handle TLS differently. For HTTP, the TLS handshake is the first thing on the connection (and is handled by tls.NewListener), while PGWire has a brief cleartext negotiation phase before encryption starts. Therefore, we use the following arrangement of listeners:
non-TLS case:

net.Listen -> cmux.New
              |
              - -> pgwire.Match -> pgwire.Server.ServeConn
              - -> cmux.HTTP2 -> http2.(*Server).ServeConn
              - -> cmux.Any -> http.(*Server).Serve

TLS case:

net.Listen -> cmux.New
              |
              - -> pgwire.Match -> pgwire.Server.ServeConn
              - -> cmux.Any -> tls.NewListener -> http.(*Server).Serve
Phew! That was a lot. Time to pat ourselves on the back and have a drink: our database is running on a single port.
Have you heard of gRPC? It’s pretty nice. It’s built on HTTP2, so it supports such niceties as streaming RPCs, multiplexed streams over a single connection, flow control, and more. It’s also a great fit for our Raft implementation, because it allows us to avoid blocking normal Raft messages behind (slow) snapshots by using different streams for the two types of traffic.
Just a few weeks after the cmux work landed, we switched our RPC system to gRPC. This worked pretty well, since grpc.(*Server) also implements net/http.Handler.
Note: While grpc-go supports insecure gRPC (that is, h2c), Go’s net/http does not. This means that using grpc.(*Server).ServeHTTP requires some trickery – here’s our solution, and its later refinement using cmux.
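For a flavor of what such trickery looks like, here is a sketch that routes cleartext HTTP2 to grpc.(*Server).ServeHTTP using the golang.org/x/net/http2/h2c helper. This is just one possible approach, not the solution linked above.

package main

import (
    "log"
    "net/http"
    "strings"

    "golang.org/x/net/http2"
    "golang.org/x/net/http2/h2c"
    "google.golang.org/grpc"
)

func main() {
    grpcServer := grpc.NewServer()
    // Register gRPC services on grpcServer here.

    ui := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("admin UI\n"))
    })

    // Route by protocol and content-type: gRPC requests go to the gRPC
    // server's ServeHTTP, everything else to the admin UI handler.
    root := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
            grpcServer.ServeHTTP(w, r)
            return
        }
        ui.ServeHTTP(w, r)
    })

    // h2c.NewHandler teaches a plain net/http server to speak cleartext HTTP2,
    // which is what lets grpc.(*Server).ServeHTTP be reached without TLS.
    log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(root, &http2.Server{})))
}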
OK, let’s get back to the story. Unfortunately for us, grpc.(*Server).ServeHTTP turned out to have serious performance problems, so we had to get creative. Fortunately, gRPC identifies itself through a content-type header (it’s HTTP2, remember?), which means we can use cmux to sniff out the header and then dispatch to the much faster grpc.(*Server).Serve. So we did that:
non-TLS case:

net.Listen -> cmux.New
              |
              - -> pgwire.Match -> pgwire.Server.ServeConn
              - -> cmux.HTTP2HeaderField("content-type", "application/grpc") -> grpc.(*Server).Serve
              - -> cmux.HTTP2 -> http2.(*Server).ServeConn
              - -> cmux.Any -> http.(*Server).Serve

TLS case:

net.Listen -> cmux.New
              |
              - -> pgwire.Match -> pgwire.Server.ServeConn
              - -> cmux.Any -> tls.NewListener -> cmux.New
                               |
                               - -> cmux.HTTP2HeaderField("content-type", "application/grpc") -> grpc.(*Server).Serve
                               - -> cmux.Any -> http.(*Server).Serve
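In code, the TLS arrangement comes out roughly as follows. This is only a sketch: pgwireMatch, servePGWire, and the concrete servers are hypothetical stand-ins passed in as parameters, not CockroachDB’s real implementations.

package sketch

import (
    "crypto/tls"
    "net"
    "net/http"

    "github.com/soheilhy/cmux"
    "google.golang.org/grpc"
)

// ServeTLS wires up the TLS diagram above: PGWire is matched on the raw
// connection (it negotiates encryption itself after a cleartext prelude),
// while everything else is unwrapped by tls.NewListener and then split again
// into gRPC and plain HTTP.
func ServeTLS(
    l net.Listener,
    tlsCfg *tls.Config,
    pgwireMatch cmux.Matcher, // stand-in for the real PGWire matcher
    servePGWire func(net.Listener) error, // stand-in for the real PGWire server
    grpcServer *grpc.Server,
    httpServer *http.Server,
) error {
    m := cmux.New(l)
    pgL := m.Match(pgwireMatch)
    tlsL := tls.NewListener(m.Match(cmux.Any()), tlsCfg)

    // The inner cmux operates on the decrypted connections.
    tlsM := cmux.New(tlsL)
    grpcL := tlsM.Match(cmux.HTTP2HeaderField("content-type", "application/grpc"))
    httpL := tlsM.Match(cmux.Any())

    go servePGWire(pgL)
    go grpcServer.Serve(grpcL)
    go httpServer.Serve(httpL)
    go tlsM.Serve()
    return m.Serve()
}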
A little more complicated, but it worked, and allowed us to keep our straight-line performance from regressing too badly.
That is, until we discovered that this broke our admin UI when using TLS. The TL;DR is that gRPC behaves differently from most HTTP2 clients (including Chrome): where those clients wait for an acknowledgement from the server before sending their headers, gRPC does not, which is why the solution above worked in the first place. However, when serving Chrome over TLS (browsers don’t use HTTP2 unless TLS is in use as well), our cmux matcher hangs waiting for headers that Chrome won’t send until the server responds (which the matcher cannot do: cmux matchers may only “sniff”, never write), and Chrome surfaces a cryptic error.
This is where we are today. We’ve had to separate our HTTP port from the gRPC+PGWire port. The diagram for the gRPC+PGWire port is now:
non-TLS case:

net.Listen -> cmux.New
              |
              - -> pgwire.Match -> pgwire.Server.ServeConn
              - -> cmux.Any -> grpc.(*Server).Serve

TLS case:

net.Listen -> cmux.New
              |
              - -> pgwire.Match -> pgwire.Server.ServeConn
              - -> cmux.Any -> grpc.(*Server).Serve
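As a sketch (again with hypothetical stand-ins for the PGWire matcher and server), that port’s wiring is now pleasantly small; in the TLS case, the gRPC server is assumed to carry its own credentials (e.g. via grpc.Creds), which is why the two diagrams look identical:

package sketch

import (
    "net"

    "github.com/soheilhy/cmux"
    "google.golang.org/grpc"
)

// ServeGRPCAndPGWire splits a single listener between PGWire and gRPC,
// matching the diagrams above.
func ServeGRPCAndPGWire(
    l net.Listener,
    pgwireMatch cmux.Matcher, // stand-in for the real PGWire matcher
    servePGWire func(net.Listener) error, // stand-in for the real PGWire server
    grpcServer *grpc.Server,
) error {
    m := cmux.New(l)
    pgL := m.Match(pgwireMatch)  // PGWire connections
    grpcL := m.Match(cmux.Any()) // everything else is gRPC

    go servePGWire(pgL)
    go grpcServer.Serve(grpcL)
    return m.Serve()
}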
The HTTP admin UI, meanwhile, has its own dedicated port. This is an interim solution; we hope that the gRPC maintainers are able to fix the performance problems in grpc.(*Server), which will allow us to return to a single port.
Until then, we’ll have one port to rule them all – and a second port for the Admin UI.