[ INDUSTRY ]
Fintech
[ CHALLENGE ]
Building a cloud-native, enterprise-grade payment technology using microservices and Kubernetes.
[ SOLUTION ]
A highly scalable, adaptable, resilient, multi-cloud platform built on CockroachDB.
PUBLIC CLOUD PROVIDERS
AVAILABILITY
PAYMENT SYSTEM INTEGRATIONS
< 5 MINUTES
Founded in the UK, Form3 caters to banks and fintechs that do business in the UK and EU (and, more recently, the US). Within these regions there are five different payment schemes that must be integrated. While they all serve the same function, transferring money, the implementation and requirements differ for each one. What’s more, these schemes are constantly changing.
For example, when connecting to the UK’s real-time Faster Payment System (FPS), companies are required to have leased lines, physical HSMs, and other hardware, which means they cannot operate exclusively in the cloud. This is especially challenging for digital-first mobile banks: to integrate with FPS, they would need to own equipment, connect to physical data centers, and continually manage the integration’s frequent changes. Form3’s platform handles all of this complexity for them, satisfying all related regulatory requirements.
Beyond reconciling different regional payment schemes, Form3 must also adhere to a variety of regulations and legislation. Several large financial institutions are well on their way to becoming cloud-first and regulators are nervous about these companies all choosing the same cloud vendor. In the UK, there is growing concern about what would happen if the country’s major banks were all running on the same public cloud provider. One public cloud provider’s unfortunate event — a major outage, sudden steep price hikes, etc. — could have dire consequences for the entire UK economy.
Form3 originally selected AWS as their cloud provider, and started off building on Amazon RDS for PostgreSQL, which initially worked well for their needs. They liked that PostgreSQL was reliable and could deliver strong data consistency for payments.
Given all the emerging regulations, they thought there was a better solution: Don’t depend on any single cloud provider. Instead, run across multiple clouds at the same time. This way the platform would be able to survive even a full cloud provider outage. Going multi-cloud, however, meant changing to a database that could run across all three major cloud providers, as well as hybrid/on-prem. The new database had to meet the following requirements:
Cloud-neutral, supports multi-cloud
SQL compatibility
Easy to run and maintain
Strongly consistent
Scalable
Easy to run with Kubernetes
Distributed databases like Aurora and Spanner met a lot of these requirements, but they are both tied to a particular platform vendor and are not cloud neutral. Other databases that provided the necessary scale and performance (like Cassandra and Redis) didn’t offer data consistency. Ultimately, satisfying their full set of requirements led Form3 to CockroachDB.
CockroachDB is cloud-agnostic, check. Comes with built-in horizontal scalability and low latency, check. Fulfills the consistent storage requirements, check. Finally, CockroachDB can natively run across Kubernetes clusters. Very few relational databases have the same distributed nature as Kubernetes, and running a non-distributed SQL database on Kubernetes can make deployment and management difficult.
It's the best of NoSQL in terms of horizontal scale, and the best of relational in terms of ACID compliance and write consistency so we don’t lose any payments. It’s the only offering we saw on the market that really solves that problem. And coupled with the fact that it’s super easy to run in Kubernetes, CockroachDB became an easy choice.
-Kevin Holditch, Head of Platform Engineering, Form3
Even though Form3 wanted to run completely in the cloud, they knew that they had to have a combination of their own data centers and public clouds in order to connect with certain payment schemes. After choosing to migrate their FPS access solution to CockroachDB, they began to think about a different deployment methodology for their new platform that would satisfy regulatory requirements for operational resilience. They needed a second cloud.
They considered replicating the entire platform onto GCP, which offers a set of technologies that matches AWS (for example, Pub/Sub instead of SQS). The problem with this approach is that they would be managing two versions of their platform with different behaviors. That would be fairly tough to do and would create a lot of maintenance overhead over time.
Instead, they decided to run the whole database across all three cloud providers: AWS, GCP, and Azure. They use each vendor’s managed Kubernetes control plane offering, so that the vendor runs Kubernetes. This leaves Form3’s engineers free to concentrate on the value-add for the business: running their software and configuring CockroachDB on top of it.
In their setup, the data has a replication factor of three. Each range is replicated across each cloud, and strong consistency is maintained on each range.
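CockroachDB replicates each range using Raft consensus, so a range stays writable as long as a majority of its replicas can be reached. The arithmetic behind the three-replica layout can be sketched as follows. This is a minimal illustration that assumes exactly one replica per cloud; the function names and cloud labels are hypothetical and not part of any CockroachDB API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas needed to commit a write."""
    return replication_factor // 2 + 1


def range_available(replica_clouds: list[str], failed_clouds: set[str]) -> bool:
    """True if the replicas outside the failed clouds still form a majority."""
    surviving = [c for c in replica_clouds if c not in failed_clouds]
    return len(surviving) >= quorum(len(replica_clouds))


# Assumed placement: one replica of each range per cloud (replication factor 3).
placement = ["aws", "gcp", "azure"]
print(range_available(placement, {"gcp"}))         # one cloud down
print(range_available(placement, {"gcp", "aws"}))  # two clouds down
```

With three replicas the quorum is two, so losing any single cloud leaves the range available, while losing two clouds does not.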
With this deployment model in place, they are able to survive a full cloud provider outage. If a cloud goes down, they are still able to process payments with the same SLAs. They have strict performance requirements for real-time payment processing, with SLAs of 100 ms and 200 ms at P99 latency.
The SLAs also require that they can process different types of requests in a very short period of time. They handle a high volume of concurrent requests (up to 700 TPS) coming into the system at the same time. The requests are transient and involve cross-cloud communication.
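To make the P99 figures concrete: a P99 latency SLA of 100 ms means that 99% of requests must complete within 100 ms. Below is a minimal sketch of that check using the nearest-rank percentile method. The sample latencies are invented for illustration, and the helper names are hypothetical; how Form3 actually measures its SLAs is not described in this case study.

```python
def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Ceiling division on the rank, clamped so the index is never below 1.
    rank = max(1, -(-int(pct * len(ordered)) // 100))
    return ordered[rank - 1]


def meets_sla(samples_ms: list[float], limit_ms: float) -> bool:
    """True if the P99 latency of the samples is within the limit."""
    return percentile(samples_ms, 99) <= limit_ms


# Hypothetical measurements: 98 fast requests and 2 slow outliers.
latencies = [40.0] * 98 + [150.0, 250.0]
print(percentile(latencies, 99))
print(meets_sla(latencies, 100.0), meets_sla(latencies, 200.0))
```

In this invented sample the P99 is 150 ms, so the 200 ms limit is met and the 100 ms limit is not, even though 98% of requests took only 40 ms. Tail latency, not the average, decides the outcome.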
Using CockroachDB gives them the ability to configure different topologies based on where customers are located. For example, if you have a customer in London, you set it up so CockroachDB fetches data from the closest node to their London location and you get great performance. If that region goes down, though, there’s still a replicated and up-to-date copy of that data in a different location and on a different cloud provider.
Form3’s multi-cloud deployment is advanced. They had to privately network all the clouds together and maintain network connectivity through multiple data centers. Networking across clusters is not simple, but CockroachDB’s operational simplicity can make it easier. By leveraging VPC peering and giving each Kubernetes pod its own IP address, Form3 can point CockroachDB nodes at one another so that pods in different clusters talk to each other directly.
When you are federating Kubernetes clusters, if a pod goes down it typically takes other pods down with it. However, CockroachDB allows you to federate pods at the data layer. This means the clusters are running independently and so don’t really “know” about each other. Because they are isolated in this way, they won’t take each other down. This helps mitigate a great deal of risk.
CockroachDB is doing some amazing gymnastics under the bonnet. You might think it's just a simple query, but it's actually going off to different nodes where the data is physically stored to retrieve it. The performance is really good. And then you have this tremendous scaling capability. Using CockroachDB almost feels a bit magic.
-Kevin Holditch, Head of Platform Engineering, Form3
It’s worth noting that because Form3 is running across multiple clouds, managing multiple cloud backups becomes extremely challenging. Form3 has very strict RPO requirements which require them to do incremental backups every 5 minutes. Initially they tried using CockroachDB’s locality-aware backup feature, but it didn’t work for their multi-cloud environment. The team collaborated with the Cockroach Labs team to find a solution specific to their needs, and this ultimately became a new feature called locality-restricted backups.
Locality-restricted backups allow Form3 to do staggered backups in two clouds. That way, if a cloud outage happens, a backup will still run from one cloud and they would have the option to restore from the remaining cloud. While they don’t expect backups to fail, they have observability and monitoring in place as a precaution.
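The effect of staggering can be sketched with simple schedule arithmetic. The case study does not say how Form3 offsets its backups, so the schedule below is an assumption: each of two clouds runs an incremental backup every 5 minutes, with the second cloud starting 150 seconds after the first. The cloud labels and helper names are hypothetical.

```python
def backup_times(interval_s: int, offset_s: int, horizon_s: int) -> list[int]:
    """Start times (seconds) of backups run every interval_s, shifted by offset_s."""
    return list(range(offset_s, horizon_s + 1, interval_s))


def worst_gap(schedules: dict[str, list[int]], down: set[str]) -> int:
    """Longest wait between consecutive backups from clouds that are still up."""
    times = sorted(t for cloud, ts in schedules.items() if cloud not in down for t in ts)
    return max(b - a for a, b in zip(times, times[1:]))


# Assumed schedule: both clouds back up every 5 minutes, the second 150 s later.
schedules = {
    "cloud_a": backup_times(300, 0, 3600),
    "cloud_b": backup_times(300, 150, 3600),
}
print(worst_gap(schedules, set()))        # both clouds healthy
print(worst_gap(schedules, {"cloud_a"}))  # one cloud down
```

Under these assumed offsets, the longest gap between backups is 150 seconds while both clouds are healthy and 300 seconds (the 5-minute target) when one cloud is unavailable.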
CockroachDB is not just a component of this platform, it’s the whole backbone of our payment processing engine. Luckily it is very reliable and very scalable so I can go to sleep knowing we won’t be paged overnight about an issue with a cluster.
-Rogger Fabri, Lead Engineer, Form3
When it came to migrating from PostgreSQL to CockroachDB, it was a “fairly easy process” for the Form3 team. Since CockroachDB is PostgreSQL wire-compatible, Form3’s data structures and queries just worked — only a few needed tweaking. Another factor that helped streamline the migration: Most of the Form3 platform is written in Go. A majority of existing PostgreSQL tools also work with CockroachDB, so the team could leverage the Go drivers they were already familiar with. Finally, CockroachDB also has a Kubernetes Helm chart available, packaged up nicely for an easy install.
The Form3 team recommends setting up a good test suite and running end-to-end tests that include the database. They even tried dropping the database connection to ensure that the software handled the failure correctly. Finally, before completing the migration, they tested latency to make sure that their payment journeys were rock solid and met all SLAs.
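A dropped-connection test of this kind can be sketched with a test double that fails a fixed number of times before succeeding. Everything below is hypothetical: `FlakyConnection` and `execute_with_retry` are illustrative names, not Form3’s test suite or a real database driver, and retrying a write like this is only safe when the operation is idempotent.

```python
import time


class FlakyConnection:
    """Test double that drops the connection a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def execute(self, statement: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("connection dropped")
        return f"ok: {statement}"


def execute_with_retry(conn, statement: str, attempts: int = 3,
                       base_delay_s: float = 0.01) -> str:
    """Retry on ConnectionError with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return conn.execute(statement)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


conn = FlakyConnection(failures=2)
print(execute_with_retry(conn, "INSERT payment"))
print(conn.calls)
```

Here the third attempt succeeds after two simulated drops. A connection that fails more often than the retry budget allows surfaces the `ConnectionError` to the caller, which is the behavior an end-to-end test would assert on.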
The Form3 team also highly recommends leveraging managed Kubernetes; otherwise, you might end up needing an army of experts to run and maintain the platform. There’s no competitive advantage to managing it yourself. If Kubernetes goes down, Form3 has enterprise support with the cloud vendor so they can get someone to help. And, since they are using CockroachDB, they are not locked into a single provider, which does give them a competitive edge.
Now that the migration is complete, Form3 continues to improve its platform through a practice called chaos engineering. They use a tool called Chaos Monkey to simulate failures, running scenarios that create cluster failures, cloud outages, and disaster recovery failures to see how the platform responds.
Through chaos engineering, we are pushing CockroachDB to its limits and it recovers by itself. That’s one of the best things an engineer can see.
-Rogger Fabri, Lead Engineer, Form3
Form3 is on a mission to become the world’s most trusted provider of payment technology. They remove the burden of managing costly, ever-evolving, critical payment infrastructure from their customers, which allows customers to seamlessly add payment schemes as they expand into new regions. With CockroachDB as a crucial part of their platform’s foundation, Form3 can navigate emerging legislation and continue to grow their business.
Today, financial institutions such as Lloyds Bank and Nationwide Building Society trust Form3’s API-based cloud technology solutions to deliver mission-critical payments. For now, the majority of Form3’s customers are still based in the UK and the EU. However, they’ve already expanded operations to the US and are looking to enter other international markets, a quickly achievable goal when you’re using a single logical database across all three cloud providers.