A request came across my desk the other day asking whether I had any experience with Django and whether I could get it to work with CockroachDB’s multi-region capabilities. I had never heard of Django before, but I’m never one to turn down a challenge, so I accepted.
First of all, let’s explain what Django is and what it is for.
What is Django and what are Django use cases?
Django is a web framework that encourages rapid development and clean, pragmatic design of Python applications. It does this by providing a proven pattern for designing scalable web applications, with many features available straight out of the box. This saves developers from spending unnecessary time ‘reinventing the wheel’ so they can spend more time coding the application itself. When you start a Django project, you are given the basic layout of a web application that you can build upon. However, this isn’t a Django tutorial, so we will skip forward to the topic at hand: how can we use CockroachDB’s multi-region capabilities with Django?
As you scale your usage of multi-region clusters, you may need to keep certain subsets of data in specific localities. Keeping specific data on servers in specific geographic locations is also known as data domiciling. CockroachDB has basic support for data domiciling in multi-region clusters using the ALTER DATABASE ... PLACEMENT RESTRICTED statement.
To follow along with the blog you will need three Kubernetes clusters. Ideally, these would be located in separate regions but that’s not essential. You will also need a basic understanding of Docker and Kubernetes.
Being a Django and Python newbie, I opted to use the example Django application available here in the CockroachDB docs. This is a simple application for inserting Customers, Products, and Orders into a CockroachDB database.
The up-to-date multi-region code for this blog can be found here.
To demonstrate the multi-region capabilities of CockroachDB, I will update the Python function that captures a Customer in the Django example application so that it records which cloud provider the customer should be domiciled in. For example, if customer ‘Mike’ was posted from AWS, then Mike’s customer record should remain on the nodes in that locality.
How to make a multi-region Python application
Several updates are required for the application to accept an additional field and record the cloud in the database. The first is to add the new field to the models.py file.
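import uuid

from django.db import models

class Customers(models.Model):
    id = models.UUIDField(
        primary_key=True,
        default=uuid.uuid4,
        editable=False)
    name = models.CharField(max_length=250)
    # Cloud the customer was posted from; used later to domicile the row.
    cloud = models.CharField(max_length=250, null=True)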
Update views.py to accept the new field.
def post(self, request, *args, **kwargs):
    form_data = json.loads(request.body.decode())
    name, cloud = form_data['name'], form_data['cloud']
    c = Customers(name=name, cloud=cloud)
    c.save()
    return HttpResponse(status=200)
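The view above trusts the request body: a missing cloud key raises a KeyError, and a misspelled cloud value would later leave the row without a matching region. Below is a minimal sketch of a parsing helper, assuming the three cloud names used in this blog. The names parse_customer_payload and ALLOWED_CLOUDS are hypothetical and are not part of the example application.

```python
import json

# Assumption: the only clouds in play are the three used in this blog.
ALLOWED_CLOUDS = {"aws", "azure", "gcp"}


def parse_customer_payload(body: bytes) -> tuple[str, str]:
    """Return (name, cloud) from a JSON request body, or raise ValueError."""
    try:
        form_data = json.loads(body.decode())
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(form_data, dict):
        raise ValueError("body must be a JSON object")

    name = form_data.get("name")
    cloud = form_data.get("cloud")
    if not isinstance(name, str) or not name:
        raise ValueError("'name' is required")
    # The isinstance check also guards against unhashable values such as lists.
    if not isinstance(cloud, str) or cloud not in ALLOWED_CLOUDS:
        raise ValueError(f"'cloud' must be one of {sorted(ALLOWED_CLOUDS)}")
    return name, cloud
```

Inside post, one option would be to call this helper in a try block and return HttpResponse(status=400) when it raises ValueError, so bad input never reaches the database.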
Change settings.py to hold your database configuration.
DATABASES = {
    'default': {
        'ENGINE': 'django_cockroachdb',
        'NAME': 'django',
        'USER': 'user',
        'PASSWORD': 'password',
        'HOST': 'cockroachdb-public',
        'PORT': '26257',
        # If connecting with SSL, include the section below, replacing the
        # file paths as appropriate.
        'OPTIONS': {
            'sslmode': 'verify-full',
            'sslrootcert': '/certs/ca.crt',
            # Either sslcert and sslkey (below) or PASSWORD (above) is
            # required.
            # 'sslcert': '/certs/client.root.crt',
            # 'sslkey': '/certs/client.root.key',
        },
    },
}
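Because the same image is deployed into three clusters, it can be convenient to read these values from the environment rather than hard-coding them. The following is a sketch only: the environment variable names (DB_NAME, DB_USER, and so on) and the function build_database_settings are assumptions for illustration, while the defaults mirror the configuration above.

```python
import os


def build_database_settings(env=None):
    """Build a Django DATABASES dict for CockroachDB from environment variables.

    The variable names are hypothetical; pick whatever your deployment sets.
    """
    env = os.environ if env is None else env
    return {
        "default": {
            "ENGINE": "django_cockroachdb",
            "NAME": env.get("DB_NAME", "django"),
            "USER": env.get("DB_USER", "user"),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "cockroachdb-public"),
            "PORT": env.get("DB_PORT", "26257"),
            "OPTIONS": {
                "sslmode": env.get("DB_SSLMODE", "verify-full"),
                "sslrootcert": env.get("DB_SSLROOTCERT", "/certs/ca.crt"),
            },
        },
    }
```

In settings.py this would be used as DATABASES = build_database_settings(), with the variables supplied by the Kubernetes manifest or a secret.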
And finally, add the additional field to the migration in the 0001_initial.py file.
operations = [
    migrations.CreateModel(
        name='Customers',
        fields=[
            ('id', models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False)),
            ('name', models.CharField(max_length=250)),
            ('cloud', models.CharField(max_length=250, null=True)),
        ],
    ),
    # ... remaining operations unchanged
]
Now that we have an application that is ready to deploy, we need to prepare our CockroachDB cluster. The first thing we need to do is create a database for the application to consume. Because my CockroachDB cluster is deployed on Kubernetes, I will deploy a secure client pod with the correct certificates, connect to the cluster, and create a database called django.
CREATE DATABASE django;
Now that we have a database, we can deploy our application in each of our regions. When we do this, Django will create all the required database tables. Again, because I am using Kubernetes, I will just deploy the manifest from the Git repository above. Make sure that you set the correct context and deploy to the correct namespace in each cluster.
kubectl apply -f ./kubernetes/deployment.yaml
Once the application is deployed and the load balancer service has been created, we can retrieve the external IP (or hostname, in the case of AWS) to post our data to. Here I have set an environment variable for each of my contexts and each of my namespaces.
az_app_ip=$(kubectl get svc django-service --context $clus1 --namespace $az_region -o json | jq -r '.status.loadBalancer.ingress[0].ip')
aws_app_ip=$(kubectl get svc django-service --context $clus2 --namespace $aws_region -o json | jq -r '.status.loadBalancer.ingress[0].hostname')
gcp_app_ip=$(kubectl get svc django-service --context $clus3 --namespace $gcp_region -o json | jq -r '.status.loadBalancer.ingress[0].ip')
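The commands above differ only in whether they read the ip or the hostname field, because AWS load balancers report a hostname. As a rough sketch, the same lookup can be written once in Python so that either field is accepted. The function name external_address is hypothetical; its input is the JSON text printed by kubectl get svc ... -o json.

```python
import json


def external_address(service_json: str) -> str:
    """Return the first load balancer ingress address (IP or hostname)."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        # The cloud provider has not finished provisioning the load balancer.
        raise LookupError("load balancer has no external address yet")
    first = ingress[0]
    address = first.get("ip") or first.get("hostname")
    if not address:
        raise LookupError("ingress entry has neither ip nor hostname")
    return address
```

Raising LookupError while the ingress list is empty makes it easy to retry until the address has been assigned.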
Use the simple API of the application to add three entries to the database. You will notice that the second field is ‘cloud’, with a different value in each request to indicate the cloud the application was deployed into.
curl --header "Content-Type: application/json" \
--request POST \
--data '{"name":"Carl", "cloud":"azure"}' http://$az_app_ip:8000/customer/
curl --header "Content-Type: application/json" \
--request POST \
--data '{"name":"Mike", "cloud":"aws"}' http://$aws_app_ip:8000/customer/
curl --header "Content-Type: application/json" \
--request POST \
--data '{"name":"Dan", "cloud":"gcp"}' http://$gcp_app_ip:8000/customer/
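For readers who would rather script these requests, here is a sketch of the same POST using only the Python standard library. The function name build_customer_request is hypothetical; the port and path are taken from the curl commands above.

```python
import json
import urllib.request


def build_customer_request(host: str, name: str, cloud: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl commands issue."""
    payload = json.dumps({"name": name, "cloud": cloud}).encode()
    return urllib.request.Request(
        url=f"http://{host}:8000/customer/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_customer_request(aws_host, "Mike", "aws")), where aws_host holds the address retrieved earlier.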
Now that we have some data in our django database inside CockroachDB, we can turn our attention to the multi-region capabilities.
Enabling a multi-region configuration takes a few simple steps. The first is to set the primary region for the database, and the second is to add the additional regions. In my case, the primary was uksouth in Azure, with eu-west-1 in AWS and europe-west4 in GCP added afterwards.
ALTER DATABASE django PRIMARY REGION "uksouth";
ALTER DATABASE django ADD REGION "eu-west-1";
ALTER DATABASE django ADD REGION "europe-west4";
For the cockroach_example_customers table, we want to locate the data based on the value in the cloud column. This means that the right table locality for optimizing access to the data is REGIONAL BY ROW. The statements below add a stored computed column named region, which uses a CASE expression to map each cloud to its region, and then set the table locality to use that column.
ALTER TABLE cockroach_example_customers ADD COLUMN region crdb_internal_region AS (
    CASE WHEN cloud = 'aws' THEN 'eu-west-1'
         WHEN cloud = 'azure' THEN 'uksouth'
         WHEN cloud = 'gcp' THEN 'europe-west4'
    END
) STORED;
ALTER TABLE cockroach_example_customers ALTER COLUMN region SET NOT NULL;
ALTER TABLE cockroach_example_customers SET LOCALITY REGIONAL BY ROW AS "region";
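One thing to watch: a CASE expression with no ELSE branch yields NULL for any value it does not list, and the cloud column itself is nullable, so the SET NOT NULL step only succeeds if every existing row has one of the three expected cloud values. The sketch below mirrors the mapping in Python to make that behavior explicit. CLOUD_TO_REGION and region_for_cloud are hypothetical names, with the values copied from the statements above.

```python
from typing import Optional

# Mapping copied from the CASE expression above.
CLOUD_TO_REGION = {
    "aws": "eu-west-1",
    "azure": "uksouth",
    "gcp": "europe-west4",
}


def region_for_cloud(cloud: Optional[str]) -> Optional[str]:
    """Mirror the SQL CASE: unknown or missing clouds map to None (SQL NULL)."""
    if cloud is None:
        return None
    return CLOUD_TO_REGION.get(cloud)


def rows_without_region(clouds):
    """Return the cloud values that would violate the NOT NULL constraint."""
    return [cloud for cloud in clouds if region_for_cloud(cloud) is None]
```

Running rows_without_region over the cloud values already in the table is a quick way to spot rows that need fixing before the column is made NOT NULL.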
Next, run a replication report to see which ranges are not yet in compliance with your desired domiciling.
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
Then run the query suggested in the Replication Reports documentation, which shows the database and table names that contain the violating ranges.
WITH
    partition_violations
        AS (
            SELECT
                *
            FROM
                system.replication_constraint_stats
            WHERE
                violating_ranges > 0
        ),
    report
        AS (
            SELECT
                crdb_internal.zones.zone_id,
                crdb_internal.zones.subzone_id,
                target,
                database_name,
                table_name,
                index_name,
                partition_violations.type,
                partition_violations.config,
                partition_violations.violation_start,
                partition_violations.violating_ranges
            FROM
                crdb_internal.zones, partition_violations
            WHERE
                crdb_internal.zones.zone_id
                = partition_violations.zone_id
        )
SELECT * FROM report;
You should see that the cockroach_example_customers table contains violating ranges. Now we can enable the placement restrictions to relocate these ranges onto the nodes in the correct locality.
ALTER DATABASE django PLACEMENT RESTRICTED;
Now that you have restricted the placement of non-voting replicas for all regional tables, you can run another replication report to see the effects. Be patient, as this can take a couple of minutes to take effect; the more ranges it needs to move, the longer it will take.
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
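Rather than re-running the report by hand, the wait can be scripted. The sketch below is a generic polling loop and is not tied to any particular database driver: count_violations is a hypothetical callable that you would implement to run the report query above and return the number of rows, and the timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_compliance(
    count_violations: Callable[[], int],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until count_violations() returns 0 or the timeout elapses.

    Returns True once no violations remain, False if the timeout is reached.
    """
    waited = 0.0
    while True:
        if count_violations() == 0:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

The sleep parameter exists only so the loop can be exercised without real delays; in normal use the default time.sleep is fine.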
The benefits of multi-region application architecture
Being relatively new to Python and Django, I found it straightforward to edit an existing application to demonstrate the multi-region capabilities of CockroachDB. This showed me how easy it is to develop Python applications with the help of the Django framework.
Data domiciling (in layman’s terms, pinning data to specific localities) with CockroachDB is helpful for improving the performance of reads and writes. Pinning ranges to specific locations reduces the round-trip time for consensus decisions, which reduces write latencies. An additional benefit of this capability is that, by controlling the locality of your data, you can conform to data sovereignty or ownership legislation. So, if you are looking to create multi-region Python applications backed by a relational database, Django and CockroachDB are a good combination.
Don’t forget, all the code I used in this blog is available here.