Take and Restore Locality-aware Backups

Locality-aware backups allow you to partition and store backup data in a way that is optimized for locality. When you run a locality-aware backup, nodes write backup data to the cloud storage bucket that is closest to the node locality configured at node startup.

Warning:

While a locality-aware backup will always match the node locality and storage bucket locality, a range's locality will not necessarily match the node's locality. The backup job will attempt to back up ranges through nodes matching that range's locality; however, this is not always possible. As a result, Cockroach Labs cannot guarantee that all ranges will be backed up to a cloud storage bucket with the same locality. You should consider this as you plan a backup strategy that must comply with data domiciling requirements.

A locality-aware backup is specified by a list of URIs, each of which has a COCKROACH_LOCALITY URL parameter whose single value is either default or a single locality key-value pair such as region=us-east. At least one COCKROACH_LOCALITY must be the default. Restore jobs can read from a locality-aware backup when you provide the list of URIs that together contain the locations of all of the files for a single locality-aware backup.

A successful locality-aware backup job requires that each node in the cluster has access to each storage location. This is because any node in the cluster can claim the job and become the coordinator node.

Technical overview

For a technical overview of how a locality-aware backup works, refer to Job coordination and export of locality-aware backups.

Supported products

Locality-aware backups are available in CockroachDB Advanced, CockroachDB Standard, CockroachDB Basic, and CockroachDB self-hosted clusters when you are running self-managed backups. For a full list of features on CockroachDB Cloud, refer to Backup and Restore Overview.

Note:

Both CockroachDB Standard and CockroachDB Basic clusters operate with a different architecture compared to CockroachDB self-hosted. These architectural differences have implications for how locality-aware backups can run. Standard and Basic clusters scale resources depending on whether they are actively in use, which makes it less likely that a SQL pod is available in every locality. As a result, your Standard or Basic cluster may not have a SQL pod in the locality where the data resides, which can lead to the cluster uploading that data to a storage bucket in a locality where you do have active SQL pods. You should consider this as you plan a backup strategy that must comply with data domiciling requirements.

CockroachDB also supports locality-restricted backup execution, which allows you to specify a set of locality filters for a backup job to restrict the nodes that can participate in the backup process to that locality. This ensures that only nodes meeting certain requirements, such as being located in a specific region or having access to a certain storage bucket, execute the backup. Refer to Take Locality-restricted Backups for more detail.

Create a locality-aware backup

For example, to create a locality-aware backup where nodes with the locality region=us-west write backup files to s3://us-west-bucket, and all other nodes write to s3://us-east-bucket by default, run:

BACKUP INTO
      ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://us-west-bucket?COCKROACH_LOCALITY=region%3Dus-west');

When you run the BACKUP statement for a locality-aware backup, check the following:

  • The locality query string parameters must be URL-encoded.
  • If you are creating an external connection with BACKUP query parameters or authentication parameters, you must pass them in uppercase; otherwise, you will receive an unknown query parameters error.
  • A successful locality-aware backup job requires that each node in the cluster has access to each storage location. This is because any node in the cluster can claim the job and become the coordinator node.

You can restore the backup by running:

RESTORE FROM LATEST IN ('s3://us-east-bucket', 's3://us-west-bucket');

Note that the first URI in the list has to be the URI specified as the default URI when the backup was created. If you have moved your backups to a different location since the backup was originally taken, the first URI must be the new location of the files originally written to the default location.

To restore from a specific backup, use RESTORE FROM {subdirectory} IN ...
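As a rough sketch of that form, assuming SHOW BACKUPS on the default URI lists a subdirectory such as 2023/03/08-192859.44 (an illustrative value, not one from your cluster), restoring that specific backup might look like the following. The default URI is still listed first:

```sql
-- List the available subdirectories in the default location first.
SHOW BACKUPS IN 's3://us-east-bucket';

-- Then restore from one of the listed subdirectories.
-- '2023/03/08-192859.44' is an illustrative value; substitute your own.
RESTORE FROM '2023/03/08-192859.44' IN ('s3://us-east-bucket', 's3://us-west-bucket');
```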

For guidance on how to identify the locality of a node to pass in a backup query, see Show a node's locality.

Note:

For guidance on connecting to other storage options or using other authentication parameters, read Use Cloud Storage.

Show a node's locality

To determine the locality that a node was started with, run SHOW LOCALITY:

SHOW LOCALITY;
    locality
+---------------------+
region=us-east,az=az1
(1 row)

The output shows the locality to which the node will write backup data. One of the single locality key-value pairs can be passed to BACKUP with the COCKROACH_LOCALITY parameter (e.g., 's3://us-east-bucket?COCKROACH_LOCALITY=region%3Dus-east').

Note:

Specifying both locality tier pairs (e.g., region=us-east,az=az1) from the output will cause the backup query to fail with: tier must be in the form "key=value".
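To illustrate the single-tier requirement, the following sketch routes backup data by the az tier from the SHOW LOCALITY output instead of the region tier. The bucket name s3://az1-bucket is hypothetical and used only for illustration. Only one key-value pair is passed, URL-encoded as az%3Daz1:

```sql
-- Assumes a hypothetical bucket s3://az1-bucket for nodes started with az=az1.
-- All other nodes write to the default location.
BACKUP INTO
      ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://az1-bucket?COCKROACH_LOCALITY=az%3Daz1');
```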

Show locality-aware backups

Note:

SHOW BACKUP is able to display metadata using check_files for locality-aware backups taken with the incremental_location option.

To view a list of locality-aware backups, pass the endpoint collection URI that is set as the default location with COCKROACH_LOCALITY=default:

> SHOW BACKUPS IN 's3://{default collection URI}/{path}?AWS_ACCESS_KEY_ID={placeholder}&AWS_SECRET_ACCESS_KEY={placeholder}';
        path
-------------------------
/2023/02/23-150925.62
/2023/03/08-192859.44
(2 rows)

To view a locality-aware backup, pass locality-aware backup URIs to SHOW BACKUP:

> SHOW BACKUP FROM LATEST IN ('s3://{bucket name}/locality?AWS_ACCESS_KEY_ID={placeholder}&AWS_SECRET_ACCESS_KEY={placeholder}&COCKROACH_LOCALITY=default', 's3://{bucket name}/locality?AWS_ACCESS_KEY_ID={placeholder}&AWS_SECRET_ACCESS_KEY={placeholder}&COCKROACH_LOCALITY=region%3Dus-west');
  database_name | parent_schema_name |        object_name         | object_type | backup_type | start_time |          end_time          | size_bytes | rows | is_full_cluster
----------------+--------------------+----------------------------+-------------+-------------+------------+----------------------------+------------+------+------------------
  NULL          | NULL               | movr                       | database    | full        | NULL       | 2023-02-23 15:09:25.625777 |       NULL | NULL |        f
  movr          | NULL               | public                     | schema      | full        | NULL       | 2023-02-23 15:09:25.625777 |       NULL | NULL |        f
  movr          | public             | users                      | table       | full        | NULL       | 2023-02-23 15:09:25.625777 |       5633 |   58 |        f
  movr          | public             | vehicles                   | table       | full        | NULL       | 2023-02-23 15:09:25.625777 |       3617 |   17 |        f
  movr          | public             | rides                      | table       | full        | NULL       | 2023-02-23 15:09:25.625777 |     159269 |  511 |        f
  movr          | public             | vehicle_location_histories | table       | full        | NULL       | 2023-02-23 15:09:25.625777 |      79963 | 1092 |        f
  movr          | public             | promo_codes                | table       | full        | NULL       | 2023-02-23 15:09:25.625777 |     221763 | 1003 |        f
  movr          | public             | user_promo_codes           | table       | full        | NULL       | 2023-02-23 15:09:25.625777 |        927 |   11 |        f
(8 rows)

Restore from a locality-aware backup

Given a list of URIs that together contain the locations of all of the files for a single locality-aware backup, RESTORE can read in that backup. Note that the list of URIs passed to RESTORE may be different from the URIs originally passed to BACKUP. This is because it's possible to move the contents of one of the parts of a locality-aware backup (i.e., the files written to that destination) to a different location, or even to consolidate all the files for a locality-aware backup into a single location.

When restoring a full backup, the cluster data is restored first, then the system table data "as is." This means that the restored zone configurations can point to regions that do not have active nodes in the new cluster. For example, if your full backup has the following zone configurations:

ALTER PARTITION europe_west OF INDEX movr.public.rides@rides_pkey
        CONFIGURE ZONE USING constraints = '[+region=europe-west1]';

ALTER PARTITION us_east OF INDEX movr.public.rides@rides_pkey
        CONFIGURE ZONE USING constraints = '[+region=us-east1]';

ALTER PARTITION us_west OF INDEX movr.public.rides@rides_pkey
        CONFIGURE ZONE USING constraints = '[+region=us-west1]';

And the restored cluster does not have nodes with the locality region=us-west1, the restored cluster will still have a zone configuration for us-west1. Because that region does not exist in the restored cluster, the data will not be moved to us-west1; it will be distributed as if the zone configuration did not exist. For the data to be distributed as intended, add one or more nodes with the missing region, or remove the zone configuration.
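If you choose to remove the zone configuration, one possible approach, sketched here using the us_west partition from the example above, is to inspect the restored configuration and then discard it:

```sql
-- Inspect the zone configuration that was restored for the us_west partition.
SHOW ZONE CONFIGURATION FROM PARTITION us_west OF INDEX movr.public.rides@rides_pkey;

-- Remove it if the restored cluster has no nodes with region=us-west1.
ALTER PARTITION us_west OF INDEX movr.public.rides@rides_pkey CONFIGURE ZONE DISCARD;
```

After the discard, the partition inherits the zone configuration of its parent object.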

For example, use the following to create a locality-aware backup:

BACKUP INTO
      ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://us-west-bucket?COCKROACH_LOCALITY=region%3Dus-west');

Restore a locality-aware backup with:

RESTORE FROM LATEST IN ('s3://us-east-bucket/', 's3://us-west-bucket/');

To restore from a specific backup, use RESTORE FROM {subdirectory} IN ....

Note:

RESTORE is not truly locality-aware; while restoring from backups, a node may read from a store that does not match its locality. This can happen when either the BACKUP or the RESTORE was not of a full cluster. Note that during a locality-aware restore, some data may be temporarily located on another node before it is eventually relocated to the appropriate node. To avoid this, you can manually restore zone configurations from a locality-aware backup.

Create an incremental locality-aware backup

If you back up to a destination already containing a full backup, an incremental backup will be appended to the full backup in a subdirectory. When you're taking an incremental backup, you must ensure that the incremental backup localities match the full backup localities; otherwise, you will receive an error. Alternatively, take another full backup with the matching localities before running the incremental backup.

There is different syntax for taking an incremental backup depending on where you need to store the backups:

  • To append your incremental backup to the full backup in the incrementals directory:

    BACKUP INTO LATEST IN
        ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://us-west-bucket?COCKROACH_LOCALITY=region%3Dus-west');
    
    Note:

    When restoring from an incremental locality-aware backup, you need to include every locality ever used, even if it was only used once. At least one COCKROACH_LOCALITY must be the default.

  • To explicitly control the subdirectory for your incremental backup:

    BACKUP INTO {subdirectory} IN
            ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://us-west-bucket?COCKROACH_LOCALITY=region%3Dus-west');
    

    To view the available subdirectories, use SHOW BACKUPS.

  • To send your incremental backups to a different location with the incremental_location option, you must include the same number of locality-aware URIs for the full backup destination and for the incremental_location option:

    BACKUP INTO LATEST IN
        ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://us-west-bucket?COCKROACH_LOCALITY=region%3Dus-west')
        WITH incremental_location = ('s3://us-east-bucket-2?COCKROACH_LOCALITY=default', 's3://us-west-bucket-2?COCKROACH_LOCALITY=region%3Dus-west');
    

    For more detail on using the incremental_location option, see Incremental backups with explicitly specified destinations.
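A restore of a backup taken this way would presumably need to point at both sets of locations. The following is only a sketch. It assumes that RESTORE accepts incremental_location with the same list form as BACKUP, and that the URIs can be given without the COCKROACH_LOCALITY parameter, as in the other RESTORE examples on this page. Check the linked page to confirm the exact form for your version:

```sql
-- Sketch only: assumes the full backup and its incrementals were written to
-- the bucket names used in the incremental_location BACKUP example.
-- The default location is listed first in each list.
RESTORE FROM LATEST IN
    ('s3://us-east-bucket', 's3://us-west-bucket')
    WITH incremental_location = ('s3://us-east-bucket-2', 's3://us-west-bucket-2');
```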

Restore from an incremental locality-aware backup

A locality-aware backup URI can also be used in place of any incremental backup URI in RESTORE.

For example, an incremental locality-aware backup created with

BACKUP INTO LATEST IN
      ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://us-west-bucket?COCKROACH_LOCALITY=region%3Dus-west');

can be restored by running:

RESTORE FROM LATEST IN ('s3://us-east-bucket/', 's3://us-west-bucket/');

To restore from a specific backup, use RESTORE FROM {subdirectory} IN ....

Note:

When restoring from an incremental locality-aware backup, you need to include every locality ever used, even if it was only used once.

Manually restore zone configurations from a locality-aware backup

During a locality-aware restore, some data may be temporarily located on another node before it is eventually relocated to the appropriate node. To avoid this, you need to manually restore zone configurations first:

Once the locality-aware restore has started, pause the restore:

PAUSE JOB 27536791415282;
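The job ID in the statement above is specific to the example cluster. One way to look up the ID of your own restore job, sketched here with a filter on SHOW JOBS, is:

```sql
-- List RESTORE jobs, most recent first, to find the job_id to pause.
WITH x AS (SHOW JOBS)
SELECT job_id, status, created
FROM x
WHERE job_type = 'RESTORE'
ORDER BY created DESC;
```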

The system.zones table stores your cluster's zone configurations, which will prevent the data from rebalancing. To restore them, you must restore the system.zones table into a new database because you cannot drop the existing system.zones table:

RESTORE TABLE system.zones FROM '2021/03/23-213101.37' IN
    'azure-blob://acme-co-backup?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
    WITH into_db = 'newdb';

After it's restored into a new database, you can write the restored zones table data to the cluster's existing system.zones table:

INSERT INTO system.zones SELECT * FROM newdb.zones;

Then drop the temporary table you created:

DROP TABLE newdb.zones;

Then, resume the restore:

RESUME JOB 27536791415282;
