Migrate to CockroachDB in Phases


A phased migration to CockroachDB uses the MOLT tools to convert your source schema, incrementally load source data and verify the results, and finally replicate ongoing changes before performing cutover.

Before you begin

  • Review the Migration Overview.
  • Install the MOLT (Migrate Off Legacy Technology) tools.
  • Review the MOLT Fetch setup and best practices.
  • When exporting data to cloud storage, ensure that access control is properly configured:

    • If you are using Amazon S3 for cloud storage:

      • Ensure that the following environment variables are set appropriately in the terminal running molt fetch:

        export AWS_REGION='us-east-1'
        export AWS_SECRET_ACCESS_KEY='key'
        export AWS_ACCESS_KEY_ID='id'
        
      • Alternatively, set the --use-implicit-auth flag to use implicit authentication.

      • Ensure the S3 bucket is created and accessible by authorized roles and users only.

    • If you are using Google Cloud Storage for cloud storage:

      • Ensure that your local environment is authenticated using Application Default Credentials:

        Using gcloud:

        gcloud init
        gcloud auth application-default login
        

        Using the environment variable:

        export GOOGLE_APPLICATION_CREDENTIALS={path_to_cred_json}
        
      • Alternatively, set the --use-implicit-auth flag to use implicit authentication.

      • Ensure the Google Cloud Storage bucket is created and accessible by authorized roles and users only.

Select the dialect of the source database you will migrate to CockroachDB. The steps below include instructions for both PostgreSQL and MySQL sources.

Step 1. Prepare the source database

Ensure that the PostgreSQL database is configured for replication. Enable logical replication by setting wal_level to logical in postgresql.conf or in the SQL shell; a server restart is required for the change to take effect. For example:

ALTER SYSTEM SET wal_level = 'logical';

Ensure that the MySQL database is configured for replication.

For MySQL 8.0 and later sources, enable global transaction identifiers (GTID) consistency. Set the following values in mysql.cnf, in the SQL shell, or as flags in the mysql start command:

  • --enforce-gtid-consistency=ON
  • --gtid-mode=ON
  • --binlog-row-metadata=full

For MySQL 5.7 sources, set the following values. Note that binlog-row-image is used instead of binlog-row-metadata. Set server-id to a unique integer that differs from any other MySQL server you have in your cluster (e.g., 3).

  • --enforce-gtid-consistency=ON
  • --gtid-mode=ON
  • --binlog-row-image=full
  • --server-id={ID}
  • --log-bin=log-bin
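
The flag lists above can also be expressed in the server configuration file. The following is a sketch for a MySQL 8.0 source (for 5.7, substitute binlog-row-image and add server-id and log-bin as noted above); confirm the option names against your MySQL version's documentation:

```ini
# my.cnf (sketch, MySQL 8.0 source)
[mysqld]
enforce-gtid-consistency = ON
gtid-mode                = ON
binlog-row-metadata      = FULL
```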

Step 2. Prepare the source schema

Note:

CockroachDB supports the PostgreSQL wire protocol and is largely compatible with PostgreSQL syntax. For syntax differences, refer to Features that differ from PostgreSQL.

  1. Convert your database schema to an equivalent CockroachDB schema.

    The simplest method is to use the MOLT Schema Conversion Tool to convert your schema line-by-line. The tool accepts .sql files and will convert the syntax, identify unimplemented features and syntax incompatibilities in the schema, and suggest edits according to CockroachDB best practices.

    The Schema Conversion Tool requires a free CockroachDB Cloud account. If this is not an option for you, contact your account team for additional help.

  2. Import the converted schema to a CockroachDB cluster.

When using the Schema Conversion Tool, syntax that cannot be converted automatically is displayed in the Summary Report. This may include the following:

String case sensitivity

Strings are case-insensitive in MySQL and case-sensitive in CockroachDB. You may need to edit your MySQL data to get the results you expect from CockroachDB. For example, you may have been doing string comparisons in MySQL that will need to be changed to work with CockroachDB.

For more information about the case sensitivity of strings in MySQL, refer to Case Sensitivity in String Searches from the MySQL documentation. For more information about CockroachDB strings, refer to STRING.

Identifier case sensitivity

Identifiers are case-sensitive in MySQL and case-insensitive in CockroachDB. When using the Schema Conversion Tool, you can either keep case sensitivity by enclosing identifiers in double quotes, or make identifiers case-insensitive by converting them to lowercase.

AUTO_INCREMENT attribute

The MySQL AUTO_INCREMENT attribute, which creates sequential column values, is not supported in CockroachDB. When using the Schema Conversion Tool, columns with AUTO_INCREMENT can be converted to use sequences, UUID values with gen_random_uuid(), or unique INT8 values using unique_rowid(). Cockroach Labs does not recommend using a sequence to define a primary key column. For more information, refer to Unique ID best practices.

Note:

Changing a column type during schema conversion will cause MOLT Verify to identify a type mismatch during data validation. This is expected behavior.

ENUM type

MySQL ENUM types are defined in table columns. On CockroachDB, ENUM is a standalone type. When using the Schema Conversion Tool, you can either deduplicate the ENUM definitions or create a separate type for each column.

TINYINT type

TINYINT data types are not supported in CockroachDB. The Schema Conversion Tool automatically converts TINYINT columns to INT2 (SMALLINT).

Geospatial types

MySQL geometry types are not converted to CockroachDB geospatial types by the Schema Conversion Tool. They should be manually converted to the corresponding types in CockroachDB.

FIELD function

The MySQL FIELD function is not supported in CockroachDB. Instead, you can use the array_position function, which returns the index of the first occurrence of an element in an array.

Example usage:

SELECT array_position(ARRAY[4,1,3,2],1);
  array_position
------------------
               2
(1 row)

While MySQL FIELD returns 0 when the element is not found, array_position returns NULL. If you sort on the result of array_position with an ORDER BY clause, rows whose element is not found therefore sort as NULL rather than 0. As a workaround, you can use COALESCE to substitute a value that sorts where you intend:

SELECT * FROM table_a ORDER BY COALESCE(array_position(ARRAY[4,1,3,2],5),999);
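
The semantic difference between the two functions, and the COALESCE workaround, can be sketched in Python (an analogy only, not CockroachDB code; the sentinel 999 mirrors the query above):

```python
def mysql_field(value, arr):
    """MySQL FIELD(): 1-based position, 0 when the value is absent."""
    try:
        return arr.index(value) + 1
    except ValueError:
        return 0

def crdb_array_position(arr, value):
    """CockroachDB array_position: 1-based position, NULL (None) when absent."""
    try:
        return arr.index(value) + 1
    except ValueError:
        return None

def sort_key(value, arr, sentinel=999):
    """COALESCE(array_position(...), sentinel): pins not-found rows past found ones."""
    pos = crdb_array_position(arr, value)
    return pos if pos is not None else sentinel
```

For example, mysql_field(1, [4, 1, 3, 2]) returns 2, while crdb_array_position([4, 1, 3, 2], 5) returns None rather than 0.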

Step 3. Load data into CockroachDB

Tip:

To optimize performance of data load, Cockroach Labs recommends dropping any constraints and indexes on the target CockroachDB database. You can recreate them after the data is loaded.

Perform an initial load of data into the target database. This can be a subset of the source data that you wish to verify, or it can be the entire dataset.

The following example migrates a single employees table. The table is exported to an Amazon S3 bucket and imported to CockroachDB using the IMPORT INTO statement, which is the default MOLT Fetch mode.

  • IMPORT INTO takes the target CockroachDB tables offline to maximize throughput. The tables come back online once the import job completes successfully. If you need to keep the target tables online, add the --use-copy flag to export data with COPY FROM instead. For more details, refer to Data movement.

  • If you cannot move data to a public cloud, specify --direct-copy instead of --bucket-path in the molt fetch command. This flag instructs MOLT Fetch to use COPY FROM to move the source data directly to CockroachDB without an intermediate store. For more information, refer to Direct copy.

  1. Issue the MOLT Fetch command to move the source data to CockroachDB, specifying --mode data-load to perform a one-time data load. For details on this mode, refer to the MOLT Fetch page.

    Note:

    Ensure that the --source and --target connection strings are URL-encoded.
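
Passwords and other connection-string components often contain characters that are unsafe in a URL and must be percent-encoded. A minimal Python sketch (the credentials shown are placeholders, not values from this guide):

```python
from urllib.parse import quote

# Hypothetical credentials; replace with your own.
user = "postgres"
password = "p@ss/w:rd"  # contains URL-reserved characters

# safe="" forces every reserved character to be percent-encoded.
encoded = quote(password, safe="")
source = f"postgres://{user}:{encoded}@localhost:5432/molt?sslmode=verify-full"
print(source)
```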


    Be sure to specify --pglogical-replication-slot-name, which is required for replication in Step 6.

    molt fetch \
    --source 'postgres://postgres:postgres@localhost:5432/molt?sslmode=verify-full' \
    --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
    --table-filter 'employees' \
    --bucket-path 's3://molt-test' \
    --table-handling truncate-if-exists \
    --non-interactive \
    --pglogical-replication-slot-name cdc_slot \
    --mode data-load
    
    molt fetch \
    --source 'mysql://user:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' \
    --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
    --table-filter 'employees' \
    --bucket-path 's3://molt-test' \
    --table-handling truncate-if-exists \
    --non-interactive \
    --mode data-load
    
  2. Check the output to observe fetch progress.

    A starting fetch message indicates that the task has started:

    {"level":"info","type":"summary","num_tables":1,"cdc_cursor":"0/43A1960","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}
    
    {"level":"info","type":"summary","num_tables":1,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-28","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}
    

    data extraction messages are written for each table that is exported to the location in --bucket-path:

    {"level":"info","table":"public.employees","time":"2025-02-10T14:28:11-05:00","message":"data extraction phase starting"}
    
    {"level":"info","table":"public.employees","type":"summary","num_rows":200000,"export_duration_ms":1000,"export_duration":"000h 00m 01s","time":"2025-02-10T14:28:12-05:00","message":"data extraction from source complete"}
    

    data import messages are written for each table that is loaded into CockroachDB:

    {"level":"info","table":"public.employees","time":"2025-02-10T14:28:12-05:00","message":"starting data import on target"}
    
    {"level":"info","table":"public.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"0/43A1960","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"}
    
    {"level":"info","table":"public.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"}
    

    A fetch complete message is written when the fetch task succeeds:

    {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":1,"tables":["public.employees"],"cdc_cursor":"0/3F41E40","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}
    
    {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":1,"tables":["public.employees"],"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}
    

Step 4. Verify the data load

  1. Use MOLT Verify to validate the consistency of the data between the source database and CockroachDB.

    molt verify \
    --source 'postgres://postgres:postgres@localhost:5432/molt?sslmode=verify-full' \
    --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
    --table-filter 'employees'
    
    molt verify \
    --source 'mysql://user:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' \
    --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
    --table-filter 'employees'
    
  2. Check the output to observe verify progress.

    A verification in progress message indicates that the task has started:

    {"level":"info","time":"2025-02-10T15:35:04-05:00","message":"verification in progress"}
    

    starting verify messages are written for each specified table:

    {"level":"info","time":"2025-02-10T15:35:04-05:00","message":"starting verify on public.employees, shard 1/1"}
    

    A finished row verification message containing a summary is written after each table is compared. For details on the summary fields, refer to the MOLT Verify page.

    {"level":"info","type":"summary","table_schema":"public","table_name":"employees","num_truth_rows":200004,"num_success":200004,"num_conditional_success":0,"num_missing":0,"num_mismatch":0,"num_extraneous":0,"num_live_retry":0,"num_column_mismatch":0,"time":"2025-02-10T15:35:05-05:00","message":"finished row verification on public.employees (shard 1/1)"}
    

    A verification complete message is written when the verify task succeeds:

    {"level":"info","net_duration_ms":699.804875,"net_duration":"000h 00m 00s","time":"2025-02-10T15:35:05-05:00","message":"verification complete"}
    

Repeat Step 3 and Step 4 to migrate any remaining tables.

Step 5. Modify the CockroachDB schema

Note:

If you need the best possible replication performance, you can perform this step right before cutover.

You can now add any constraints or indexes that you previously removed from the CockroachDB schema.

For the appropriate SQL syntax, refer to ALTER TABLE ... ADD CONSTRAINT and CREATE INDEX. Review the best practices for creating secondary indexes on CockroachDB.

Step 6. Replicate changes to CockroachDB

With initial load complete, start replication of ongoing changes on the source to CockroachDB.

The following example specifies that the employees table should be watched for change events.

  1. Issue the MOLT Fetch command to start replication on CockroachDB, specifying --mode replication-only to replicate ongoing changes on the source to CockroachDB. For details on this mode, refer to the MOLT Fetch page.

    Be sure to specify the same --pglogical-replication-slot-name value that you provided in Step 3.

    molt fetch \
    --source 'postgres://postgres:postgres@localhost:5432/molt?sslmode=verify-full' \
    --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
    --table-filter 'employees' \
    --non-interactive \
    --mode replication-only \
    --pglogical-replication-slot-name cdc_slot
    

    Use the --defaultGTIDSet replicator flag (passed via --replicator-flags) to specify the starting GTID set. To find your GTID record, run SELECT source_uuid, min(interval_start), max(interval_end) FROM mysql.gtid_executed GROUP BY source_uuid; on MySQL.

    molt fetch \
    --source 'mysql://user:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' \
    --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
    --table-filter 'employees' \
    --non-interactive \
    --mode replication-only \
    --replicator-flags '--defaultGTIDSet 4c658ae6-e8ad-11ef-8449-0242ac140006:1-29'
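
The GTID query above returns one row per source_uuid; the --defaultGTIDSet value is those rows joined in uuid:start-end form. A sketch (the UUID is a placeholder; confirm the interval convention against your source's gtid_executed contents):

```python
def gtid_set(rows):
    """Build a GTID set string from (source_uuid, min_start, max_end) query rows."""
    return ",".join(f"{uuid}:{start}-{end}" for uuid, start, end in rows)
```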
    
  2. Check the output to observe replicator progress.

    A starting replicator message indicates that the task has started:

    {"level":"info","time":"2025-02-10T14:28:13-05:00","message":"starting replicator"}
    

    The staging database name message contains the name of the staging schema:

    {"level":"info","time":"2025-02-10T14:28:13-05:00","message":"staging database name: _replicator_1739215693817700000"}
    

    The staging schema provides a replication marker for streaming changes. You will need the staging schema name if replication fails and must be resumed, or if you perform a failback to the source database.

    upserted rows log messages indicate that changes were replicated to CockroachDB:

    DEBUG  [Jan 22 13:52:40] upserted rows                                 conflicts=0 duration=7.620208ms proposed=1 target="\"molt\".\"public\".\"employees\"" upserted=1
    

Step 7. Stop replication and verify data

  1. Stop application traffic to your source database. This begins downtime.

  2. Wait for replication to drain, which means that all transactions that occurred on the source database have been fully processed and replicated to CockroachDB. When replication has fully drained, you will not see new upserted rows logs.

  3. Cancel replication to CockroachDB by entering ctrl-c in the terminal running molt fetch to send an interrupt signal. A clean shutdown returns exit code 0.

  4. Repeat Step 4 to verify the updated data.

Tip:

If you encountered issues with replication, you can now use failback mode to replicate changes on CockroachDB back to the initial source database. This keeps the initial source database consistent in case you need to roll back the migration.

Step 8. Cutover

Perform the cutover by resuming application traffic, now directed to CockroachDB.
