MOLT Fetch

On this page

MOLT Fetch moves data from a source database into CockroachDB as part of a database migration.

MOLT Fetch uses IMPORT INTO or COPY FROM to move the source data to cloud storage (Google Cloud Storage or Amazon S3), a local file server, or local memory. Once the data is exported, MOLT Fetch can load the data into a target CockroachDB database and replicate changes from the source database. For details, see Usage.

Supported databases

The following source databases are currently supported:

PostgreSQL 11-14
MySQL 5.7, 8.0 and later
CockroachDB

Installation

To install MOLT Fetch, download the binary that matches your system. To download the latest binary:

Operating System	AMD 64-bit	ARM 64-bit
Windows	Download	Download
Linux	Download	Download
Mac	Download	Download

The following binaries are included:

molt
replicator. To use replication features, replicator must be located either in the same directory as molt or in a directory beneath molt.

For previous binaries, refer to the MOLT version manifest. For release details, refer to the MOLT changelog.

Note:

MOLT Fetch is supported on Red Hat Enterprise Linux (RHEL) 9 and above.

Docker image

Docker multi-platform images containing both the AMD and ARM binaries are available. To pull the latest image:

docker pull cockroachdb/molt

To pull a specific version (e.g., 1.1.3):

docker pull cockroachdb/molt:1.1.3

For details on running in Docker, see Docker usage.

Setup

Complete the following items before using MOLT Fetch:

Follow the recommendations in Best practices and Security recommendations.
Ensure that the source and target schemas are identical, unless you enable automatic schema creation with the drop-on-target-and-recreate option. If you are creating the target schema manually, review the behaviors in Mismatch handling.
Ensure that the SQL user running MOLT Fetch has SELECT privileges on the source and target CockroachDB databases, along with the required privileges to run IMPORT INTO or COPY FROM (depending on the command used for data movement) on CockroachDB, as described on their respective pages.
If you plan to use continuous replication, using either the MOLT Fetch replication feature or an external change data capture (CDC) tool:
- For PostgreSQL sources, enable logical replication. In postgresql.conf or in the SQL shell, set wal_level to logical.
- For MySQL 8.0 and later sources, enable GTID consistency. Set the following values in mysql.cnf, in the SQL shell, or as flags in the mysql start command:
  - --binlog-row-metadata=full
  - --enforce-gtid-consistency=ON
  - --gtid-mode=ON
- For MySQL 5.7 sources, set the following values. Note that binlog-row-image is used instead of binlog-row-metadata. Set server-id to a unique integer that differs from any other MySQL server you have in your cluster (e.g., 3).
  - --binlog-row-image=full
  - --enforce-gtid-consistency=ON
  - --gtid-mode=ON
  - --server-id={ID}
  - --log-bin=log-bin
URL-encode the connection strings for the source database and CockroachDB. This ensures that the MOLT tools can parse special characters in your password.
- Given a password a$52&, pass it to the molt escape-password command with single quotes:
```
molt escape-password 'a$52&'
```
  Substitute the following encoded password in your original connection url string:
```
a%2452%26
```
- Use the encoded password in your connection string. For example:
```
postgres://postgres:a%2452%26@localhost:5432/replicationload
```
If you plan to use cloud storage for the data migration, follow the steps in Secure cloud storage.

Best practices

To prevent connections from terminating prematurely during data export, set the following to high values on the source database:
- Maximum allowed number of connections. MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards (--export-concurrency) multiplied by the number of tables (--table-concurrency) being exported concurrently.
- Maximum lifetime of a connection.
If a PostgreSQL database is set as a source, ensure that idle_in_transaction_session_timeout on PostgreSQL is either disabled or set to a value longer than the duration of data export. Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of molt_fetch_table_export_duration_ms for all exported tables.
To prevent memory outages during READ COMMITTED data export of tables with large rows, estimate the amount of memory used to export a table:
```
--row-batch-size * --export-concurrency * average size of the table rows
```
If you are exporting more than one table at a time (i.e., --table-concurrency is set higher than 1), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run molt fetch, and adjust --row-batch-size accordingly.
If a table in the source database is much larger than the other tables, filter and export the largest table in its own molt fetch task. Repeat this for each of the largest tables. Then export the remaining tables in another task.
When using IMPORT INTO to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline. To make this table accessible again, manually resume or cancel the job. Then resume molt fetch using continuation, or restart the task from the beginning.
Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the molt_fetch_rows_exported metric for decreases in the number of rows being processed. You can use the sample Grafana dashboard to view metrics.
Before moving data, Cockroach Labs recommends dropping any constraints and indexes on the target CockroachDB database. Doing so will optimize performance. You can recreate constraints and indexes after the data is loaded.

Security recommendations

Cockroach Labs strongly recommends the following:

Secure connections

Use secure connections to the source and target CockroachDB database whenever possible.
When performing failback, use a secure changefeed connection by overriding the default configuration.
By default, insecure connections (i.e., sslmode=disable on PostgreSQL; sslmode not set on MySQL) are disallowed. When using an insecure connection, molt fetch returns an error. To override this check, you can enable the --allow-tls-mode-disable flag. Do this only when testing, or if a secure SSL/TLS connection to the source or target database is not possible.

Connection strings

Avoid plaintext connection strings.
Provide your connection strings as environment variables.

If possible within your security infrastructure, use an external secrets manager to load the environment variables from stored secrets.

For example, to export connection strings as environment variables:

export SOURCE="postgres://postgres:postgres@localhost:5432/molt?sslmode=verify-full"
export TARGET="postgres://root@localhost:26257/molt?sslmode=verify-full"

Afterward, to pass the environment variables in molt fetch commands:

molt fetch \
--source $SOURCE \
--target $TARGET \
--table-filter 'employees' \
--bucket-path 's3://molt-test' \
--table-handling truncate-if-exists

Secure cloud storage

When exporting data to cloud storage, ensure that access control is properly configured:
- If you are using Amazon S3 for cloud storage:
  - Ensure that the following environment variables are set appropriately in the terminal running molt fetch:
```
export AWS_REGION='us-east-1'
export AWS_SECRET_ACCESS_KEY='key'
export AWS_ACCESS_KEY_ID='id'
```
  - Alternatively, set the --use-implicit-auth flag to use implicit authentication.
  - Ensure the S3 bucket is created and accessible by authorized roles and users only.
- If you are using Google Cloud Storage for cloud storage:
  - Ensure that your local environment is authenticated using Application Default Credentials:
    
    Using gcloud:
```
gcloud init
gcloud auth application-default login
```
    Using the environment variable:
```
export GOOGLE_APPLICATION_CREDENTIALS={path_to_cred_json}
```
  - Alternatively, set the --use-implicit-auth flag to use implicit authentication.
  - Ensure the Google Cloud Storage bucket is created and accessible by authorized roles and users only.

Perform a dry run

To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before moving any data in production. Use a test or development environment that is as similar as possible to production.

Commands

Command	Usage
`fetch`	Start the fetch task. This loads data from a source database to a target CockroachDB database.

Subcommands

Command	Usage
`tokens list`	List active continuation tokens.

Flags

Global flags

Flag	Description
`--source`	(Required) Connection string for the source database. For details, see Source and target databases.
`--target`	(Required) Connection string for the target database. For details, see Source and target databases.
`--allow-tls-mode-disable`	Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled only if secure SSL/TLS connections to the source or target database are not possible.
`--assume-role`	Service account to use for assume role authentication. `--use-implicit-auth` must be included. For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. For details, refer to Cloud Storage Authentication.
`--bucket-path`	The path within the cloud storage bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`.
`--changefeeds-path`	Path to a JSON file that contains changefeed override settings for failback, when enabled with `--mode failback`. If not specified, an insecure default configuration is used, and `--allow-tls-mode-disable` must be included. For details, see Fail back to source database.
`--cleanup`	Whether to delete intermediate files after moving data using cloud or local storage. Note: Cleanup does not occur on continuation.
`--compression`	Compression method for data when using `IMPORT INTO` (`gzip`/`none`). Default: `gzip`
`--continuation-file-name`	Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see Fetch continuation.
`--continuation-token`	Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see Fetch continuation.
`--crdb-pts-duration`	The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`. Default: `24h0m0s`
`--crdb-pts-refresh-interval`	The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete. Default: `10m0s`
`--direct-copy`	Enables direct copy, which copies data directly from source to target without using an intermediate store.
`--export-concurrency`	Number of shards to export at a time, each on a dedicated thread. This only applies when exporting data from the source database, not when loading data into the target database. Only tables with primary key types of `INT`, `FLOAT`, or `UUID` can be sharded. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Default: `4`
`--fetch-id`	Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables.
`--flush-rows`	Number of rows before the source data is flushed to intermediate files. Note: If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first.
`--flush-size`	Size (in bytes) before the source data is flushed to intermediate files. Note: If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first.
`--import-batch-size`	The number of files to be imported at a time to the target database. This applies only when using `IMPORT INTO` for data movement. Note: Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must retry the entire batch. Default: `1000`
`--import-region`	The region of the cloud storage bucket. This applies only to Amazon S3 buckets. Set this flag only if you need to specify an `AWS_REGION` explicitly when using `IMPORT INTO` for data movement. For example, `--import-region=ap-south-1`.
`--local-path`	The path within the local file server where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified.
`--local-path-crdb-access-addr`	Address of a local file server that is publicly accessible. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB Cloud deployment). `--local-path` and `--local-path-listen-addr` must be specified. Default: Value of `--local-path-listen-addr`.
`--local-path-listen-addr`	Write intermediate files to a local file server at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified.
`--log-file`	Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`.
`--logging`	Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`). Default: `info`
`--metrics-listen-addr`	Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, see Metrics. Default: `'127.0.0.1:3030'`
`--mode`	Configure the MOLT Fetch behavior: `data-load`, `data-load-and-replication`, `replication-only`, `export-only`, or `import-only`. For details, refer to Fetch mode. Default: `data-load`
`--non-interactive`	Run the fetch task without interactive prompts. This is recommended only when running `molt fetch` in an automated process (i.e., a job or continuous integration).
`--pglogical-publication-name`	If set, the name of the publication that will be created or used for replication. Used in `replication-only` mode. Default: `molt_fetch`
`--pglogical-publication-and-slot-drop-and-recreate`	If set, drops the publication and slots if they exist and then recreates them. Used in `replication-only` mode.
`--pglogical-replication-slot-name`	The name of a replication slot to create before taking a snapshot of data (e.g., `'fetch'`). Required in order to perform continuous replication from a source PostgreSQL database.
`--pglogical-replication-slot-plugin`	The output plugin used for logical replication under `--pglogical-replication-slot-name`. Default: `pgoutput`
`--pprof-listen-addr`	Address of the pprof endpoint. Default: `'127.0.0.1:3031'`
`--replicator-flags`	If continuous replication is enabled with `--mode data-load-and-replication`, `--mode replication-only`, or `--mode failback`, specify replication flags to override. For example: `--replicator-flags "--tlsCertificate ./certs/server.crt --tlsPrivateKey ./certs/server.key"`
`--row-batch-size`	Number of rows per shard to export at a time. See Best practices. Default: `100000`
`--schema-filter`	Move schemas that match a specified regular expression. Default: `'.*'`
`--table-concurrency`	Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Default: `4`
`--table-exclusion-filter`	Exclude tables that match a specified POSIX regular expression. This value cannot be set to `'.'`, which would cause every table to be excluded. Default:* Empty string
`--table-filter`	Move tables that match a specified POSIX regular expression. Default: `'.*'`
`--table-handling`	How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see Target table handling. Default: `none`
`--transformations-file`	Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to Transformations.
`--type-map-file`	Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see type mapping.
`--use-console-writer`	Use the console writer, which has cleaner log output but introduces more latency. Default: `false` (log as structured JSON)
`--use-copy`	Use `COPY FROM` to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to Data movement.
`--use-implicit-auth`	Use implicit authentication for cloud storage URIs.

`tokens list` flags

Flag	Description
`--conn-string`	(Required) Connection string for the target database. For details, see List active continuation tokens.
`-n`, `--num-results`	Number of results to return.

Replication flags

The following flags are set with --replicator-flags and can be used in any Fetch mode that involves replication.

Flag	Type	Description
`--applyTimeout`	`DURATION`	The maximum amount of time to wait for an update to be applied. Default: `30s`
`--dlqTableName`	`IDENT`	The name of a table in the target schema for storing dead-letter entries. Default: `replicator_dlq`
`--flushPeriod`	`DURATION`	Flush queued mutations after this duration. Default: `1s`
`--flushSize`	`INT`	Ideal batch size to determine when to flush mutations. Default: `1000`
`--gracePeriod`	`DURATION`	Allow background processes to exit. Default: `30s`
`--logDestination`	`STRING`	Write logs to a file. If not specified, write logs to `stdout`.
`--logFormat`	`STRING`	Choose log output format: `"fluent"`, `"text"`. Default: `"text"`
`--metricsAddr`	`STRING`	A `host:port` on which to serve metrics and diagnostics.
`--parallelism`	`INT`	The number of concurrent database transactions to use. Default: `16`
`--quiescentPeriod`	`DURATION`	How often to retry deferred mutations. Default: `10s`
`--retireOffset`	`DURATION`	How long to delay removal of applied mutations. Default: `24h0m0s`
`--scanSize`	`INT`	The number of rows to retrieve from the staging database used to store metadata for replication modes. Default: `10000`
`--schemaRefresh`	`DURATION`	How often a watcher will refresh its schema. If this value is zero or negative, refresh behavior will be disabled. Default: `1m0s`
`--sourceConn`	`STRING`	The source database's connection string.
`--stageDisableCreateTableReaderIndex`	`BOOL`	Disable the creation of partial covering indexes to improve read performance on staging tables. Set to `true` if creating indexes on existing tables would cause a significant operational impact. Default: `false`
`--stageMarkAppliedLimit`	`INT`	Limit the number of mutations to be marked applied in a single statement. Default: `100000`
`--stageSanityCheckPeriod`	`DURATION`	How often to validate staging table apply order (`-1` to disable). Default: `10m0s`
`--stageSanityCheckWindow`	`DURATION`	How far back to look when validating staging table apply order. Default: `1h0m0s`
`--stageUnappliedPeriod`	`DURATION`	How often to report the number of unapplied mutations in staging tables (`-1` to disable). Default: `1m0s`
`--stagingConn`	`STRING`	The staging database's connection string.
`--stagingCreateSchema`		Automatically create the staging schema if it does not exist.
`--stagingIdleTime`	`DURATION`	Maximum lifetime of an idle connection. Default: `1m0s`
`--stagingJitterTime`	`DURATION`	The time over which to jitter database pool disconnections. Default: `15s`
`--stagingMaxLifetime`	`DURATION`	The maximum lifetime of a database connection. Default: `5m0s`
`--stagingMaxPoolSize`	`INT`	The maximum number of staging database connections. Default: `128`
`--stagingSchema`	`STRING`	Name of the SQL database schema that stores replication metadata. Required each time `--mode replication-only` is rerun after being interrupted, as the schema provides a replication marker for streaming changes. For details, refer to Replicate changes. Default: `_replicator.public`
`--targetConn`	`STRING`	The target database's connection string.
`--targetIdleTime`	`DURATION`	Maximum lifetime of an idle connection. Default: `1m0s`
`--targetJitterTime`	`DURATION`	The time over which to jitter database pool disconnections. Default: `15s`
`--targetMaxLifetime`	`DURATION`	The maximum lifetime of a database connection. Default: `5m0s`
`--targetMaxPoolSize`	`INT`	The maximum number of target database connections. Default: `128`
`--targetSchema`	`STRING`	The SQL database schema in the target cluster to update.
`--targetStatementCacheSize`	`INT`	The maximum number of prepared statements to retain. Default: `128`
`--taskGracePeriod`	`DURATION`	How long to allow for task cleanup when recovering from errors. Default: `1m0s`
`--timestampLimit`	`INT`	The maximum number of source timestamps to coalesce into a target transaction. Default: `1000`
`--userscript`	`STRING`	The path to a configuration script, see `userscript` subcommand.
`-v`, `--verbose`	`COUNT`	Increase logging verbosity to `debug`; repeat for `trace`.

PostgreSQL replication flags

The following flags are set with --replicator-flags and can be used in any Fetch mode that involves replication from a PostgreSQL source database.

Flag	Type	Description
`--publicationName`	`STRING`	The publication within the source database to replicate.
`--slotName`	`STRING`	The replication slot in the source database. Default: `"replicator"`
`--standbyTimeout`	`DURATION`	How often to report WAL progress to the source server. Default: `5s`

MySQL replication flags

The following flags are set with --replicator-flags and can be used in any Fetch mode that involves replication from a MySQL source database.

Flag	Type	Description
`--defaultGTIDSet`	`STRING`	Default GTID set, in the format `source_uuid:min(interval_start)-max(interval_end)`. Required the first time `--mode replication-only` is run, as the GTID set provides a replication marker for streaming changes. For details, refer to Replicate changes.
`--fetchMetadata`		Fetch column metadata explicitly, for older versions of MySQL that don't support `binlog_row_metadata`.
`--replicationProcessID`	`UINT32`	The replication process ID to report to the source database. Default: `10`

Failback replication flags

The following flags are set with --replicator-flags and can be used in failback mode.

Flag	Type	Description
`--assumeIdempotent`		Disable the extra staging table queries that debounce non-idempotent redelivery in changefeeds.
`--bestEffortOnly`		Eventually-consistent mode; useful for high-throughput, skew-tolerant schemas with foreign keys.
`--bestEffortWindow`	`DURATION`	Use an eventually-consistent mode for initial backfill or when replication is behind; `0` to disable. Default: `1h0m0s`
`--bindAddr`	`STRING`	The network address to bind to. Default: `":26258"`
`--disableAuthentication`		Disable authentication of incoming Replicator requests; not recommended for production.
`--disableCheckpointStream`		Disable cross-Replicator checkpoint notifications and rely only on polling.
`--discard`		Dangerous: Discard all incoming HTTP requests; useful for changefeed throughput testing. Not intended for production.
`--discardDelay`	`DURATION`	Adds additional delay in discard mode; useful for gauging the impact of changefeed round-trip time (RTT).
`--healthCheckTimeout`	`DURATION`	The timeout for the health check endpoint. Default: `5s`
`--httpResponseTimeout`	`DURATION`	The maximum amount of time to allow an HTTP handler to execute. Default: `2m0s`
`--immediate`		Bypass staging tables and write directly to target; recommended only for KV-style workloads with no foreign keys.
`--limitLookahead`	`INT`	Limit number of checkpoints to be considered when computing the resolving range; may cause replication to stall completely if older mutations cannot be applied.
`--ndjsonBufferSize`	`INT`	The maximum amount of data to buffer while reading a single line of `ndjson` input; increase when source cluster has large blob values. Default: `65536`
`--tlsCertificate`	`STRING`	A path to a PEM-encoded TLS certificate chain.
`--tlsPrivateKey`	`STRING`	A path to a PEM-encoded TLS private key.
`--tlsSelfSigned`		If true, generate a self-signed TLS certificate valid for `localhost`.

Usage

The following sections describe how to use the molt fetch flags.

Source and target databases

Tip:

Follow the recommendations in Connection strings.

--source specifies the connection string of the source database.

PostgreSQL or CockroachDB:

--source 'postgresql://{username}:{password}@{host}:{port}/{database}'

MySQL:

--source 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}'

--target specifies the CockroachDB connection string:

--target 'postgresql://{username}:{password}@{host}:{port}/{database}

Fetch mode

--mode specifies the MOLT Fetch behavior:

Load data into CockroachDB
Load data and replicate changes to CockroachDB
Replicate changes to CockroachDB
Export the data to storage
Fail back to source database

Load data

data-load (default) instructs MOLT Fetch to load the source data into CockroachDB. It does not replicate any subsequent changes on the source.

--mode data-load

If the source is a PostgreSQL database and you intend to replicate changes afterward, also specify a replication slot name with --pglogical-replication-slot-name. MOLT Fetch will create a replication slot with this name. For example, the following snippet instructs MOLT Fetch to create a slot named replication_slot to use for replication:

--mode data-load
--pglogical-replication-slot-name 'replication_slot'

Tip:

In case you need to rename your publication, also include --pglogical-publication-name to specify the new publication name and --pglogical-publication-and-slot-drop-and-recreate to ensure that the publication and replication slot are created in the correct order. For details on these flags, refer to Global flags.

Load data and replicate changes

Note:

Before using this option, the source PostgreSQL or MySQL database must be configured for continuous replication, as described in Setup. MySQL 5.7 and later are supported.

data-load-and-replication instructs MOLT Fetch to load the source data into CockroachDB, and replicate any subsequent changes on the source.

--mode data-load-and-replication

If the source is a PostgreSQL database, you must also specify a replication slot name with --pglogical-replication-slot-name. MOLT Fetch will create a replication slot with this name. For example, the following snippet instructs MOLT Fetch to create a slot named replication_slot to use for replication:

--mode data-load-and-replication
--pglogical-replication-slot-name 'replication_slot'

Tip:

Continuous replication begins once the initial load is complete, as indicated by a fetch complete message in the output. If replication is interrupted, you can resume replication.

To cancel replication, enter ctrl-c to issue a SIGTERM signal. This returns an exit code 0. If replication fails, a non-zero exit code is returned.

To customize the replication behavior (an advanced use case), use --replicator-flags to specify one or more replication-specific flags.

--mode data-load-and-replication
--replicator-flags "--applyTimeout '1h' --parallelism 64"

Replicate changes

Note:

Before using this option:

The source PostgreSQL or MySQL database must be configured for continuous replication, as described in Setup. MySQL 5.7 and later are supported.
The replicator binary must be located either in the same directory as molt or in a directory beneath molt.

replication-only instructs MOLT Fetch to replicate ongoing changes on the source to CockroachDB, using the specified replication marker. This assumes you have already run --mode data-load to load the source data into CockroachDB.

For a PostgreSQL source, you should have already created a replication slot when loading data. Specify the same replication slot name using --pglogical-replication-slot-name. For example:
```
--mode replication-only 
--pglogical-replication-slot-name 'replication_slot'
```
Tip:

In case you want to run replication-only without already having loaded data (e.g., for testing), also include --pglogical-publication-and-slot-drop-and-recreate to ensure that the publication and replication slot are created in the correct order. For details on this flag, refer to Global flags.
For a MySQL source, first get your GTID record:
```
SELECT source_uuid, min(interval_start), max(interval_end)
  FROM mysql.gtid_executed
  GROUP BY source_uuid;
```
In the molt fetch command, specify a GTID set using the --defaultGTIDSet replication flag and the format source_uuid:min(interval_start)-max(interval_end). For example:
```
--mode replication-only 
--replicator-flags "--defaultGTIDSet b7f9e0fa-2753-1e1f-5d9b-2402ac810003:3-21"
```

If replication is interrupted, you can resume replication.

Resume replication

replication-only can be used to resume replication if it is interrupted in either data-load-and-replication or replication-only mode.

Specify the staging schema with the --stagingSchema replication flag. MOLT Fetch outputs the schema name as staging database name: {schema_name} after the initial replication run.

--mode replication-only
--replicator-flags "--stagingSchema {schema_name}"

You must include the --stagingSchema replication flag when resuming replication, as the schema provides a replication marker for streaming changes.

To cancel replication, enter ctrl-c to issue a SIGTERM signal. This returns an exit code 0. If replication fails, a non-zero exit code is returned.

Export data to storage

export-only instructs MOLT Fetch to export the source data to the specified cloud storage or local file server. It does not load the data into CockroachDB.

--mode export-only

Import data from storage

import-only instructs MOLT Fetch to load the source data in the specified cloud storage or local file server into the CockroachDB target.

--mode import-only

Fail back to source database

Warning:

Before using failback mode, refer to the technical advisory about a bug that affects changefeeds on CockroachDB v22.2, v23.1.0 to v23.1.21, v23.2.0 to v23.2.5, and testing versions of v24.1 through v24.1.0-rc.1.

If you encounter issues after moving data to CockroachDB, you can use failback mode to replicate changes on CockroachDB back to the initial source database. In case you need to roll back the migration, this ensures that data is consistent on the initial source database.

failback mode creates a CockroachDB changefeed and sets up a webhook sink to pass change events from CockroachDB to the failback target. In production, you should secure the connection by specifying changefeed override settings in a JSON file. These settings override the [default insecure changefeed]#default-insecure-changefeed] values, which are suited for testing only. Include the --changefeeds-path flag to indicate the path to the JSON file.

--mode failback
--changefeeds-path 'changefeed-settings.json'

When running molt fetch --mode failback, --source is the CockroachDB connection string and --target is the connection string of the database you migrated from. --table-filter specifies the tables to watch for change events. For example:

--source 'postgresql://{username}:{password}@{host}:{port}/{database}'
--target 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}'
--table-filter 'employees|payments'

Note:

MySQL 5.7 and later are supported as MySQL failback targets.

Changefeed override settings

You can specify the following CREATE CHANGEFEED parameters in the override JSON. If any parameter is not specified, its default value is used.

The following CREATE CHANGEFEED sink URI parameters:
- host: The hostname or IP address of the webhook sink where change events are sent. The applicable certificates of the failback target (i.e., the source database from which you migrated) must be located on this machine.
- port: The port of the webhook sink.
- sink_query_parameters: A comma-separated list of CREATE CHANGEFEED query parameters. This includes the base64-encoded client certificate (client_cert), key (client_key), and CA (ca_cert) for a secure webhook sink.
The following CREATE CHANGEFEED options:

Note:

If there is already a running CockroachDB changefeed with the same webhook sink URL (excluding query parameters) and watched tables, the existing changefeed is used for failback.

Use a secure changefeed connection whenever possible. The default insecure configuration is not recommended in production. To secure the changefeed connection, define sink_query_parameters in the JSON as follows:

{
    "sink_query_parameters": "client_cert={base64 cert}&client_key={base64 key}&ca_cert={base64 CA cert}"
}

client_cert, client_key, and ca_cert are webhook sink parameters that must be base64- and URL-encoded (for example, use the command base64 -i ./client.crt | jq -R -r '@uri').

In the molt fetch command, also include --replicator-flags to specify the paths to the server certificate and key that correspond to the client certs defined in sink_query_parameters. For example:

--changefeeds-path 'changefeed-secure.json'
--replicator-flags "--tlsCertificate ./certs/server.crt --tlsPrivateKey ./certs/server.key"

For a complete example of using molt fetch in failback mode, see Fail back securely from CockroachDB.

Default insecure changefeed

Warning:

Insecure configurations are not recommended. In production, run failback with a secure changefeed connection. For details, see Changefeed override settings.

When molt fetch --mode failback is run without specifying --changefeeds-path, the following CREATE CHANGEFEED parameters are used for the changefeed:

{
    "host": "localhost",
    "port": 30004,
    "sink_query_parameters": "insecure_tls_skip_verify=true",
    "resolved": "1s",
    "min_checkpoint_frequency": "1s",
    "initial_scan": "no",
    "webhook_sink_config": "{\"Flush\":{\"Bytes\":1048576}}"
}

The default parameters specify a local webhook sink ("localhost") and an insecure sink connection ("insecure_tls_skip_verify=true"), which are suited for testing only. In order to run failback with the default insecure configuration, you must also include the following flags:

--allow-tls-mode-disable
--replicator-flags '--tlsSelfSigned --disableAuthentication'

Note:

Either --changefeeds-path, which overrides the default insecure configuration; or --allow-tls-mode-disable, which enables the use of the default insecure configuration, must be specified in failback mode. Otherwise, molt fetch will error.

Data movement

MOLT Fetch can use either IMPORT INTO or COPY FROM to load data into CockroachDB.

By default, MOLT Fetch uses IMPORT INTO:

IMPORT INTO achieves the highest throughput, but requires taking the CockroachDB tables offline to achieve its import speed. Tables are taken back online once an import job completes successfully. See Best practices.
IMPORT INTO supports compression using the --compression flag, which reduces the amount of storage used.

--use-copy configures MOLT Fetch to use COPY FROM:

COPY FROM enables your tables to remain online and accessible. However, it is slower than using IMPORT INTO.
COPY FROM does not support compression.

Note:

COPY FROM is also used for direct copy.

Data path

MOLT Fetch can move the source data to CockroachDB via cloud storage, a local file server, or directly without an intermediate store.

Cloud storage

Tip:

Only the path specified in --bucket-path is used. Query parameters, such as credentials, are ignored. To authenticate cloud storage, follow the steps in Secure cloud storage.

--bucket-path instructs MOLT Fetch to write intermediate files to a path within a Google Cloud Storage or Amazon S3 bucket to which you have the necessary permissions. Use additional flags, shown in the following examples, to specify authentication or region parameters as required for bucket access.

The following example connects to a Google Cloud Storage bucket with implicit authentication and assume role.

--bucket-path 'gs://migration/data/cockroach
--assume-role 'user-test@cluster-ephemeral.iam.gserviceaccount.com'
--use-implicit-auth

The following example connects to an Amazon S3 bucket and explicitly specifies the ap_south-1 region. The --import-region flag enables the use of the AWS_REGION query parameter in the s3 URL. When this flag is set, IMPORT INTO must be used for data movement.

--bucket-path 's3://migration/data/cockroach
--import-region 'ap-south-1'

Local file server

--local-path instructs MOLT Fetch to write intermediate files to a path within a local file server. local-path-listen-addr specifies the address of the local file server. For example:

--local-path /migration/data/cockroach
--local-path-listen-addr 'localhost:3000'

In some cases, CockroachDB will not be able to use the local address specified by --local-path-listen-addr. This will depend on where CockroachDB is deployed, the runtime OS, and the source dialect.

For example, if you are migrating to CockroachDB Cloud, such that the Cloud cluster is in a different physical location than the machine running molt fetch, then CockroachDB cannot reach an address such as localhost:3000. In these situations, use --local-path-crdb-access-addr to specify an address for the local file server that is publicly accessible. For example:

--local-path /migration/data/cockroach
--local-path-listen-addr 'localhost:3000'
--local-path-crdb-access-addr '44.55.66.77:3000'

Tip:

Cloud storage is often preferable to a local file server, which can require considerable disk space.

Direct copy

--direct-copy specifies that MOLT Fetch should use COPY FROM to move the source data directly to CockroachDB without an intermediate store:

Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight:
```
average size of each row * --row-batch-size * --export-concurrency * --table-concurrency
```
Direct copy does not support compression or continuation.
The --use-copy flag is redundant with --direct-copy.

Schema and table selection

By default, MOLT Fetch moves all data from the --source database to CockroachDB. Use the following flags to move a subset of data.

--schema-filter specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's public schema:

--schema-filter 'public'

--table-filter and --table-exclusion-filter specify tables to include and exclude from the migration, respectively, formatted as POSIX regex strings. For example, to move every source table that has "user" in the table name and exclude every source table that has "temp" in the table name:

--table-filter '.*user.*' --table-exclusion-filter '.*temp.*'

Target table handling

--table-handling defines how MOLT Fetch loads data on the CockroachDB tables that match the selection.

To load the data without changing the existing data in the tables, use none:

--table-handling none

To truncate tables before loading the data, use truncate-if-exists:

--table-handling truncate-if-exists

To drop existing tables and create new tables before loading the data, use drop-on-target-and-recreate:

--table-handling drop-on-target-and-recreate

When using the drop-on-target-and-recreate option, MOLT Fetch creates a new CockroachDB table to load the source data if one does not already exist. To guide the automatic schema creation, you can explicitly map source types to CockroachDB types.

Mismatch handling

If either none or truncate-if-exists is set, molt fetch loads data into the existing tables on the target CockroachDB database. If the target schema mismatches the source schema, molt fetch will exit early in certain cases, and will need to be re-run from the beginning.

Note:

This does not apply when drop-on-target-and-recreate is specified, since this option automatically creates a compatible CockroachDB schema.

molt fetch exits early in the following cases, and will output a log with a corresponding mismatch_tag and failable_mismatch set to true:

A source table is missing a primary key.
A source and table primary key have mismatching types.
A STRING primary key has a different collation on the source and target.
A source and target column have mismatching types that are not allowable mappings.
A target table is missing a column that is in the corresponding source table.
A source column is nullable, but the corresponding target column is not nullable (i.e., the constraint is more strict on the target).

molt fetch can continue in the following cases, and will output a log with a corresponding mismatch_tag and failable_mismatch set to false:

A target table has a column that is not in the corresponding source table.
A source column has a NOT NULL constraint, and the corresponding target column is nullable (i.e., the constraint is less strict on the target).
A DEFAULT, CHECK, FOREIGN KEY, or UNIQUE constraint is specified on a target column and not on the source column.

Type mapping

If drop-on-target-and-recreate is set, MOLT Fetch automatically creates a CockroachDB schema that is compatible with the source data. The column types are determined as follows:

PostgreSQL types are mapped to existing CockroachDB types that have the same OID.

The following MySQL types are mapped to corresponding CockroachDB types:

MySQL type	CockroachDB type
`CHAR`, `CHARACTER`, `VARCHAR`, `NCHAR`, `NVARCHAR`	`VARCHAR`
`TINYTEXT`, `TEXT`, `MEDIUMTEXT`, `LONGTEXT`	`STRING`
`GEOMETRY`	`GEOMETRY`
`LINESTRING`	`LINESTRING`
`POINT`	`POINT`
`POLYGON`	`POLYGON`
`MULTIPOINT`	`MULTIPOINT`
`MULTILINESTRING`	`MULTILINESTRING`
`MULTIPOLYGON`	`MULTIPOLYGON`
`GEOMETRYCOLLECTION`, `GEOMCOLLECTION`	`GEOMETRYCOLLECTION`
`JSON`	`JSONB`
`TINYINT`, `INT1`	`INT2`
`BLOB`	`BYTES`
`SMALLINT`, `INT2`	`INT2`
`MEDIUMINT`, `INT`, `INTEGER`, `INT4`	`INT4`
`BIGINT`, `INT8`	`INT`
`FLOAT`	`FLOAT4`
`DOUBLE`	`FLOAT`
`DECIMAL`, `NUMERIC`, `REAL`	`DECIMAL` (Negative scale values are autocorrected to `0`)
`BINARY`, `VARBINARY`	`BYTES`
`DATETIME`	`TIMESTAMP`
`TIMESTAMP`	`TIMESTAMPTZ`
`TIME`	`TIME`
`BIT`	`VARBIT`
`DATE`	`DATE`
`TINYBLOB`, `MEDIUMBLOB`, `LONGBLOB`	`BYTES`
`BOOL`, `BOOLEAN`	`BOOL`
`ENUM`	`ANY_ENUM`

To override the default mappings for automatic schema creation, you can map source to target CockroachDB types explicitly. These are defined in the JSON file indicated by the --type-map-file flag. The allowable custom mappings are valid CockroachDB aliases, casts, and the following mappings specific to MOLT Fetch and Verify:
- TIMESTAMP <> TIMESTAMPTZ
- VARCHAR <> UUID
- BOOL <> INT2
- VARBIT <> TEXT
- VARBIT <> BOOL
- JSONB <> TEXT
- INET <> TEXT

--type-map-file specifies the path to the JSON file containing the explicit type mappings. For example:

--type-map-file 'type-mappings.json'

The following JSON example defines two type mappings:

 [
  {
    "table": "public.t1",
    "column_type_map": [
      {
        "column": "*",
        "source_type": "int",
        "crdb_type": "INT2"
      },
      {
        "column": "name",
        "source_type": "varbit",
        "crdb_type": "string"
      }
    ]
  }
]

table specifies the table that will use the custom type mappings in column_type_map. The value is written as {schema}.{table}.
column specifies the column that will use the custom type mapping. If * is specified, then all columns in the table with the matching source_type are converted.
source_type specifies the source type to be mapped.
crdb_type specifies the target CockroachDB type to be mapped.

Transformations

You can define transformation rules to be performed on the target schema during the fetch task. These can be used to:

Map computed columns to a target schema.
Map partitioned tables to a single target table.
Rename tables on the target schema.

Transformation rules are defined in the JSON file indicated by the --transformations-file flag. For example:

--transformations-file 'transformation-rules.json'

The following JSON example defines two transformation rules:

{
  "transforms": [
    {
      "id": 1,
      "resource_specifier": {
        "schema": ".*",
        "table": ".*"
      },
      "column_exclusion_opts": {
        "add_computed_def": true,
        "column": "^age$"
      }
    },
    {
      "id": 2,
      "resource_specifier": {
        "schema": "public",
        "table": "charges_part.*"
      },
      "table_rename_opts": {
        "value": "charges"
      }
    }
  ]
}

resource_specifier configures the following options for transformation rules:
- schema specifies the schemas to be affected by the transformation rule, formatted as a POSIX regex string.
- table specifies the tables to be affected by the transformation rule, formatted as a POSIX regex string.
column_exclusion_opts configures the following options for column exclusions and computed columns:
- column specifies source columns to exclude from being mapped to regular columns on the target schema. It is formatted as a POSIX regex string.
- add_computed_def, when set to true, specifies that each matching column should be mapped to a computed column on the target schema. Instead of being moved from the source, the column data is generated on the target using ALTER TABLE ... ADD COLUMN and the computed column definition from the source schema. This assumes that all matching columns are computed columns on the source.
  Warning:
  Columns that match the column regex will not be moved to CockroachDB if add_computed_def is omitted or set to false (default), or if a matching column is a non-computed column.
table_rename_opts configures the following option for table renaming:
- value specifies the table name to which the matching resource_specifier is mapped. If only one source table matches resource_specifier, it is renamed to table_rename_opts.value on the target. If more than one table matches resource_specifier (i.e., an n-to-1 mapping), the fetch task assumes that all matching tables are partitioned tables with the same schema, and moves their data to a table named table_rename_opts.value on the target. Otherwise, the task will error.
  
  Additionally, in an n-to-1 mapping situation:
  - Specify --use-copy or --direct-copy for data movement. This is because the data from the source tables is loaded concurrently into the target table.
  - Create the target table schema manually, and do not use --table-handling drop-on-target-and-recreate for target table handling.

The preceding JSON example therefore defines two rules:

Rule 1 maps all source age columns on the source database to computed columns on CockroachDB. This assumes that all matching age columns are defined as computed columns on the source.
Rule 2 maps all table names with prefix charges_part from the source database to a single charges table on CockroachDB (i.e., an n-to-1 mapping). This assumes that all matching charges_part.* tables have the same schema.

Each rule is applied in the order it is defined. If two rules overlap, the later rule will override the earlier rule.

To verify that the logging shows that the computed columns are being created:

When running molt fetch, set --logging debug and look for ALTER TABLE ... ADD COLUMN statements with the STORED or VIRTUAL keywords in the log output:

{"level":"debug","time":"2024-07-22T12:01:51-04:00","message":"running: ALTER TABLE IF EXISTS public.computed ADD COLUMN computed_col INT8 NOT NULL AS ((col1 + col2)) STORED"}

After running molt fetch, issue a SHOW CREATE TABLE statement on CockroachDB:

SHOW CREATE TABLE computed;

  table_name |                         create_statement
-------------+-------------------------------------------------------------------
  computed   | CREATE TABLE public.computed (
  ...
             |     computed_col INT8 NOT NULL AS (col1 + col2) STORED
             | )

Fetch continuation

If MOLT Fetch fails while loading data into CockroachDB from intermediate files, it exits with an error message, fetch ID, and continuation token for each table that failed to load on the target database. You can use this information to continue the task from the continuation point where it was interrupted. For an example, see Continue fetch after encountering an error.

Continuation is only possible under the following conditions:

All data has been exported from the source database into intermediate files on cloud or local storage.
The initial load of source data into the target CockroachDB database is incomplete. This means that ongoing replication of source data has not begun.

Note:

Only one fetch ID and set of continuation tokens, each token corresponding to a table, are active at any time. See List active continuation tokens.

To retry all data starting from the continuation point, reissue the molt fetch command and include the --fetch-id.

--fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007

To retry a specific table that failed, include both --fetch-id and --continuation-token. The latter flag specifies a token string that corresponds to a specific table on the source database. A continuation token is written in the molt fetch output for each failed table. If the fetch task encounters a subsequent error, it generates a new token for each failed table. See List active continuation tokens.

Note:

This will retry only the table that corresponds to the continuation token. If the fetch task succeeds, there may still be source data that is not yet loaded into CockroachDB.

--fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007
--continuation-token 011762e5-6f70-43f8-8e15-58b4de10a007

To retry all data starting from a specific file, include both --fetch-id and --continuation-file-name. The latter flag specifies the filename of an intermediate file in cloud or local storage. All filenames are prepended with part_ and have the .csv.gz or .csv extension, depending on compression type (gzip by default). For example:

--fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007
--continuation-file-name part_00000003.csv.gz

Note:

Continuation is not possible when using direct copy.

List active continuation tokens

To view all active continuation tokens, issue a molt fetch tokens list command along with --conn-string, which specifies the connection string for the target CockroachDB database. For example:

molt fetch tokens list \
--conn-string 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full'

+--------------------------------------+--------------------------------------+------------------+----------------------+
|                  ID                  |               FETCH ID               |    TABLE NAME    |      FILE NAME       |
+--------------------------------------+--------------------------------------+------------------+----------------------+
| f6f0284c-d9c1-43c9-8fde-af609d0dbd82 | 66443597-5689-4df3-a7b9-9fc5e27180eb | public.employees | part_00000001.csv.gz |
+--------------------------------------+--------------------------------------+------------------+----------------------+
Continuation Tokens.

CDC cursor

A change data capture (CDC) cursor is written to the output as cdc_cursor at the beginning and end of the fetch task. For example:

{"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":1,"tables":["public.employees"],"cdc_cursor":"0/3F41E40","net_duration_ms":4879.890041,"net_duration":"000h 00m 04s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"}

You can use the cdc_cursor value with an external change data capture (CDC) tool to continuously replicate subsequent changes on the source data to CockroachDB.

Metrics

By default, MOLT Fetch exports Prometheus metrics at 127.0.0.1:3030/metrics. You can configure this endpoint with the --metrics-listen-addr flag.

Cockroach Labs recommends monitoring the following metrics:

Metric Name	Description
`molt_fetch_num_tables`	Number of tables that will be moved from the source.
`molt_fetch_num_task_errors`	Number of errors encountered by the fetch task.
`molt_fetch_overall_duration`	Duration (in seconds) of the fetch task.
`molt_fetch_rows_exported`	Number of rows that have been exported from a table. For example: `molt_fetch_rows_exported{table="public.users"}`
`molt_fetch_rows_imported`	Number of rows that have been imported from a table. For example: `molt_fetch_rows_imported{table="public.users"}`
`molt_fetch_table_export_duration_ms`	Duration (in milliseconds) of a table's export. For example: `molt_fetch_table_export_duration_ms{table="public.users"}`
`molt_fetch_table_import_duration_ms`	Duration (in milliseconds) of a table's import. For example: `molt_fetch_table_import_duration_ms{table="public.users"}`

You can also use the sample Grafana dashboard to view the preceding metrics.

Docker usage

For details on pulling Docker images, see Docker image.

Performance

MOLT Fetch and Verify are likely to run more slowly in a Docker container than on a local machine. To improve performance, increase the memory or compute resources, or both, on your Docker container.

Authentication

When using MOLT Fetch with cloud storage, it is necessary to specify volumes and environment variables, as described in the following sections for Google Cloud Storage and Amazon S3.

No additional configuration is needed when running MOLT Fetch with a local file server or in direct copy mode:

docker run -it cockroachdb/molt fetch ...

For more information on docker run, see the Docker documentation.

Google Cloud Storage

If you are using Google Cloud Storage for cloud storage:

Volume map the application_default_credentials.json file into the container, and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to this file.

docker run \
  -v ~/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
  -it \
  cockroachdb/molt fetch ...

In case the previous authentication method fails, you can volume map the entire Google Cloud configuration directory into the container. In addition to setting the GOOGLE_APPLICATION_CREDENTIALS environment variable, set CLOUDSDK_CONFIG to point to the configuration directory:

docker run \
  -v ~/.config/gcloud:/gcp/config:ro \
  -e CLOUDSDK_CONFIG=/gcp/config \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/config/application_default_credentials.json \
  -it \
  cockroachdb/molt fetch ...

For details on Google Cloud Storage authentication, see Application Default Credentials.

Amazon S3

If you are using Amazon S3 for cloud storage:

Volume map the host's ~/.aws directory to the /root/.aws directory inside the container, and set the required AWS_REGION, AWS_SECRET_ACCESS_KEY, and AWS_ACCESS_KEY_ID environment variables:

docker run \
  -v ~/.aws:/root/.aws \
  -e AWS_REGION=your-region \
  -e AWS_SECRET_ACCESS_KEY=your-secret-access-key \
  -e AWS_ACCESS_KEY_ID=your-access-key-id \
  -it \
  cockroachdb/molt fetch ...

Local connection strings

When testing locally, specify the host as follows:

For macOS, use host.docker.internal. For example:

--source 'postgres://postgres:postgres@host.docker.internal:5432/molt?sslmode=disable'
--target "postgres://root@host.docker.internal:26257/molt?sslmode=disable"

For Linux and Windows, use 172.17.0.1. For example:

--source 'postgres://postgres:postgres@172.17.0.1:5432/molt?sslmode=disable'
--target "postgres://root@172.17.0.1:26257/molt?sslmode=disable"

Examples

The following examples demonstrate how to issue molt fetch commands to load data into CockroachDB. These examples assume that secure connections to the source and target database are used.

Tip:

After successfully running MOLT Fetch, you can run molt verify to confirm that replication worked successfully without missing or mismatched rows.

Load PostgreSQL data via S3 with continuous replication

The following molt fetch command uses IMPORT INTO to load a subset of tables from a PostgreSQL database to CockroachDB.

molt fetch \
--source 'postgres://postgres:postgres@localhost/molt' \
--target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
--table-handling 'truncate-if-exists' \
--table-filter 'employees' \
--bucket-path 's3://migration/data/cockroach' \
--cleanup \
--pglogical-replication-slot-name 'replication_slot' \
--mode data-load-and-replication

--table-handling specifies that existing tables on CockroachDB should be truncated before the source data is loaded.
--table-filter filters for tables with the employees string in the name.
--bucket-path specifies a directory on an Amazon S3 bucket where intermediate files will be written.
--cleanup specifies that the intermediate files should be removed after the source data is loaded.
--pglogical-replication-slot-name specifies a replication slot name to be created on the source PostgreSQL database. This is used in continuous replication.
--mode data-load-and-replication starts continuous replication of data from the source database to CockroachDB after the fetch task succeeds.

If the fetch task succeeds, the output displays a fetch complete message like the following:

{"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":1,"tables":["public.employees"],"cdc_cursor":"0/3F41E40","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}

Note:

If the fetch task encounters an error, it will exit and can be continued.

Continuous replication begins immediately afterward:

{"level":"info","time":"2024-05-13T14:33:07-04:00","message":"starting replicator"}
{"level":"info","time":"2024-05-13T14:36:22-04:00","message":"creating publication"}

To cancel replication, enter ctrl-c to issue a SIGTERM signal.

Load MySQL data via GCP with continuous replication

The following molt fetch command uses COPY FROM to load a subset of tables from a MySQL database to CockroachDB.

molt fetch \
--source 'mysql://root:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' \
--target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
--table-handling 'truncate-if-exists' \
--table-filter 'employees' \
--bucket-path 'gs://migration/data/cockroach' \
--use-copy \
--cleanup \
--mode data-load-and-replication

--source specifies the MySQL connection string and the certificates in URL-encoded format. Secure connections should be used by default. Refer to Setup.
--table-handling specifies that existing tables on CockroachDB should be truncated before the source data is loaded.
--table-filter filters for tables with the employees string in the name.
--bucket-path specifies a directory on an Google Cloud Storage bucket where intermediate files will be written.
--use-copy specifies that COPY FROM is used to load the tables, keeping the source tables online and queryable but loading the data more slowly than IMPORT INTO.
--cleanup specifies that the intermediate files should be removed after the source data is loaded.
--mode data-load-and-replication starts continuous replication of data from the source database to CockroachDB after the fetch task succeeds.

If the fetch task succeeds, the output displays a fetch complete message like the following:

{"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":1,"tables":["public.employees"],"cdc_cursor":"0/3F41E40","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}

Note:

If the fetch task encounters an error, it will exit and can be continued.

Continuous replication begins immediately afterward:

{"level":"info","time":"2024-05-13T14:33:07-04:00","message":"starting replicator"}

To cancel replication, enter ctrl-c to issue a SIGTERM signal.

Load CockroachDB data via direct copy

The following molt fetch command uses COPY FROM to load all tables directly from one CockroachDB database to another.

molt fetch \
--source 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
--target 'postgres://root@localhost:26258/defaultdb?sslmode=disable' \
--table-handling 'none' \
--direct-copy \
--allow-tls-mode-disable

--source specifies sslmode=disable to establish an insecure connection. By default, insecure connections are disallowed and should be used only for testing or if a secure SSL/TLS connection to the source or target database is not possible.
--table-handling specifies that existing tables on the target CockroachDB database should not be modified before the source data is loaded.
--direct-copy specifies that COPY FROM is used to load the tables directly, without creating intermediate files.
--allow-tls-mode-disable enables insecure connections to the source and target databases. Refer to Secure connections.

Continue fetch after encountering an error

If the fetch task encounters an error, it exits with an error message, fetch ID, and continuation token for each table that failed to load on the target database. You can use these values to continue the fetch task from where it was interrupted.

{"level":"info","table":"public.tbl1","file_name":"shard_01_part_00000001.csv","message":"creating or updating token for duplicate key value violates unique constraint \"tbl1_pkey\"; Key (id)=(22) already exists."}
{"level":"info","table":"public.tbl1","continuation_token":"5e7c7173-101c-4539-9b8d-28fad37d0240","message":"created continuation token"}
{"level":"info","fetch_id":"87bf8dc0-803c-4e26-89d5-3352576f92a7","message":"continue from this fetch ID"}

To retry a specific table, reissue the initial molt fetch command and include the fetch ID and a continuation token:

Tip:

You can use molt fetch tokens list to list all active continuation tokens. Refer to List active continuation tokens.

molt fetch \
... \
--fetch-id '87bf8dc0-803c-4e26-89d5-3352576f92a7' \
--continuation-token '5e7c7173-101c-4539-9b8d-28fad37d0240'

To retry all tables that failed, exclude --continuation-token from the command. When prompted, type y to clear all active continuation tokens. To avoid the prompt (e.g., when running molt fetch in a job), include the --non-interactive flag:

molt fetch \
... \
--fetch-id '87bf8dc0-803c-4e26-89d5-3352576f92a7' \
--non-interactive

Fail back securely from CockroachDB

Warning:

The following molt fetch command uses failback mode to securely replicate changes from CockroachDB back to a MySQL database. This assumes that you migrated data from MySQL to CockroachDB, and want to keep the data consistent on MySQL in case you need to roll back the migration.

molt fetch \
--source 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
--target 'mysql://root:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' \
--table-filter 'employees|payments' \
--non-interactive \
--logging debug \
--replicator-flags "--tlsCertificate ./certs/server.crt --tlsPrivateKey ./certs/server.key" \
--mode failback \
--changefeeds-path 'changefeed-secure.json'

--source specifies the connection string of the CockroachDB database to which you migrated.
--target specifies the connection string of the MySQL database acting as the failback target.
--table-filter specifies that the employees and payments tables should be watched for change events.
--replicator-flags specifies the paths to the server certificate (--tlsCertificate) and key (--tlsPrivateKey) that correspond to the client certs defined by sink_query_parameters in the changefeed override JSON file.
--changefeeds-path specifies the path to changefeed-secure.json, which contains the following setting override:
```
{
    "sink_query_parameters": "client_cert={base64 cert}&client_key={base64 key}&ca_cert={base64 CA cert}"
}
```
client_cert, client_key, and ca_cert are webhook sink parameters that must be base64- and URL-encoded (for example, use the command base64 -i ./client.crt | jq -R -r '@uri').

Tip:

For details on the default changefeed settings and how to override them, see Changefeed override settings.

The preceding molt fetch command issues the equivalent CREATE CHANGEFEED command, using the default and explicitly overriden changefeed settings:

CREATE CHANGEFEED FOR TABLE employees, payments 
  INTO 'webhook-https://localhost:30004/defaultdb/public?client_cert={base64 cert}&client_key={base64 key}&ca_cert={base64 CA cert}' 
  WITH updated, resolved = '1s', min_checkpoint_frequency = '1s', initial_scan = 'no', cursor = '2024-09-11T16:33:35Z', webhook_sink_config = '{\"Flush\":{\"Bytes\":1048576,\"Frequency\":\"1s\"}}'

The initial output looks like the following:

INFO   [Sep 11 11:03:54] Replicator starting                           -buildmode=exe -compiler=gc CGO_CFLAGS= CGO_CPPFLAGS= CGO_CXXFLAGS= CGO_ENABLED=1 CGO_LDFLAGS= GOARCH=arm64 GOOS=darwin vcs=git vcs.modified=true vcs.revision=c948b78081a37aacf37a82eac213aa91a2828f92 vcs.time="2024-08-19T13:39:37Z"
INFO   [Sep 11 11:03:54] Server listening                              address="[::]:30004"
DEBUG  [Sep 11 11:04:00]                                               httpRequest="&{0x14000156ea0 0 401 32 101.042µs   false false}"
DEBUG  [Sep 11 11:04:00]                                               httpRequest="&{0x14000018b40 0 401 32 104.417µs   false false}"
DEBUG  [Sep 11 11:04:01]                                               httpRequest="&{0x140000190e0 0 401 32 27.958µs   false false}"

MOLT Fetch

Supported databases

Installation

Docker image

Setup

Best practices

Security recommendations

Secure connections

Connection strings

Secure cloud storage

Perform a dry run

Commands

Subcommands

Flags

Global flags

tokens list flags

Replication flags

PostgreSQL replication flags

MySQL replication flags

Failback replication flags

Usage

Source and target databases

Fetch mode

Load data

Load data and replicate changes

Replicate changes

Resume replication

Export data to storage

Import data from storage

Fail back to source database

Changefeed override settings

Default insecure changefeed

Data movement

Data path

Cloud storage

Local file server

Direct copy

Schema and table selection

Target table handling

Mismatch handling

Type mapping

Transformations

Fetch continuation

List active continuation tokens

CDC cursor

Metrics

Docker usage

Performance

Authentication

Google Cloud Storage

Amazon S3

Local connection strings

Examples

Load PostgreSQL data via S3 with continuous replication

Load MySQL data via GCP with continuous replication

Load CockroachDB data via direct copy

Continue fetch after encountering an error

Fail back securely from CockroachDB

See also

`tokens list` flags