Please ensure you’ve followed the relevant connector setup guide to enable the snapshots feature.
Streamkap Sources use snapshots to backfill historical data from your source tables.

Snapshot Options

When triggering a snapshot, you can choose from three options:

Filtered Snapshot

Apply filter conditions to capture specific rows. Streaming continues during snapshot.
  • Best for capturing a subset of data based on conditions (e.g., date ranges, specific statuses)
  • Uses incremental watermarking to capture data in small chunks
  • Requires tables to have primary keys (or a Surrogate Key)
  • Can continue from where it left off on failure or cancellation
The filter syntax depends on your Source type:
  • SQL-based Sources (PostgreSQL, MySQL, Oracle, etc.): Use SQL WHERE clause syntax
  • Document/NoSQL Sources (MongoDB, DocumentDB, DynamoDB): Use JSON filter expressions

Full Snapshot

Capture all rows from selected tables. Streaming continues during snapshot.
  • Best for complete data backfills where you need all historical data
  • Uses incremental watermarking to capture data in small chunks
  • Requires tables to have primary keys (or a Surrogate Key)
  • Can continue from where it left off on failure or cancellation

Blocking Snapshot

Database locks: Blocking snapshots may hold database locks for the duration of the operation. Use with caution on high-traffic tables.
Capture all rows while pausing streaming. Streaming resumes automatically after snapshot completes.
  • Required for keyless tables: Tables without primary keys cannot use incremental snapshots (unless a Surrogate Key is specified)
  • Point-in-time consistency: Guarantees a consistent view of data at a specific moment
  • Faster for large tables: Can be more performant since it captures all data in one operation
  • Multiple tables in parallel: Depending on connector configuration, multiple tables can be snapshotted simultaneously
However, blocking snapshots cannot be resumed on failure or cancellation—they must be re-triggered.
Blocking snapshots do not currently support filters. To apply filters, use the Filtered Snapshot option instead (requires primary keys).

Surrogate Key

Available under Advanced Options when configuring Filtered or Full snapshots.
A surrogate key allows you to specify an alternative column for the connector to use as the primary key during snapshot chunking. This is useful when:
  • Keyless tables: Tables without primary keys can use a surrogate key (e.g., a timestamp or auto-increment column) for incremental snapshots instead of requiring a blocking snapshot
  • Performance optimization: A different column may provide better chunking performance (e.g., using created_at instead of a UUID primary key for more efficient range queries)
Advanced Options showing Surrogate Key configuration
Limitations:
  • Only single-column surrogate keys are supported (composite keys are not available)
  • The surrogate key column must exist in the table and contain sortable values
Leave the field empty to use the table’s primary key for chunking (default behavior).
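As a rough illustration of why a sortable surrogate key enables incremental chunking, here is a minimal sketch (not Streamkap's implementation) of watermark-based chunked reads over a keyless table, using an in-memory SQLite database. The table, column names, and chunk size are all hypothetical, and the sketch assumes the surrogate key column holds unique, sortable values:

```python
import sqlite3

# Hypothetical keyless table with a sortable surrogate key (created_at).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(f"2025-01-{d:02d}", f"row-{d}") for d in range(1, 11)],
)

def snapshot_chunks(surrogate_key="created_at", chunk_size=4):
    """Yield rows in chunks ordered by the surrogate key (watermarking).

    Assumes the surrogate key is unique; with duplicate values, the
    strict `>` comparison below could skip rows.
    """
    last_seen = None  # low watermark: doubles as a resume point
    while True:
        if last_seen is None:
            rows = conn.execute(
                f"SELECT {surrogate_key}, payload FROM events "
                f"ORDER BY {surrogate_key} LIMIT ?", (chunk_size,)
            ).fetchall()
        else:
            rows = conn.execute(
                f"SELECT {surrogate_key}, payload FROM events "
                f"WHERE {surrogate_key} > ? ORDER BY {surrogate_key} LIMIT ?",
                (last_seen, chunk_size),
            ).fetchall()
        if not rows:
            break
        yield rows
        last_seen = rows[-1][0]  # advance the watermark

chunks = list(snapshot_chunks())
print(len(chunks))                   # 3
print(sum(len(c) for c in chunks))   # 10
```

Because each chunk only depends on the last watermark value, a failed or cancelled snapshot can pick up from `last_seen` rather than starting over, which is the resumability property described above.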

Snapshot Lifecycle

When | Behavior
  • At connector creation: The connector starts in streaming mode, reading any change data seen from this point onwards. No snapshots are triggered automatically.
  • After connector creation: You can trigger ad-hoc snapshots for any or all of the tables the connector is configured to capture. A confirmation prompt is required before the snapshot begins.
  • Pipeline creation and edit: You can choose to trigger snapshots for the topics the pipeline will stream to your destination. A confirmation prompt is required before the snapshot begins.

Behavior

Deletions are not captured during snapshots. Snapshots read existing rows at a point in time; deletion events can only be processed during streaming, or replayed if Streamkap data retention policies allow.

Filtered & Full Snapshots

These snapshots use incremental watermarking, capturing data in small chunks to minimize database impact. Streaming continues uninterrupted while historical data is being backfilled. When snapshotting multiple tables, tables are processed sequentially, one at a time; each table must complete before the next begins.
  • On failure: The snapshot resumes from where it left off. If it cannot resume automatically, you can re-trigger it at the Connector or Table level once the issue is resolved.
  • On cancellation: The snapshot stops at its current progress. Streaming continues uninterrupted. You can resume the snapshot later from where it left off.
When rows are modified while an incremental snapshot is running, event ordering may vary because the Connector streams and snapshots in parallel:
  • Updates: You may receive events in different orders (read → update, update → read, or just update)
  • Deletes: You may receive read → delete, or just delete
This is normal behavior: the Connector resolves these out-of-sequence events when the same row appears in both the snapshot and the streaming log, ensuring they are processed in the correct order and deduplicated.
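The reconciliation rule above can be sketched as a simple last-state resolver. This is illustrative only, not Streamkap's implementation: the key idea is that a snapshot "read" never overwrites a streamed update or delete for the same row:

```python
# Streamed change events take precedence over snapshot "read" events.
STREAM_OPS = {"update", "delete"}

def reconcile(events):
    """Collapse (key, op, value) events, letting streamed ops win."""
    latest = {}
    for key, op, value in events:
        prev = latest.get(key)
        # A late-arriving snapshot "read" must not clobber a newer
        # streamed update/delete for the same row.
        if prev and prev[0] in STREAM_OPS and op == "read":
            continue
        latest[key] = (op, value)
    return latest

events = [
    (1, "read", "a"), (1, "update", "a2"),    # read → update
    (2, "update", "b2"), (2, "read", "b"),    # update → read
    (3, "delete", None), (3, "read", "c"),    # delete → read
]
result = reconcile(events)
print(result)
# {1: ('update', 'a2'), 2: ('update', 'b2'), 3: ('delete', None)}
```

Whichever order the events arrive in, each row converges to the same final state, which is why the varying orderings listed above are harmless.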

Blocking Snapshots

Database locks: Blocking snapshots may hold database locks for the duration of the operation. Use with caution on high-traffic tables.
These snapshots capture all data in a single transaction. Streaming pauses until the snapshot completes, then resumes automatically. Multiple tables may be processed in parallel, depending on connector configuration.
  • On failure: Streaming resumes immediately. Re-trigger the snapshot once the issue is resolved; it will start from the beginning, since blocking snapshots capture all rows in one operation.
  • On cancellation: Because streaming is paused during a blocking snapshot, cancelling one causes Streamkap to restart the connector to terminate the snapshot immediately. Streaming resumes after the restart.
A brief delay exists between signaling a blocking snapshot and when streaming actually pauses. This may result in some duplicate events being emitted after the snapshot completes. Ensure your destination can handle idempotent writes or has deduplication enabled.
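One common way to make a destination tolerant of those duplicate events is an idempotent upsert keyed on the primary key. A minimal sketch using SQLite's `ON CONFLICT` clause (table and column names are hypothetical):

```python
import sqlite3

# Hypothetical destination table keyed on the row's primary key.
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def apply_event(row_id, name):
    # The upsert makes replays harmless: re-applying the same event
    # leaves the row in the same state instead of failing or duplicating.
    dest.execute(
        "INSERT INTO users (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        (row_id, name),
    )

# The same event delivered twice (e.g. emitted both by the snapshot
# and by streaming after it resumes) yields exactly one row.
apply_event(1, "alice")
apply_event(1, "alice")
count = dest.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1
```

Most warehouses and databases offer an equivalent (MERGE, upsert, or insert-ignore), so duplicates emitted around the pause window collapse to a single row at the destination.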
A high performance, bulk parallel snapshot feature is planned for future releases.

Triggering a Snapshot

You can trigger an ad-hoc snapshot at the Source level or per Table from the Connector’s page.

Source Level Snapshot

This will trigger an incremental snapshot for all tables/topics captured by the Source:
Source quick actions menu with Snapshot option

Snapshot Options Dialog

When triggering a source-level snapshot, you can choose between Full Snapshot and Blocking Snapshot:
Source snapshot options dialog showing Full and Blocking snapshot types

Table/Topic Level Snapshot

This will trigger a snapshot for the selected tables/topics only:
Topic quick actions menu with Snapshot option

Snapshot Options Dialog

When triggering a table/topic snapshot, you can choose the snapshot type and configure advanced options:
Snapshot options dialog showing Filtered, Full, and Blocking snapshot types with Advanced Options

Filtered Snapshot Configuration

Select Filtered Snapshot to apply filter conditions. Filter syntax varies by Source type:
  • SQL-based Sources: Use SQL WHERE clause syntax (e.g., created_at >= '2025-01-01' AND created_at < '2025-02-01')
  • Document/NoSQL Sources: Use JSON filter expressions (e.g., {"status": "active", "created_at": {"$gte": "2025-01-01", "$lt": "2025-02-01"}})
Filtered snapshot configuration with SQL filter editor
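To make the JSON filter shape concrete, here is an illustrative matcher for MongoDB-style equality and `$gte`/`$lt` operators. It is a sketch of the filter semantics, not Streamkap's implementation:

```python
def matches(doc, flt):
    """Return True if a document satisfies a MongoDB-style filter
    supporting direct equality and the $gte/$lt operators."""
    for field, cond in flt.items():
        if isinstance(cond, dict):  # operator form, e.g. {"$gte": ...}
            val = doc.get(field)
            if val is None:
                return False
            if "$gte" in cond and not val >= cond["$gte"]:
                return False
            if "$lt" in cond and not val < cond["$lt"]:
                return False
        elif doc.get(field) != cond:  # direct equality
            return False
    return True

# The example filter from above: active documents created in Jan 2025.
flt = {"status": "active",
       "created_at": {"$gte": "2025-01-01", "$lt": "2025-02-01"}}
print(matches({"status": "active", "created_at": "2025-01-15"}, flt))  # True
print(matches({"status": "active", "created_at": "2025-02-01"}, flt))  # False
```

Note that the `$lt` upper bound excludes the boundary, which matches the closed-range guidance in the best practices below.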

Best Practices for Filtered Snapshots

When using Filtered snapshots, we strongly recommend:
  • Use closed range filters when applying comparative operators on timestamp or date fields. For example: created_at >= '2025-01-01' AND created_at < '2025-02-01'. Closed ranges ensure you capture all intended data without gaps or overlaps.
  • Filter on indexed or primary key fields for optimal performance. Filtering on columns that are part of your table’s indices or primary key allows the database to efficiently locate matching rows, significantly reducing the load on your source database.
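The closed-range recommendation can be verified with a small sketch: half-open windows (inclusive start, exclusive end) partition a timeline so every value lands in exactly one window. The window values below are illustrative:

```python
# Consecutive half-open monthly windows, as in:
#   created_at >= '2025-01-01' AND created_at < '2025-02-01'
windows = [
    ("2025-01-01", "2025-02-01"),
    ("2025-02-01", "2025-03-01"),
]

def window_for(day):
    """Return every window containing the given day (ISO date string)."""
    return [w for w in windows if w[0] <= day < w[1]]

# Boundary days match exactly one window: no gaps, no overlaps.
print(window_for("2025-01-31"))  # [('2025-01-01', '2025-02-01')]
print(window_for("2025-02-01"))  # [('2025-02-01', '2025-03-01')]
```

By contrast, inclusive upper bounds (`<= '2025-01-31'`) can miss sub-day timestamps such as `2025-01-31 12:00:00`, and inclusive bounds on both ends capture boundary rows twice across consecutive snapshots.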

Confirmation Prompt

After initiating a snapshot, you must confirm your action by typing “snapshot” in the confirmation dialog:
Snapshot confirmation dialog

Snapshot Progress

Upon triggering a snapshot, the Connector status will update to reflect the snapshot operation:
Source status tab showing snapshot progress
Also, the Topics list will show the snapshot status per table/topic:
Topics table with snapshot status column

Cancelling a Snapshot

You can cancel an in-progress snapshot from the Connector’s quick actions menu:
Source quick actions menu with Cancel Snapshot option