Snapshots & Backfilling
Streamkap can perform snapshots to backfill your data.
Please ensure you've followed your connector setup guide to enable this process.
Snapshot Behaviour
Streamkap performs an incremental snapshot on your tables.
- This process runs a looping statement that reads one block of rows at a time (default: 1024 rows). This keeps the impact on the source database very low, and the snapshot can be sped up by increasing the Snapshot Chunk Size (see Performance).
- Streaming new data from the log is carried out concurrently.
- It can survive interruptions such as a database restart, after which it resumes from the last snapshot position.
- Rows read by the snapshot are compared with streamed data so that the latest version of each row is always kept.
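The behaviour described above can be illustrated with a minimal sketch. It is not Streamkap's implementation: the `orders` table, the function names, and the use of an in-memory SQLite database are all assumptions made for illustration. It reads rows in primary-key order one chunk at a time, can resume from a saved key, and lets already-streamed rows take precedence over snapshot rows.

```python
import sqlite3


def snapshot_in_chunks(conn, table, key, chunk_size=1024, resume_after=None):
    """Yield rows from `table` in key order, one chunk per query.

    `resume_after` is the last key already read (hypothetical resume position).
    Assumes `key` is the first column of the table.
    """
    last_key = resume_after
    while True:
        if last_key is None:
            rows = conn.execute(
                f"SELECT * FROM {table} ORDER BY {key} LIMIT ?",
                (chunk_size,),
            ).fetchall()
        else:
            rows = conn.execute(
                f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last_key, chunk_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_key = rows[-1][0]  # remember where this chunk ended


def build_demo():
    """Create a hypothetical 10-row table to snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(i, "new") for i in range(1, 11)]
    )
    return conn


def run_snapshot(conn, streamed, chunk_size):
    """Merge snapshot rows with rows already received from the log.

    `streamed` maps id -> status for changes seen while the snapshot runs.
    A streamed row is newer, so a snapshot row never overwrites it.
    """
    result = dict(streamed)
    chunks = 0
    for rows in snapshot_in_chunks(conn, "orders", "id", chunk_size):
        chunks += 1
        for row_id, status in rows:
            result.setdefault(row_id, status)
    return result, chunks


if __name__ == "__main__":
    conn = build_demo()
    merged, chunks = run_snapshot(conn, {3: "shipped"}, chunk_size=4)
    print(chunks, merged[3])
```

Passing `resume_after=8` to `snapshot_in_chunks` in this sketch would continue from id 9, which mirrors resuming from the last snapshot position after an interruption.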
When do we snapshot?
Creating a Pipeline
- When you create a pipeline connecting a source to a destination, a toggle (ON by default) controls whether a snapshot is taken of the tables being added to the pipeline.
Post Connector & Pipeline Creation
- When you add topics/tables to the pipeline, the default behaviour is to take a snapshot of newly added tables.
- You can trigger a snapshot on an ad-hoc basis by drilling into a source and clicking snapshot at connector level, or you can trigger it at topic level.
How to Trigger an Ad-hoc Snapshot
It's possible to trigger a snapshot via the connector or at topic level.
Connector
This will trigger a snapshot across all tables/topics.
Table/Topic
This will carry out a snapshot only for the selected table/topic.
Upon triggering a snapshot, the status will update.
It is possible to cancel the snapshot while its status is pending.
Performance
Each connector has a setting called Snapshot Chunk Size.
This is the number of rows fetched per request, and it defaults to 1024 rows. Increasing this value will linearly increase the speed of the snapshots.
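As a rough illustration of why chunk size matters, the number of fetch requests a snapshot needs is the row count divided by the chunk size, rounded up. The sketch below is plain arithmetic under that assumption; `chunk_requests` is a hypothetical helper, not a Streamkap API, and it counts fetches only without modelling database or network load.

```python
import math


def chunk_requests(total_rows, chunk_size=1024):
    """Number of fetches needed to read `total_rows` in blocks of `chunk_size`."""
    return math.ceil(total_rows / chunk_size)


if __name__ == "__main__":
    # A hypothetical 1,000,000-row table at the default and at a larger chunk size.
    print(chunk_requests(1_000_000))        # default chunk size of 1024
    print(chunk_requests(1_000_000, 4096))  # four times the default
```

For a hypothetical table of 1,000,000 rows, the default chunk size needs 977 fetches, while a chunk size of 4096 needs 245.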