Snapshots & Backfilling

Streamkap can perform snapshots to backfill your data.

Please ensure you've followed your connector setup guide to enable this process.

Snapshot Behaviour

Streamkap performs an incremental snapshot on your tables.

  • The process loops over each table, reading a chunk of rows (1024 by default) at a time. This keeps the impact on your database extremely low, though it can be sped up.
  • Streaming new data from the log is carried out concurrently.
  • It can survive interruptions, such as a database restart, resuming from the last recorded snapshot position.
  • Snapshotted rows are reconciled with streamed changes to ensure we always keep the latest version of each row.
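The behaviour above can be sketched in pseudocode. This is an illustrative Python sketch, not Streamkap's actual implementation: `read_chunk` and `streamed_changes` are hypothetical names standing in for the chunked table reader and the concurrently streamed log changes.

```python
CHUNK_SIZE = 1024  # default chunk size


def incremental_snapshot(read_chunk, streamed_changes, last_position=None):
    """Snapshot a table chunk by chunk, resuming from last_position.

    read_chunk(after_key, limit) -> list of (key, row) pairs sorted by key.
    streamed_changes: mapping of key -> row, filled concurrently by streaming.
    """
    result = {}
    position = last_position  # survives restarts: resume from the saved key
    while True:
        chunk = read_chunk(position, CHUNK_SIZE)
        if not chunk:
            break
        for key, row in chunk:
            # If the log has already streamed a newer version of this row,
            # the streamed version wins, so the latest row is always kept.
            result[key] = streamed_changes.get(key, row)
        position = chunk[-1][0]  # checkpoint the last snapshotted key
    return result
```

Because the loop checkpoints the last key it read, a restart simply resumes from that position rather than re-reading the whole table.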

When do we snapshot?

Creating a Pipeline

  • When you create a pipeline connecting a source to a destination, a toggle (defaulted to ON) controls whether the tables added to the pipeline are snapshotted.

Post Connector & Pipeline Creation

  • When you add topics/tables to the pipeline, the default behaviour is to take a snapshot of newly added tables.
  • You can trigger a snapshot on an ad-hoc basis by drilling into a source and clicking Snapshot, either at the connector level or at the topic level.

How to Trigger an Ad-Hoc Snapshot

It's possible to trigger a snapshot at the connector level or at the topic level.

Connector

This will trigger a snapshot across all tables/topics.

Table/Topic

This will carry out a snapshot only for the selected table/topic.

Upon triggering a snapshot, the status will update.

It is possible to cancel a snapshot while it is in the Pending status.

Performance

Each connector has a setting called Snapshot Chunk Size.

This represents the number of rows fetched per request, defaulting to 1024. Increasing this value will roughly linearly increase snapshot speed, at the cost of higher load on the source database.
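To see why chunk size matters, consider the number of fetch round trips needed to snapshot a table. This is a rough illustration with hypothetical row counts, not Streamkap benchmarks:

```python
import math


def round_trips(total_rows, chunk_size):
    """Number of chunk fetches needed to snapshot total_rows rows."""
    return math.ceil(total_rows / chunk_size)


rows = 10_000_000
print(round_trips(rows, 1024))  # default chunk size -> 9766 fetches
print(round_trips(rows, 8192))  # 8x larger chunks -> 1221 fetches
```

Fewer, larger fetches mean less per-request overhead and a faster snapshot, but each fetch reads more data from the source in one go.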