This guide helps you diagnose and resolve pipeline issues in Streamkap. Use it when a pipeline is in a Broken state, data has stopped flowing, or you observe data discrepancies between source and destination. For routine pipeline management, see Pipelines. For proactive monitoring, see Alerts.

Recovery Decision Tree

Use the following flow to determine the correct recovery action for your situation.

Pipeline is in Broken status

  1. Go to the Logs page and filter by the affected connector at the ERROR level
  2. Identify the error type:
    • Connection error (network timeout, authentication failure) — fix connectivity, then restart the connector via the pipeline actions menu
    • Schema error (type mismatch, missing column) — fix the schema at source or destination, then monitor for recovery
    • Permission error (access denied, insufficient privileges) — grant required permissions, then restart the connector
    • Resource error (disk full, memory exhaustion) — free resources or scale infrastructure, then restart
  3. If the error persists after restarting, consider a stop and resume cycle or contact Streamkap support
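The triage steps above can be sketched as a small helper that maps an ERROR-level log line to the fix and follow-up action listed. The error categories and actions come from this guide, but the keyword matching is a hypothetical illustration, not Streamkap's actual log format.

```python
# Hypothetical triage sketch: map an ERROR-level log line to one of the
# recovery actions described above. Keyword lists are illustrative only.
ERROR_ACTIONS = {
    "connection": ("fix connectivity", "restart connector"),
    "schema": ("fix schema at source or destination", "monitor for recovery"),
    "permission": ("grant required permissions", "restart connector"),
    "resource": ("free resources or scale infrastructure", "restart connector"),
}

KEYWORDS = {
    "connection": ["timeout", "authentication failure", "connection refused"],
    "schema": ["type mismatch", "missing column"],
    "permission": ["access denied", "insufficient privileges"],
    "resource": ["disk full", "out of memory"],
}

def triage(log_line: str) -> tuple[str, str]:
    """Return (fix, follow-up action) for an error log line."""
    line = log_line.lower()
    for category, words in KEYWORDS.items():
        if any(w in line for w in words):
            return ERROR_ACTIONS[category]
    # Unknown error: escalate per step 3 of the decision tree.
    return ("unknown error", "stop/resume cycle or contact Streamkap support")
```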

Pipeline is running but data is not flowing

  1. Check the pipeline’s Lag metric — if lag is increasing, the source is producing but the destination is not consuming
  2. Check the Consumer Groups page for the destination’s consumer group:
    • Is the consumer group in STABLE state with active members?
    • Is consumer lag growing or static?
  3. Check the source connector status — is it Active or Broken?
  4. Check the Logs page for WARN or ERROR messages from either connector
  5. If the source connector is healthy but the destination is not consuming, stop and resume the destination connector
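The lag check in step 1 can be made concrete: given successive lag readings, the trend tells you which side of the pipeline to inspect. A minimal sketch; the thresholds and function name are illustrative, not a Streamkap API.

```python
# Sketch: interpret a series of lag samples (oldest to newest) per the
# diagnosis steps above. Illustrative only, not Streamkap internals.
def interpret_lag(samples: list[int]) -> str:
    """Classify a lag trend from successive lag readings."""
    if len(samples) < 2:
        return "insufficient data"
    if all(b > a for a, b in zip(samples, samples[1:])):
        # Source is producing, destination is not consuming.
        return "lag increasing: check the destination consumer group"
    if all(b == a for a, b in zip(samples, samples[1:])):
        return "lag static: check whether the source connector is streaming"
    return "lag decreasing or mixed: pipeline is catching up"
```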

Data at destination does not match source

  1. Check the DLQ (Dead Letter Queue) for failed messages — schema mismatches or constraint violations can cause records to be diverted
  2. Review the Logs for any processing errors or warnings
  3. If specific records are missing, consider a filtered (partial) snapshot for the affected time range
  4. If broad data discrepancy is detected, consider a full snapshot of the affected tables
  5. For Snowflake destinations in append mode, verify that both consumer group offsets and Snowpipe Streaming channel offsets are aligned — see Snowflake Offset Management
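When choosing between a filtered and a full snapshot (steps 3 and 4), it helps to know how many keys are actually missing. A minimal comparison sketch, assuming you can export primary keys from both sides; the function names and the 10% "broad discrepancy" threshold are hypothetical.

```python
def diff_keys(source_keys, dest_keys):
    """Return (missing_at_dest, unexpected_at_dest) primary-key sets."""
    src, dst = set(source_keys), set(dest_keys)
    return src - dst, dst - src

# A small, narrow gap suggests a filtered (partial) snapshot of the
# affected range; a broad discrepancy suggests a full snapshot. The
# broad_ratio cutoff here is an arbitrary illustration.
def suggest_snapshot(missing: set, total_source_rows: int, broad_ratio: float = 0.1) -> str:
    if not missing:
        return "no snapshot needed"
    if len(missing) / total_source_rows >= broad_ratio:
        return "full snapshot"
    return "filtered (partial) snapshot"
```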

Recovery Actions

The following table summarizes the available recovery actions, when to use each, and their impact on data.
• Restart connector
  What it does: Stops and immediately restarts the source or destination connector.
  When to use: Transient errors, connection timeouts, minor configuration changes.
  Data impact: None — resumes from last committed offset.
• Stop + Resume
  What it does: Manually stops a connector, then resumes it after a pause.
  When to use: Persistent errors that need time to resolve (e.g., waiting for infrastructure fixes).
  Data impact: None — resumes from last committed offset.
• Reset consumer group offsets
  What it does: Changes the consumer group’s offset position (earliest, latest, specific timestamp, or specific offset).
  When to use: Need to replay messages, skip problematic records, or recover from a known-good position.
  Data impact: May cause data duplication (if reset to an earlier offset) or data loss (if reset to a later offset).
• Snapshot
  What it does: Triggers a full or filtered snapshot to backfill historical data from the source.
  When to use: Missing data at destination, post-schema-change backfill, initial data load for new tables.
  Data impact: Destination receives snapshot data in addition to ongoing CDC; no data loss, but latency and lag may temporarily increase.
Resetting consumer group offsets and snapshotting are powerful operations. Always verify the impact on your destination before proceeding, especially in production environments.
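The data-impact column for offset resets reduces to simple arithmetic: moving the committed offset backwards replays (duplicates) records, moving it forwards skips (loses) them. A sketch with hypothetical numbers:

```python
# Illustration of the data-impact row for offset resets. Offsets here are
# hypothetical Kafka-style per-partition positions.
def reset_impact(committed_offset: int, new_offset: int) -> str:
    if new_offset < committed_offset:
        return f"replays {committed_offset - new_offset} messages (possible duplicates)"
    if new_offset > committed_offset:
        return f"skips {new_offset - committed_offset} messages (data loss)"
    return "no change"
```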

Recovery Scenarios

Scenario: Pipeline shows Broken status

Symptoms:
  • Pipeline status shows Broken (red badge) in the Pipelines list
  • An info icon appears next to the status with error details
Diagnosis:
  1. Click the info icon next to the Broken status to see the error summary
  2. Navigate to the Logs page, filter by the affected connector, and set the log level to ERROR
  3. Expand error messages to view stack traces and identify the root cause
Common causes and resolutions:
• Source database connection lost: Verify network connectivity, firewall rules, and SSH tunnel/VPN status. Restart the source connector once connectivity is restored.
• Destination authentication failure: Check that destination credentials are valid and have not been rotated. Update connector settings if needed.
• Schema incompatibility: Compare source and destination schemas. Fix mismatches and restart the connector. See DLQ for failed messages.
• Resource exhaustion on source or destination: Free disk space, increase memory, or scale the database. Restart the connector after resources are available.
Recovery:
  • Fix the underlying issue, then use Resume source or Resume destination from the pipeline’s row actions menu
  • If the connector does not recover after resuming, stop it, wait 30 seconds, and resume again
Scenario: Pipeline is Active but data is not flowing

Symptoms:
  • Pipeline status shows Active but lag is not decreasing or latency is not updating
  • Destination tables are not receiving new records
  • Consumer group lag is static or growing
Diagnosis:
  1. Check the pipeline’s Lag and Latency metrics on the Pipelines page
  2. Navigate to Consumer Groups and locate the destination’s consumer group
    • Verify the group is in STABLE state
    • Check the Total Lag metric and per-partition consumer lag
  3. Check the source connector status — is it still streaming? Look for recent INFO-level log messages
  4. Review Logs for WARN or ERROR messages from both source and destination connectors
Common causes and resolutions:
• Source connector stopped or paused: Resume the source connector from the pipeline actions menu.
• Destination connector stuck: Stop and resume the destination connector.
• Consumer group has no active members: Verify the destination connector is running; restart if the consumer group shows EMPTY state.
• Network issue between components: Check network connectivity; review firewall and security group rules.
• Source database has no new changes: Confirm that changes are being made to the source tables; this may be expected behavior during idle periods.
Scenario: Data at the destination does not match the source

Symptoms:
  • Row counts differ between source and destination
  • Specific records are missing or outdated at the destination
  • Column values differ between source and destination
Diagnosis:
  1. Check the DLQ for messages that failed delivery — these records were diverted instead of being written to the destination
  2. Review the Logs page for schema errors, type conversion warnings, or constraint violations
  3. Verify that the destination schema matches the source schema (column types, nullable constraints, primary keys)
  4. Check if a snapshot was recently cancelled or failed — incomplete snapshots can leave gaps in historical data
Common causes and resolutions:
• Messages in the DLQ: Fix the root cause (schema mismatch, permission error, size limit), then snapshot affected tables to backfill missing records. See DLQ Recovery.
• Incomplete or failed snapshot: Re-trigger a snapshot for the affected tables. See Failed Snapshot Recovery.
• Schema change without snapshot: Trigger a snapshot to backfill historical rows with the new schema. See Snapshotting After Schema Changes.
• Consumer group offset desynchronized: Reset consumer group offsets to replay missed messages. Follow the offset reset procedure below.
Scenario: High latency or growing lag

Symptoms:
  • Pipeline latency is significantly higher than normal
  • Consumer lag is growing steadily
  • Data arrives at the destination with increasing delay
Diagnosis:
  1. Check the pipeline’s Lag and Latency on the Pipelines page
  2. Navigate to Consumer Groups and check per-partition lag to identify bottlenecks
  3. Review Logs for slow query warnings or timeout messages
  4. Check if a snapshot is currently running — snapshots increase load and may temporarily degrade performance
Common causes and resolutions:
• Active snapshot in progress: This is expected; wait for the snapshot to complete. Lag and latency will normalize after backfilling finishes.
• Insufficient consumer parallelism: Increase the Tasks setting on the destination connector to add more parallel consumers. See Consumer Groups - Performance Tuning.
• Destination write bottleneck: Check destination database performance. Consider scaling destination resources or optimizing table indexes.
• High source database load: Schedule snapshots during off-peak hours. Review source database query performance.
• Partition count too low: Increase the topic partition count to enable higher parallelism. See Topics.
Increased latency and lag may be expected when snapshots are running. Backfilling produces higher load than normal CDC streaming, but the load is temporary.
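The parallelism advice above follows from how Kafka-style consumer groups assign work: at most one consumer in a group reads a given partition, so raising Tasks beyond the partition count adds idle consumers rather than throughput. A minimal illustration of that cap:

```python
# Effective parallelism in a Kafka-style consumer group is capped by the
# partition count: each partition is read by at most one group member.
def effective_parallelism(tasks: int, partitions: int) -> int:
    return min(tasks, partitions)

def idle_tasks(tasks: int, partitions: int) -> int:
    # Consumers beyond the partition count receive no assignments.
    return max(0, tasks - partitions)
```

This is why the "Partition count too low" row pairs with the "Insufficient consumer parallelism" row: raising either limit alone only helps up to the other.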
Scenario: MongoDB ChangeStreamFatalError (expired resume token)

Symptoms:
  • Pipeline shows Broken status with ChangeStreamFatalError in logs
  • Connector cannot resume from its last position
  • Error 280 appears in connector logs
Diagnosis:
  1. Check the Logs page for ChangeStreamFatalError messages
  2. Verify the MongoDB oplog retention period — if the connector was offline longer than the oplog retention, the resume token has expired
Resolution:
  1. Contact Streamkap support for an offset reset
  2. After the offset reset, trigger a snapshot to backfill any data missed during the outage
  3. Increase oplog retention to 48 hours or more to prevent recurrence:
    db.adminCommand({ replSetResizeOplog: 1, minRetentionHours: 48 })
    
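The arithmetic behind this failure mode is simple: if the connector is offline longer than the oplog retention window, its resume token no longer points at a retained oplog entry. A sketch of that check (function name is illustrative, not a Streamkap or MongoDB API):

```python
from datetime import timedelta

# If the connector's downtime exceeds the oplog retention window, the
# resume token has aged out and the change stream cannot resume.
def resume_token_valid(offline: timedelta, oplog_retention: timedelta) -> bool:
    return offline < oplog_retention
```

With the recommended 48-hour retention, a 12-hour outage resumes cleanly, while a 72-hour outage requires the offset reset and snapshot described above.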
Related: Error Reference — MongoDB ChangeStreamFatalError | MongoDB Source FAQ
Scenario: MySQL schema history error

Symptoms:
  • Pipeline shows Broken status with schema-related errors
  • Error message: “Schema isn’t known to this connector”
  • Typically affects databases with 1000+ tables
Diagnosis:
  1. Check the Logs page for schema-related error messages
  2. Verify the number of tables in the source database — this issue is more common with large database instances
Resolution:
  1. Contact Streamkap support for schema history recovery
  2. After recovery, trigger a snapshot of affected tables to ensure data consistency
  3. Consider enabling Capture Only Captured Tables DDL in the source’s Advanced settings to prevent recurrence
Related: Error Reference — MySQL Schema Error | Schema History Optimization
Scenario: Source database restart, failover, or maintenance

Symptoms:
  • Pipeline was working before a source database restart, failover, or maintenance window
  • Pipeline shows Broken status or is running but no new data is arriving
  • Logs show connection errors or “connection refused” messages
Diagnosis:
  1. Confirm that the source database is back online and accepting connections
  2. Verify that the database user credentials and network configuration have not changed
  3. Check Logs for connection error messages from the source connector
  4. For PostgreSQL sources, verify that the replication slot still exists and has not been dropped during maintenance
  5. For MySQL sources, verify that binary logging is still enabled and the binlog has not been purged past the connector’s position
Resolution:
  1. Verify source database connectivity and credentials
  2. Resume the source connector from the pipeline actions menu
  3. If the source connector does not recover:
    • Stop the source connector
    • Wait for the source database to be fully available
    • Resume the source connector
  4. If the connector reports that its position (replication slot, binlog position) is no longer valid:
    • The source connector may need to be reconfigured
    • Trigger a snapshot to re-establish the data baseline
    • Contact Streamkap support if the issue persists
For PostgreSQL sources, if the replication slot was dropped during maintenance, the connector cannot resume from its previous position. A snapshot will be required to re-establish data consistency.

Step-by-Step Recovery Procedures

How to restart a pipeline connector

  1. Navigate to the Pipelines page from the sidebar and locate the affected pipeline.
  2. Click the actions menu (three dots) on the pipeline row.
  3. Select Stop source or Stop destination, depending on which connector needs to be restarted.
  4. Wait approximately 10-30 seconds for the connector to fully stop. You can verify by checking the connector status on the pipeline detail page.
  5. Open the actions menu again and select Resume source or Resume destination.
  6. Monitor the pipeline’s Status, Lag, and Latency metrics to confirm the connector has recovered. Check the Logs page for any new error messages.
Is restart different from stop + resume? In Streamkap, restarting a connector is effectively a stop followed by a resume. Both operations preserve the connector’s last committed offset position, so no data is lost. The distinction matters primarily when you need to make changes (fix permissions, update credentials, wait for infrastructure) between stopping and resuming.

How to reset consumer group offsets

  1. Stop the destination connector: from the pipeline’s row actions menu, select Stop destination. All consumers in the group must be stopped before offsets can be reset.
  2. Go to the Consumer Groups page and find the consumer group associated with your destination connector.
  3. In the Topic Partitions table, check the boxes for the partitions you want to reset. You can select individual partitions or all partitions for a topic.
  4. Click the Reset Offsets button. A dialog will appear with reset options.
  5. Choose a reset strategy:
    • Earliest — replay all available messages from the beginning of the retention window
    • Latest — skip to the end and only process new messages going forward
    • Specific Timestamp — reset to the first offset after a given timestamp
    • Specific Offset — set a precise offset position
  6. Review your selections and click Apply.
  7. For Snowflake destinations in append mode, also reset the Snowpipe Streaming channel offsets to -1. See Snowflake Offset Management for the required SQL commands.
  8. Resume the destination connector: from the pipeline’s actions menu, select Resume destination. The consumers will start processing from the new offset positions.
Resetting offsets to Earliest on large topics will cause re-processing of all retained messages, which may take considerable time and could result in duplicate data at the destination. Streamkap retains topic data based on your project’s retention policy. Only messages within the retention window can be replayed — check your project settings for the configured retention period.

How to trigger a snapshot

  1. Go to the source connector detail page from the pipeline detail view or the Sources page.
  2. Choose the snapshot scope: all tables (source-level) or specific tables (table-level):
    • Source level: use the source connector’s actions menu and select Snapshot
    • Table level: find the specific topic in the Topics list and use its actions menu to select Snapshot
  3. Select the snapshot type:
    • Full (Complete) — captures all rows from the selected tables
    • Filtered (Partial) — captures only rows matching a filter condition (useful for backfilling specific time ranges)
  4. Type “snapshot” in the confirmation dialog to begin the operation.
  5. Monitor progress on the source connector detail page and in the Topics list; the source connector status will update to reflect the snapshot operation.
  6. Once the snapshot completes, verify that the expected data is present at the destination. Check the pipeline’s Lag metric — it should decrease as the snapshot data is processed.
Snapshots run in parallel with ongoing CDC streaming. Your real-time data flow continues uninterrupted while historical data is being backfilled. Expect temporarily increased latency and lag during snapshot operations.
Recovery operation timing varies based on data volume, source database performance, and destination write capacity. Small table snapshots typically complete in minutes; large tables (100M+ rows) may take several hours. Offset resets take effect immediately, but reprocessing time depends on the volume of messages being replayed.
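For setting expectations, the timing note above is just rows divided by sustained backfill throughput. The throughput figure below is entirely workload-dependent; 50,000 rows/sec is a hypothetical placeholder, not a Streamkap benchmark.

```python
# Back-of-envelope snapshot duration: rows / sustained backfill throughput.
# The default throughput is a hypothetical placeholder for illustration.
def estimated_snapshot_hours(rows: int, rows_per_sec: float = 50_000) -> float:
    return rows / rows_per_sec / 3600
```

At this placeholder rate, a 100M-row table is on the order of half an hour; slower sources or destinations stretch that into the multi-hour range the note describes.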