Recovery Decision Tree
Use the following flow to determine the correct recovery action for your situation.
Pipeline is in Broken status
- Go to the Logs page and filter by the affected connector at the ERROR level
- Identify the error type:
- Connection error (network timeout, authentication failure) — fix connectivity, then restart the connector via the pipeline actions menu
- Schema error (type mismatch, missing column) — fix the schema at source or destination, then monitor for recovery
- Permission error (access denied, insufficient privileges) — grant required permissions, then restart the connector
- Resource error (disk full, memory exhaustion) — free resources or scale infrastructure, then restart
- If the error persists after restarting, consider a stop and resume cycle or contact Streamkap support
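The branching above can be sketched as a small classifier. The keyword lists and action strings below are illustrative only, not Streamkap's actual error taxonomy:

```python
# Sketch: map a connector error message to a recovery action, following
# the decision tree above. Keywords are illustrative placeholders.
ERROR_ACTIONS = [
    (("timeout", "authentication"), "fix connectivity, then restart the connector"),
    (("type mismatch", "missing column"), "fix the schema, then monitor for recovery"),
    (("access denied", "insufficient privileges"), "grant permissions, then restart"),
    (("disk full", "memory"), "free resources or scale, then restart"),
]

def recovery_action(error_message: str) -> str:
    msg = error_message.lower()
    for keywords, action in ERROR_ACTIONS:
        if any(k in msg for k in keywords):
            return action
    # Unrecognized or persistent errors fall through to the last step above.
    return "stop/resume cycle or contact Streamkap support"

print(recovery_action("Connection timeout after 30000 ms"))
```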
Pipeline is running but data is not flowing
- Check the pipeline’s Lag metric — if lag is increasing, the source is producing but the destination is not consuming
- Check the Consumer Groups page for the destination’s consumer group:
  - Is the consumer group in STABLE state with active members?
  - Is consumer lag growing or static?
- Check the source connector status — is it Active or Broken?
- Check the Logs page for WARN or ERROR messages from either connector
- If the source connector is healthy but the destination is not consuming, stop and resume the destination connector
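The checks above can be sketched as a simple diagnosis function; the offset numbers are illustrative, not values read from the Streamkap API:

```python
# Sketch: diagnose "running but no data flowing" from consumer-group
# numbers, per the checks above. Lag = log-end offset minus committed offset.
def diagnose(end_offsets, committed_offsets, active_members: int) -> str:
    lag = sum(e - c for e, c in zip(end_offsets, committed_offsets))
    if active_members == 0:
        return "consumer group EMPTY: restart the destination connector"
    if lag > 0:
        return "destination not consuming: stop and resume the destination connector"
    return "no lag: source may simply have no new changes"

# Source produced up to offsets [120, 80]; destination committed [100, 80].
print(diagnose([120, 80], [100, 80], active_members=2))
```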
Data at destination does not match source
- Check the DLQ (Dead Letter Queue) for failed messages — schema mismatches or constraint violations can cause records to be diverted
- Review the Logs for any processing errors or warnings
- If specific records are missing, consider a filtered (partial) snapshot for the affected time range
- If broad data discrepancy is detected, consider a full snapshot of the affected tables
- For Snowflake destinations in append mode, verify that both consumer group offsets and Snowpipe Streaming channel offsets are aligned — see Snowflake Offset Management
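One way to sketch the filtered-versus-full decision above, assuming a hypothetical 10% discrepancy threshold (not a Streamkap default):

```python
# Sketch: compare per-table row counts and pick a snapshot scope,
# mirroring the guidance above. The 10% threshold is illustrative.
def snapshot_scope(source_counts: dict, dest_counts: dict) -> dict:
    plan = {}
    for table, src in source_counts.items():
        dst = dest_counts.get(table, 0)
        if dst == src:
            continue  # table is in sync
        missing_fraction = (src - dst) / src
        # Small gaps: filtered (partial) snapshot; broad gaps: full snapshot.
        plan[table] = "full" if missing_fraction > 0.10 else "filtered"
    return plan

# orders is 0.5% short -> filtered; users is 40% short -> full
print(snapshot_scope({"orders": 1000, "users": 500}, {"orders": 995, "users": 300}))
```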
Recovery Actions
The following table summarizes the available recovery actions, when to use each, and their impact on data.
| Action | What It Does | When to Use | Data Impact |
|---|---|---|---|
| Restart connector | Stops and immediately restarts the source or destination connector | Transient errors, connection timeouts, minor configuration changes | None — resumes from last committed offset |
| Stop + Resume | Manually stops a connector, then resumes it after a pause | Persistent errors that need time to resolve (e.g., waiting for infrastructure fixes) | None — resumes from last committed offset |
| Reset consumer group offsets | Changes the consumer group’s offset position (earliest, latest, specific timestamp, or specific offset) | Need to replay messages, skip problematic records, or recover from a known-good position | May cause data duplication (if reset to earlier offset) or data loss (if reset to later offset) |
| Snapshot | Triggers a full or filtered snapshot to backfill historical data from source | Missing data at destination, post-schema-change backfill, initial data load for new tables | Destination receives snapshot data in addition to ongoing CDC; no data loss but may temporarily increase latency and lag |
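The data-impact column for offset resets can be illustrated with a single-partition sketch (offset values are illustrative):

```python
# Sketch: data impact of resetting a consumer group offset, per the
# table above. Resetting backward replays messages; forward skips them.
def reset_impact(committed: int, target: int) -> str:
    if target < committed:
        return f"replays {committed - target} messages (possible duplicates)"
    if target > committed:
        return f"skips {target - committed} messages (data loss)"
    return "no change"

print(reset_impact(committed=500, target=450))  # earlier offset -> duplicates
print(reset_impact(committed=500, target=520))  # later offset -> loss
```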
Recovery Scenarios
Pipeline shows FAILED / Broken status
- Pipeline status shows Broken (red badge) in the Pipelines list
- An info icon appears next to the status with error details
- Click the info icon next to the Broken status to see the error summary
- Navigate to the Logs page, filter by the affected connector, and set the log level to ERROR
- Expand error messages to view stack traces and identify the root cause
| Cause | Resolution |
|---|---|
| Source database connection lost | Verify network connectivity, firewall rules, and SSH tunnel/VPN status. Restart the source connector once connectivity is restored. |
| Destination authentication failure | Check that destination credentials are valid and have not been rotated. Update connector settings if needed. |
| Schema incompatibility | Compare source and destination schemas. Fix mismatches and restart the connector. See DLQ for failed messages. |
| Resource exhaustion on source or destination | Free disk space, increase memory, or scale the database. Restart the connector after resources are available. |
- Fix the underlying issue, then use Resume source or Resume destination from the pipeline’s row actions menu
- If the connector does not recover after resuming, stop it, wait 30 seconds, and resume again
Pipeline is running but no data is flowing
- Pipeline status shows Active but lag is not decreasing or latency is not updating
- Destination tables are not receiving new records
- Consumer group lag is static or growing
- Check the pipeline’s Lag and Latency metrics on the Pipelines page
- Navigate to Consumer Groups and locate the destination’s consumer group
- Verify the group is in STABLE state
- Check the Total Lag metric and per-partition consumer lag
- Check the source connector status — is it still streaming? Look for recent INFO-level log messages
- Review Logs for WARN or ERROR messages from both source and destination connectors
| Cause | Resolution |
|---|---|
| Source connector stopped or paused | Resume the source connector from the pipeline actions menu |
| Destination connector stuck | Stop and resume the destination connector |
| Consumer group has no active members | Verify the destination connector is running; restart if the consumer group shows EMPTY state |
| Network issue between components | Check network connectivity; review firewall and security group rules |
| Source database has no new changes | Confirm that changes are being made to the source tables; this may be expected behavior during idle periods |
Data at destination does not match source
- Row counts differ between source and destination
- Specific records are missing or outdated at the destination
- Column values differ between source and destination
- Check the DLQ for messages that failed delivery — these records were diverted instead of being written to the destination
- Review the Logs page for schema errors, type conversion warnings, or constraint violations
- Verify that the destination schema matches the source schema (column types, nullable constraints, primary keys)
- Check if a snapshot was recently cancelled or failed — incomplete snapshots can leave gaps in historical data
| Cause | Resolution |
|---|---|
| Messages in the DLQ | Fix the root cause (schema mismatch, permission error, size limit), then snapshot affected tables to backfill missing records. See DLQ Recovery. |
| Incomplete or failed snapshot | Re-trigger a snapshot for the affected tables. See Failed Snapshot Recovery. |
| Schema change without snapshot | Trigger a snapshot to backfill historical rows with the new schema. See Snapshotting After Schema Changes. |
| Consumer group offset desynchronized | Reset consumer group offsets to replay missed messages. Follow the offset reset procedure below. |
Pipeline performance degraded
- Pipeline latency is significantly higher than normal
- Consumer lag is growing steadily
- Data arrives at the destination with increasing delay
- Check the pipeline’s Lag and Latency on the Pipelines page
- Navigate to Consumer Groups and check per-partition lag to identify bottlenecks
- Review Logs for slow query warnings or timeout messages
- Check if a snapshot is currently running — snapshots increase load and may temporarily degrade performance
| Cause | Resolution |
|---|---|
| Active snapshot in progress | This is expected. Wait for the snapshot to complete. Lag and latency will normalize after backfilling finishes. |
| Insufficient consumer parallelism | Increase the Tasks setting on the destination connector to add more parallel consumers. See Consumer Groups - Performance Tuning. |
| Destination write bottleneck | Check destination database performance. Consider scaling destination resources or optimizing table indexes. |
| High source database load | Schedule snapshots during off-peak hours. Review source database query performance. |
| Partition count too low | Increase topic partition count to enable higher parallelism. See Topics. |
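The last two rows reflect a general Kafka consumer property: each partition is consumed by at most one task at a time, so effective parallelism is capped by both numbers. A minimal sketch:

```python
# Sketch: effective consumer parallelism is bounded by both the
# connector's Tasks setting and the topic partition count, which is
# why the table above suggests raising either. Numbers are illustrative.
def effective_parallelism(tasks: int, partitions: int) -> int:
    # Tasks beyond the partition count sit idle.
    return min(tasks, partitions)

print(effective_parallelism(tasks=8, partitions=3))  # capped at 3
```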
MongoDB ChangeStreamFatalError — resume token expired
- Pipeline shows Broken status with ChangeStreamFatalError in logs
- Connector cannot resume from its last position
- Error 280 appears in connector logs
- Check the Logs page for ChangeStreamFatalError messages
- Verify the MongoDB oplog retention period — if the connector was offline longer than the oplog retention, the resume token has expired
- Contact Streamkap support for an offset reset
- After the offset reset, trigger a snapshot to backfill any data missed during the outage
- Increase oplog retention to 48 hours or more to prevent recurrence
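As a sketch of that retention change: MongoDB 4.4+ supports a minimum oplog retention window via the `replSetResizeOplog` admin command. The connection string below is a placeholder, and pymongo is assumed for the live call:

```python
# Sketch: raise MongoDB's minimum oplog retention to 48 hours
# (MongoDB 4.4+). The command document is real; host and credentials
# are placeholders.
resize_oplog = {"replSetResizeOplog": 1, "minRetentionHours": 48}

# Against a live replica-set primary you would run (pymongo assumed):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://<admin-user>:<password>@<host>:27017")
#   client.admin.command(resize_oplog)
print(resize_oplog)
```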
MySQL schema recovery — 'Schema isn't known to this connector'
- Pipeline shows Broken status with schema-related errors
- Error message: “Schema isn’t known to this connector”
- Typically affects databases with 1000+ tables
- Check the Logs page for schema-related error messages
- Verify the number of tables in the source database — this issue is more common with large database instances
- Contact Streamkap support for schema history recovery
- After recovery, trigger a snapshot of affected tables to ensure data consistency
- Consider enabling Capture Only Captured Tables DDL in the source’s Advanced settings to prevent recurrence
Pipeline stuck after source database maintenance
- Pipeline was working before a source database restart, failover, or maintenance window
- Pipeline shows Broken status or is running but no new data is arriving
- Logs show connection errors or “connection refused” messages
- Confirm that the source database is back online and accepting connections
- Verify that the database user credentials and network configuration have not changed
- Check Logs for connection error messages from the source connector
- For PostgreSQL sources, verify that the replication slot still exists and has not been dropped during maintenance
- For MySQL sources, verify that binary logging is still enabled and the binlog has not been purged past the connector’s position
- Verify source database connectivity and credentials
- Resume the source connector from the pipeline actions menu
- If the source connector does not recover:
- Stop the source connector
- Wait for the source database to be fully available
- Resume the source connector
- If the connector reports that its position (replication slot, binlog position) is no longer valid:
- The source connector may need to be reconfigured
- Trigger a snapshot to re-establish the data baseline
- Contact Streamkap support if the issue persists
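The PostgreSQL and MySQL checks above correspond to standard queries; a sketch collecting them, to run with any client against the source database:

```python
# Sketch: source-side health checks after maintenance, as standard SQL.
# Run these with psql / mysql (or any client) against the source database.
PG_REPLICATION_SLOT_CHECK = (
    "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
)
MYSQL_BINLOG_CHECKS = [
    "SHOW VARIABLES LIKE 'log_bin';",  # is binary logging still enabled?
    "SHOW BINARY LOGS;",               # earliest binlog file still retained
]

print(PG_REPLICATION_SLOT_CHECK)
```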
Step-by-Step Recovery Procedures
How to restart a pipeline connector
Navigate to the Pipelines page
Stop the connector
Wait briefly, then resume the connector
Verify recovery
How to reset consumer group offsets
Stop the destination connector
Navigate to the consumer group
Select partitions to reset
Choose a reset strategy
- Earliest — replay all available messages from the beginning of the retention window
- Latest — skip to the end and only process new messages going forward
- Specific Timestamp — reset to the first offset after a given timestamp
- Specific Offset — set a precise offset position
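The strategy choice can be sketched as resolving a target offset for one partition; the timestamp index and offset values below are illustrative:

```python
# Sketch: resolve a reset strategy to a target offset for one partition.
# earliest/latest come from the broker; timestamp_index maps message
# timestamps to offsets. All values are illustrative.
def resolve_target(strategy, earliest, latest, timestamp_index=None,
                   timestamp=None, offset=None):
    if strategy == "earliest":
        return earliest            # replay the whole retention window
    if strategy == "latest":
        return latest              # skip to new messages only
    if strategy == "timestamp":
        # first offset whose timestamp is at or after the requested one
        return min(o for t, o in timestamp_index if t >= timestamp)
    if strategy == "offset":
        return offset              # precise position supplied by operator
    raise ValueError(f"unknown strategy: {strategy}")

idx = [(1000, 40), (2000, 55), (3000, 70)]
print(resolve_target("timestamp", earliest=10, latest=90,
                     timestamp_index=idx, timestamp=1500))
```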
Handle Snowflake destinations (if applicable)
- See Snowflake Offset Management for the required SQL commands
How to trigger a snapshot
Navigate to the source connector
Choose snapshot scope
- Source level: Use the source connector’s actions menu and select Snapshot
- Table level: Find the specific topic in the Topics list and use its actions menu to select Snapshot
Select snapshot type
- Full (Complete) — captures all rows from the selected tables
- Filtered (Partial) — captures only rows matching a filter condition (useful for backfilling specific time ranges)
Monitor progress
Related Documentation
- Pipelines - Monitor pipeline health, manage connectors, and view performance metrics
- Consumer Groups - Monitor consumer lag, inspect members, and reset offsets
- Snapshots & Backfilling - Trigger and manage snapshots for data backfilling
- Dead Letter Queue (DLQ) - Inspect and resolve messages that failed processing