This guide helps you diagnose and resolve pipeline issues in Streamkap. Use it when a pipeline is in a Broken state, data has stopped flowing, or you observe data discrepancies between source and destination. For routine pipeline management, see Pipelines. For proactive monitoring, see Alerts.

Recovery Decision Tree

Use the following flow to determine the correct recovery action for your situation.

Pipeline is in Broken status

  1. Go to the Logs page and filter by the affected connector at the ERROR level
  2. Identify the error type:
    • Connection error (network timeout, authentication failure) — fix connectivity, then restart the connector via the pipeline actions menu
    • Schema error (type mismatch, missing column) — fix the schema at source or destination, then monitor for recovery
    • Permission error (access denied, insufficient privileges) — grant required permissions, then restart the connector
    • Resource error (disk full, memory exhaustion) — free resources or scale infrastructure, then restart
  3. If the error persists after restarting, consider a stop and resume cycle or contact Streamkap support
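The triage steps above can be sketched as a small helper that maps an ERROR-level log line to the fix and follow-up action listed. The error categories and actions come from this guide, but the keyword matching is a hypothetical illustration, not Streamkap's actual log format.

```python
# Hypothetical triage sketch: map an ERROR-level log line to one of the
# recovery actions described above. Keyword lists are illustrative only.
ERROR_ACTIONS = {
    "connection": ("fix connectivity", "restart connector"),
    "schema": ("fix schema at source or destination", "monitor for recovery"),
    "permission": ("grant required permissions", "restart connector"),
    "resource": ("free resources or scale infrastructure", "restart connector"),
}

KEYWORDS = {
    "connection": ["timeout", "authentication failure", "connection refused"],
    "schema": ["type mismatch", "missing column"],
    "permission": ["access denied", "insufficient privileges"],
    "resource": ["disk full", "out of memory"],
}

def triage(log_line: str) -> tuple[str, str]:
    """Return (fix, follow-up action) for an error log line."""
    line = log_line.lower()
    for category, words in KEYWORDS.items():
        if any(w in line for w in words):
            return ERROR_ACTIONS[category]
    # Unknown error: escalate per step 3 of the decision tree.
    return ("unknown error", "stop/resume cycle or contact Streamkap support")
```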

Pipeline is running but data is not flowing

  1. Check the pipeline’s Lag metric — if lag is increasing, the source is producing but the destination is not consuming
  2. Check the Consumer Groups page for the destination’s consumer group:
    • Is the consumer group in STABLE state with active members?
    • Is consumer lag growing or static?
  3. Check the source connector status — is it Active or Broken?
  4. Check the Logs page for WARN or ERROR messages from either connector
  5. If the source connector is healthy but the destination is not consuming, stop and resume the destination connector
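The lag check in step 1 can be made concrete: given successive lag readings, the trend tells you which side of the pipeline to inspect. A minimal sketch; the thresholds and function name are illustrative, not a Streamkap API.

```python
# Sketch: interpret a series of lag samples (oldest to newest) per the
# diagnosis steps above. Illustrative only, not Streamkap internals.
def interpret_lag(samples: list[int]) -> str:
    """Classify a lag trend from successive lag readings."""
    if len(samples) < 2:
        return "insufficient data"
    if all(b > a for a, b in zip(samples, samples[1:])):
        # Source is producing, destination is not consuming.
        return "lag increasing: check the destination consumer group"
    if all(b == a for a, b in zip(samples, samples[1:])):
        return "lag static: check whether the source connector is streaming"
    return "lag decreasing or mixed: pipeline is catching up"
```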

Data at destination does not match source

  1. Check the DLQ (Dead Letter Queue) for failed messages — schema mismatches or constraint violations can cause records to be diverted
  2. Review the Logs for any processing errors or warnings
  3. If specific records are missing, consider a filtered (partial) snapshot for the affected time range
  4. If broad data discrepancy is detected, consider a full snapshot of the affected tables
  5. For Snowflake destinations in append mode, verify that both consumer group offsets and Snowpipe Streaming channel offsets are aligned — see Snowflake Offset Management
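When choosing between a filtered and a full snapshot (steps 3 and 4), it helps to know how many keys are actually missing. A minimal comparison sketch, assuming you can export primary keys from both sides; the function names and the 10% "broad discrepancy" threshold are hypothetical.

```python
def diff_keys(source_keys, dest_keys):
    """Return (missing_at_dest, unexpected_at_dest) primary-key sets."""
    src, dst = set(source_keys), set(dest_keys)
    return src - dst, dst - src

# A small, narrow gap suggests a filtered (partial) snapshot of the
# affected range; a broad discrepancy suggests a full snapshot. The
# broad_ratio cutoff here is an arbitrary illustration.
def suggest_snapshot(missing: set, total_source_rows: int, broad_ratio: float = 0.1) -> str:
    if not missing:
        return "no snapshot needed"
    if len(missing) / total_source_rows >= broad_ratio:
        return "full snapshot"
    return "filtered (partial) snapshot"
```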

Recovery Actions

The following table summarizes the available recovery actions, when to use each, and their impact on data.
• Restart connector
  What it does: Stops and immediately restarts the source or destination connector.
  When to use: Transient errors, connection timeouts, minor configuration changes.
  Data impact: None — resumes from last committed offset.
• Stop + Resume
  What it does: Manually stops a connector, then resumes it after a pause.
  When to use: Persistent errors that need time to resolve (e.g., waiting for infrastructure fixes).
  Data impact: None — resumes from last committed offset.
• Reset consumer group offsets
  What it does: Changes the consumer group’s offset position (earliest, latest, specific timestamp, or specific offset).
  When to use: Need to replay messages, skip problematic records, or recover from a known-good position.
  Data impact: May cause data duplication (if reset to an earlier offset) or data loss (if reset to a later offset).
• Snapshot
  What it does: Triggers a full or filtered snapshot to backfill historical data from the source.
  When to use: Missing data at destination, post-schema-change backfill, initial data load for new tables.
  Data impact: Destination receives snapshot data in addition to ongoing CDC; no data loss, but latency and lag may temporarily increase.
Resetting consumer group offsets and snapshotting are powerful operations. Always verify the impact on your destination before proceeding, especially in production environments.
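The data-impact column for offset resets reduces to simple arithmetic: moving the committed offset backwards replays (duplicates) records, moving it forwards skips (loses) them. A sketch with hypothetical numbers:

```python
# Illustration of the data-impact row for offset resets. Offsets here are
# hypothetical Kafka-style per-partition positions.
def reset_impact(committed_offset: int, new_offset: int) -> str:
    if new_offset < committed_offset:
        return f"replays {committed_offset - new_offset} messages (possible duplicates)"
    if new_offset > committed_offset:
        return f"skips {new_offset - committed_offset} messages (data loss)"
    return "no change"
```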

Recovery Scenarios

Scenario: Pipeline shows Broken status

Symptoms:
  • Pipeline status shows Broken (red badge) in the Pipelines list
  • An info icon appears next to the status with error details
Diagnosis:
  1. Click the info icon next to the Broken status to see the error summary
  2. Navigate to the Logs page, filter by the affected connector, and set the log level to ERROR
  3. Expand error messages to view stack traces and identify the root cause
Common causes and resolutions:
• Source database connection lost: Verify network connectivity, firewall rules, and SSH tunnel/VPN status. Restart the source connector once connectivity is restored.
• Destination authentication failure: Check that destination credentials are valid and have not been rotated. Update connector settings if needed.
• Schema incompatibility: Compare source and destination schemas. Fix mismatches and restart the connector. See DLQ for failed messages.
• Resource exhaustion on source or destination: Free disk space, increase memory, or scale the database. Restart the connector after resources are available.
Recovery:
  • Fix the underlying issue, then use Resume source or Resume destination from the pipeline’s row actions menu
  • If the connector does not recover after resuming, stop it, wait 30 seconds, and resume again
Scenario: Pipeline is Active but data is not flowing

Symptoms:
  • Pipeline status shows Active but lag is not decreasing or latency is not updating
  • Destination tables are not receiving new records
  • Consumer group lag is static or growing
Diagnosis:
  1. Check the pipeline’s Lag and Latency metrics on the Pipelines page
  2. Navigate to Consumer Groups and locate the destination’s consumer group
    • Verify the group is in STABLE state
    • Check the Total Lag metric and per-partition consumer lag
  3. Check the source connector status — is it still streaming? Look for recent INFO-level log messages
  4. Review Logs for WARN or ERROR messages from both source and destination connectors
Common causes and resolutions:
• Source connector stopped or paused: Resume the source connector from the pipeline actions menu.
• Destination connector stuck: Stop and resume the destination connector.
• Consumer group has no active members: Verify the destination connector is running; restart if the consumer group shows EMPTY state.
• Network issue between components: Check network connectivity; review firewall and security group rules.
• Source database has no new changes: Confirm that changes are being made to the source tables; this may be expected behavior during idle periods.
Scenario: Data at the destination does not match the source

Symptoms:
  • Row counts differ between source and destination
  • Specific records are missing or outdated at the destination
  • Column values differ between source and destination
Diagnosis:
  1. Check the DLQ for messages that failed delivery — these records were diverted instead of being written to the destination
  2. Review the Logs page for schema errors, type conversion warnings, or constraint violations
  3. Verify that the destination schema matches the source schema (column types, nullable constraints, primary keys)
  4. Check if a snapshot was recently cancelled or failed — incomplete snapshots can leave gaps in historical data
Common causes and resolutions:
• Messages in the DLQ: Fix the root cause (schema mismatch, permission error, size limit), then snapshot affected tables to backfill missing records. See DLQ Recovery.
• Incomplete or failed snapshot: Re-trigger a snapshot for the affected tables. See Failed Snapshot Recovery.
• Schema change without snapshot: Trigger a snapshot to backfill historical rows with the new schema. See Snapshotting After Schema Changes.
• Consumer group offset desynchronized: Reset consumer group offsets to replay missed messages. Follow the offset reset procedure below.
Scenario: High latency or growing lag

Symptoms:
  • Pipeline latency is significantly higher than normal
  • Consumer lag is growing steadily
  • Data arrives at the destination with increasing delay
Diagnosis:
  1. Check the pipeline’s Lag and Latency on the Pipelines page
  2. Navigate to Consumer Groups and check per-partition lag to identify bottlenecks
  3. Review Logs for slow query warnings or timeout messages
  4. Check if a snapshot is currently running — snapshots increase load and may temporarily degrade performance
Common causes and resolutions:
• Active snapshot in progress: This is expected; wait for the snapshot to complete. Lag and latency will normalize after backfilling finishes.
• Insufficient consumer parallelism: Increase the Tasks setting on the destination connector to add more parallel consumers. See Consumer Groups - Performance Tuning.
• Destination write bottleneck: Check destination database performance. Consider scaling destination resources or optimizing table indexes.
• High source database load: Schedule snapshots during off-peak hours. Review source database query performance.
• Partition count too low: Increase the topic partition count to enable higher parallelism. See Topics.
Increased latency and lag may be expected when snapshots are running. Backfilling produces higher load than normal CDC streaming, but the load is temporary.
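The parallelism advice above follows from how Kafka-style consumer groups assign work: at most one consumer in a group reads a given partition, so raising Tasks beyond the partition count adds idle consumers rather than throughput. A minimal illustration of that cap:

```python
# Effective parallelism in a Kafka-style consumer group is capped by the
# partition count: each partition is read by at most one group member.
def effective_parallelism(tasks: int, partitions: int) -> int:
    return min(tasks, partitions)

def idle_tasks(tasks: int, partitions: int) -> int:
    # Consumers beyond the partition count receive no assignments.
    return max(0, tasks - partitions)
```

This is why the "Partition count too low" row pairs with the "Insufficient consumer parallelism" row: raising either limit alone only helps up to the other.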
Scenario: MongoDB ChangeStreamFatalError (expired resume token)

Symptoms:
  • Pipeline shows Broken status with ChangeStreamFatalError in logs
  • Connector cannot resume from its last position
  • Error 280 appears in connector logs
Diagnosis:
  1. Check the Logs page for ChangeStreamFatalError messages
  2. Verify the MongoDB oplog retention period — if the connector was offline longer than the oplog retention, the resume token has expired
Resolution:
  1. Contact Streamkap support for an offset reset
  2. After the offset reset, trigger a snapshot to backfill any data missed during the outage
  3. Increase oplog retention to 48 hours or more to prevent recurrence:
    db.adminCommand({ replSetResizeOplog: 1, minRetentionHours: 48 })
    
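The arithmetic behind this failure mode is simple: if the connector is offline longer than the oplog retention window, its resume token no longer points at a retained oplog entry. A sketch of that check (function name is illustrative, not a Streamkap or MongoDB API):

```python
from datetime import timedelta

# If the connector's downtime exceeds the oplog retention window, the
# resume token has aged out and the change stream cannot resume.
def resume_token_valid(offline: timedelta, oplog_retention: timedelta) -> bool:
    return offline < oplog_retention
```

With the recommended 48-hour retention, a 12-hour outage resumes cleanly, while a 72-hour outage requires the offset reset and snapshot described above.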
Related: Error Reference — MongoDB ChangeStreamFatalError | MongoDB Source FAQ
Scenario: MySQL schema history error

Symptoms:
  • Pipeline shows Broken status with schema-related errors
  • Error message: “Schema isn’t known to this connector”
  • Typically affects databases with 1000+ tables
Diagnosis:
  1. Check the Logs page for schema-related error messages
  2. Verify the number of tables in the source database — this issue is more common with large database instances
Resolution:
  1. Contact Streamkap support for schema history recovery
  2. After recovery, trigger a snapshot of affected tables to ensure data consistency
  3. Consider enabling Capture Only Captured Tables DDL in the source’s Advanced settings to prevent recurrence
Related: Error Reference — MySQL Schema Error | Schema History Optimization
Scenario: Source database restart, failover, or maintenance

Symptoms:
  • Pipeline was working before a source database restart, failover, or maintenance window
  • Pipeline shows Broken status or is running but no new data is arriving
  • Logs show connection errors or “connection refused” messages
Diagnosis:
  1. Confirm that the source database is back online and accepting connections
  2. Verify that the database user credentials and network configuration have not changed
  3. Check Logs for connection error messages from the source connector
  4. For PostgreSQL sources, verify that the replication slot still exists and has not been dropped during maintenance
  5. For MySQL sources, verify that binary logging is still enabled and the binlog has not been purged past the connector’s position
Resolution:
  1. Verify source database connectivity and credentials
  2. Resume the source connector from the pipeline actions menu
  3. If the source connector does not recover:
    • Stop the source connector
    • Wait for the source database to be fully available
    • Resume the source connector
  4. If the connector reports that its position (replication slot, binlog position) is no longer valid:
    • The source connector may need to be reconfigured
    • Trigger a snapshot to re-establish the data baseline
    • Contact Streamkap support if the issue persists
For PostgreSQL sources, if the replication slot was dropped during maintenance, the connector cannot resume from its previous position. A snapshot will be required to re-establish data consistency.

Step-by-Step Recovery Procedures

How to restart a pipeline connector

  1. Navigate to the Pipelines page from the sidebar and locate the affected pipeline.
  2. Click the actions menu (three dots) on the pipeline row.
  3. Select Stop source or Stop destination, depending on which connector needs to be restarted.
  4. Wait approximately 10-30 seconds for the connector to fully stop. You can verify by checking the connector status on the pipeline detail page.
  5. Open the actions menu again and select Resume source or Resume destination.
  6. Monitor the pipeline’s Status, Lag, and Latency metrics to confirm the connector has recovered. Check the Logs page for any new error messages.
Is restart different from stop + resume? In Streamkap, restarting a connector is effectively a stop followed by a resume. Both operations preserve the connector’s last committed offset position, so no data is lost. The distinction matters primarily when you need to make changes (fix permissions, update credentials, wait for infrastructure) between stopping and resuming.

How to reset consumer group offsets

  1. Stop the destination connector: from the pipeline’s row actions menu, select Stop destination. All consumers in the group must be stopped before offsets can be reset.
  2. Go to the Consumer Groups page and find the consumer group associated with your destination connector.
  3. In the Topic Partitions table, check the boxes for the partitions you want to reset. You can select individual partitions or all partitions for a topic.
  4. Click the Reset Offsets button. A dialog will appear with reset options.
  5. Choose a reset strategy:
    • Earliest — replay all available messages from the beginning of the retention window
    • Latest — skip to the end and only process new messages going forward
    • Specific Timestamp — reset to the first offset after a given timestamp
    • Specific Offset — set a precise offset position
  6. Review your selections and click Apply.
  7. For Snowflake destinations in append mode, also reset the Snowpipe Streaming channel offsets to -1. See Snowflake Offset Management for the required SQL commands.
  8. Resume the destination connector: from the pipeline’s actions menu, select Resume destination. The consumers will start processing from the new offset positions.
Resetting offsets to Earliest on large topics will cause re-processing of all retained messages, which may take considerable time and could result in duplicate data at the destination. Streamkap retains topic data based on your project’s retention policy. Only messages within the retention window can be replayed — check your project settings for the configured retention period.

How to trigger a snapshot

  1. Go to the source connector detail page from the pipeline detail view or the Sources page.
  2. Choose the snapshot scope: all tables (source-level) or specific tables (table-level):
    • Source level: use the source connector’s actions menu and select Snapshot
    • Table level: find the specific topic in the Topics list and use its actions menu to select Snapshot
  3. Select the snapshot type:
    • Full (Complete) — captures all rows from the selected tables
    • Filtered (Partial) — captures only rows matching a filter condition (useful for backfilling specific time ranges)
  4. Type “snapshot” in the confirmation dialog to begin the operation.
  5. Monitor progress on the source connector detail page and in the Topics list; the source connector status will update to reflect the snapshot operation.
  6. Once the snapshot completes, verify that the expected data is present at the destination. Check the pipeline’s Lag metric — it should decrease as the snapshot data is processed.
Snapshots run in parallel with ongoing CDC streaming. Your real-time data flow continues uninterrupted while historical data is being backfilled. Expect temporarily increased latency and lag during snapshot operations.
Recovery operation timing varies based on data volume, source database performance, and destination write capacity. Small table snapshots typically complete in minutes; large tables (100M+ rows) may take several hours. Offset resets take effect immediately, but reprocessing time depends on the volume of messages being replayed.
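For setting expectations, the timing note above is just rows divided by sustained backfill throughput. The throughput figure below is entirely workload-dependent; 50,000 rows/sec is a hypothetical placeholder, not a Streamkap benchmark.

```python
# Back-of-envelope snapshot duration: rows / sustained backfill throughput.
# The default throughput is a hypothetical placeholder for illustration.
def estimated_snapshot_hours(rows: int, rows_per_sec: float = 50_000) -> float:
    return rows / rows_per_sec / 3600
```

At this placeholder rate, a 100M-row table is on the order of half an hour; slower sources or destinations stretch that into the multi-hour range the note describes.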