Change Data Capture (CDC) pipeline performance depends on three stages: how fast the source produces change events, how topic partitioning distributes load across parallel consumers, and how quickly the destination writes data. Tuning each stage helps you achieve lower latency and higher throughput across your pipelines. This guide covers the key parameters, source-side optimizations, topic partitioning, destination-side optimizations, monitoring strategies, and common performance issues.

Key Tuning Parameters

The following parameters have the most impact on pipeline throughput. Adjust them incrementally and monitor the results before making further changes.
| Parameter | Description | Default | Where to Configure |
| --- | --- | --- | --- |
| Tasks | Number of parallel tasks for the destination connector. More tasks enable concurrent writes across partitions. | Varies by destination | Destination connector settings |
| Maximum poll records | Number of records fetched per poll cycle from Kafka. Higher values improve throughput for high-volume topics. | Varies by destination | Destination Advanced settings |
| Topic partition count | Number of partitions per topic. More partitions enable greater parallelism for both producers and consumers. | Configured at source | Topics page, Settings tab |
| Poll interval (poll.interval.ms) | Time between poll cycles. Lower values reduce latency but increase CPU usage. Higher values batch more records per poll. | Varies by destination | Destination Advanced settings |
| Fetch minimum bytes (fetch.min.bytes) | Minimum amount of data the consumer fetches per request from Kafka. Higher values allow the broker to accumulate more data before responding, improving throughput at the cost of latency. | 1 byte | Destination Advanced settings |
| Fetch maximum wait (fetch.max.wait.ms) | Maximum time the broker waits to accumulate fetch.min.bytes before responding. Lower values reduce latency; higher values allow more batching. | 500 ms | Destination Advanced settings |
Make performance tuning changes one at a time and monitor the impact for a meaningful period before making additional adjustments. Changing multiple parameters simultaneously makes it difficult to attribute improvements or regressions to a specific change.
Setting Maximum poll records to very high values (e.g., 100,000+) can increase memory consumption on the connector. If you observe out-of-memory errors or connector instability after increasing this value, reduce it incrementally until stable.
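As a rough sketch, the poll and fetch parameters above correspond to standard Kafka consumer properties. The dictionary below is illustrative only — the values are example starting points, not Streamkap defaults, and the helper function is hypothetical:

```python
# Illustrative Kafka consumer properties corresponding to the tuning
# parameters above. Values are example starting points, not defaults.
consumer_config = {
    "max.poll.records": 25000,    # records fetched per poll cycle
    "fetch.min.bytes": 100_000,   # broker accumulates ~100 KB before responding
    "fetch.max.wait.ms": 1000,    # broker waits up to 1 s to fill the batch
}

def worst_case_batching_delay_ms(cfg):
    """Hypothetical helper: when traffic is light, the broker may hold a
    fetch response for up to fetch.max.wait.ms before replying, so that
    value bounds the extra latency introduced by batching."""
    return cfg["fetch.max.wait.ms"]
```

This makes the trade-off explicit: raising fetch.min.bytes and fetch.max.wait.ms improves throughput, but each fetch can be delayed by up to the configured wait time.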

Source-Side Optimization

Source connectors read change events from your database’s transaction log (Write-Ahead Log / WAL, binlog, redo log, or change stream). The following factors influence how efficiently the source produces events.

Snapshot Performance

Initial snapshots and backfills are typically the most resource-intensive operations for a source connector. The duration and load depend on several factors:
| Factor | Impact | Guidance |
| --- | --- | --- |
| Table size (row count) | Larger tables take proportionally longer | For tables with hundreds of millions of rows, expect snapshots to run for hours |
| Row width | Wide rows with large text/blob columns increase per-chunk processing time | Narrow tables snapshot faster than tables with many columns or large payloads |
| Source database load | High concurrent query load slows snapshot reads | Schedule snapshots during off-peak hours when possible |
| Index availability | Snapshots read data in primary key order | Ensure primary keys are well-indexed on source tables |
| Number of tables | Tables are snapshotted sequentially | Prioritize critical tables using table-level snapshots |
For very large tables, consider using Filtered (Partial) snapshots to backfill data in manageable ranges rather than snapshotting the entire table at once. This reduces load on your source database and gives you more control over the process. See Snapshots & Backfilling for details.
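The chunking idea behind filtered (partial) snapshots can be sketched as follows. `chunk_ranges` is a hypothetical helper, not a Streamkap API — it simply illustrates how a large primary-key range breaks down into slices you could backfill one at a time:

```python
def chunk_ranges(min_id, max_id, chunk_size):
    """Split an inclusive primary-key range [min_id, max_id] into
    sub-ranges of at most chunk_size rows, so each filtered snapshot
    backfills one manageable slice instead of the whole table."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# A 1,000,000-row table backfilled in 250k-row slices yields 4 ranges:
assert chunk_ranges(1, 1_000_000, 250_000) == [
    (1, 250_000), (250_001, 500_000),
    (500_001, 750_000), (750_001, 1_000_000),
]
```

Running one slice at a time keeps read pressure on the source bounded and lets you pause between slices during peak hours.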

Heartbeat Configuration

For databases with low traffic or intermittent activity, configure heartbeats to keep the connector’s offset position fresh. Without heartbeats, connectors tracking quiet tables may lose their place in the transaction log when the log rotates or expires. Streamkap provides two layers of heartbeat protection:
  1. Connector heartbeats (enabled by default) — The connector periodically emits heartbeat messages to an internal topic, even when no data changes are detected.
  2. Source database heartbeats (recommended for all deployments) — Regular updates to a dedicated heartbeat table in the source database simulate activity and maintain log progress.
See Heartbeat Configuration for setup instructions specific to your source type.
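The source-database heartbeat pattern is simply a periodic write to a dedicated table. Below is a minimal sketch using sqlite3 as a stand-in for your source database; the table name and schema are illustrative, and your actual setup should follow the Heartbeat Configuration guide for your source type:

```python
import sqlite3
import time

# Stand-in for the source database; in production this would be a
# connection to PostgreSQL, MySQL, Oracle, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streamkap_heartbeat (id INTEGER PRIMARY KEY, ts REAL)")
conn.execute("INSERT INTO streamkap_heartbeat (id, ts) VALUES (1, ?)", (time.time(),))
conn.commit()

def beat(conn):
    """Touch the single heartbeat row so the transaction log keeps
    advancing even when application tables are quiet."""
    conn.execute("UPDATE streamkap_heartbeat SET ts = ? WHERE id = 1", (time.time(),))
    conn.commit()

# In a real deployment this runs on a schedule (e.g. every 60 seconds).
beat(conn)
```

Each UPDATE generates a log entry, which is exactly what keeps the connector's offset position fresh on an otherwise idle database.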

WAL / Binlog Retention

Ensure your source database retains its transaction log long enough for the connector to read it. If the log is rotated or purged before the connector processes it, the connector will lose its position and require a new snapshot.
  • PostgreSQL: Set wal_level = logical and configure a sufficient max_slot_wal_keep_size or replication slot retention.
  • MySQL / MariaDB: Configure binlog_expire_logs_seconds (or expire_logs_days) to retain binlogs for at least 3-7 days.
  • Oracle: Configure redo log and archive log retention to cover at least several days of changes.
If your source connector falls behind and the transaction log is purged, you will need to trigger a new snapshot to recover. Monitor connector lag and source log retention to prevent this situation.
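A simple way to reason about this risk is to compare the connector's lag against the source's log retention window. The function below is a hypothetical monitoring check (not a Streamkap feature), with both inputs expressed in seconds:

```python
def retention_at_risk(connector_lag_s, log_retention_s, safety_factor=0.5):
    """Return True when the connector's lag has consumed more than
    safety_factor of the retention window -- i.e. the oldest unread
    log entry is at risk of being purged before it is processed."""
    return connector_lag_s > log_retention_s * safety_factor

# Binlogs kept for 3 days (259,200 s); a connector 2 days behind is
# well past half the window and should be treated as at risk.
assert retention_at_risk(172_800, 259_200) is True
assert retention_at_risk(3_600, 259_200) is False
```

Alerting at half the retention window (rather than at expiry) leaves time to scale the destination or extend retention before a re-snapshot becomes unavoidable.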

Topic Partitioning

Topic partitions control how data is distributed across parallel consumers. Increasing the partition count is the primary way to scale throughput for both streaming changes and snapshots/backfills.

How Partitions Affect Throughput

Each topic partition can be consumed by one task at a time. The maximum effective parallelism for a destination connector equals the number of partitions in the topic.
  • 1 partition = 1 task can consume, all data processed serially
  • 5 partitions = up to 5 tasks consuming in parallel
  • Tasks beyond the partition count remain idle
When you increase partitions, you must also increase destination Tasks to match — otherwise the additional partitions will not be consumed in parallel.
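The relationship above can be expressed directly: effective parallelism is the minimum of the partition count and the task count, and any surplus tasks sit idle. A quick sketch:

```python
def effective_parallelism(partitions, tasks):
    """Each partition is consumed by at most one task at a time, so the
    number of concurrently consuming tasks is bounded by both counts.
    Returns (concurrent consumers, idle tasks)."""
    consuming = min(partitions, tasks)
    idle_tasks = max(0, tasks - partitions)
    return consuming, idle_tasks

assert effective_parallelism(1, 4) == (1, 3)  # 3 tasks idle
assert effective_parallelism(5, 5) == (5, 0)  # fully parallel
assert effective_parallelism(8, 5) == (5, 0)  # 5-way parallelism; some tasks own multiple partitions
```

This is why partitions and tasks must be increased together: raising either one alone leaves the other as the bottleneck.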

When to Increase Partitions

  • High consumer lag: The destination cannot keep up with a single partition’s throughput
  • Scaling destination writes: Enable multiple tasks to write in parallel
  • Faster snapshots and backfills: More partitions allow the destination to ingest snapshot data in parallel, significantly reducing backfill duration for large tables

Partition Increase Procedure

  1. Increase topic partitions on the Topics page Settings tab (e.g., to 5, 8, 16, or 32 depending on throughput needs). Follow the safe partition increase procedure.
  2. Increase destination tasks to match the new partition count in the destination connector settings.
  3. Monitor consumer lag on the Consumer Groups page to confirm the additional parallelism is effective.
Increasing partitions alone is not sufficient — you must also increase the destination Tasks to match. Without matching tasks, the additional partitions will not be consumed in parallel.
For snapshots specifically, partition and task increases should be made before triggering the snapshot. Snapshot read speed is also bounded by the source database — if your source is under heavy load or has slow disk I/O, increasing partitions and tasks on the Streamkap side will not help beyond the source’s throughput limit.
See Topics — Partition Management for the full safe procedure and considerations.

Destination-Side Optimization

Destination connectors consume messages from Kafka and write them to your target system. The primary levers for improving destination throughput are parallelism (tasks), polling efficiency (maximum poll records), and partition count. All destination-side tuning applies to both streaming CDC changes and snapshot/backfill ingestion.
Applies to: Snowflake, BigQuery, Redshift, Databricks

Data warehouses are optimized for batch operations. Key tuning considerations:
  • Tasks: Increase the number of tasks to enable parallel writes across partitions. Each task handles one or more partitions independently.
  • Maximum poll records: Increase to 25000, 50000, or 80000 depending on record sizes. Larger poll batches reduce the overhead of frequent, small writes.
  • Topic partitions: Increase partition count to at least 5 for high-throughput topics to enable greater parallelism.
Warehouse-specific considerations:
  • Snowflake (Append mode): Uses Snowpipe Streaming for ingestion. Warehouse compute is only needed for Dynamic Tables, QA, and tasks. Set AUTO_SUSPEND to 60 seconds or higher for active CDC pipelines to avoid frequent suspend/resume cycles.
  • Snowflake (Upsert mode): Requires a running warehouse for periodic MERGE INTO operations. Size the warehouse to match your ingestion throughput.
  • BigQuery: Ensure the service account has sufficient quota for streaming inserts or load jobs in your target dataset region.
  • Redshift: Match task count to your cluster’s write capacity. Monitor WLM queue depth and query throughput.
  • Databricks: Data is staged as Parquet files and loaded via COPY. Ensure the tmp directory on DBFS has sufficient space and the cluster is sized for your ingestion rate.

Monitoring Performance

Effective performance tuning requires ongoing monitoring. Use the following Streamkap features to track pipeline health and identify bottlenecks.

Consumer Group Lag

Consumer lag is the most important metric for identifying throughput issues. It represents how far behind the destination connector is from the latest message in a topic.
  • Zero lag: The destination is fully caught up with the source.
  • Steady, low lag: Normal operating state for active pipelines.
  • Increasing lag: The destination cannot keep up with the source’s event production rate. This is the primary signal that tuning is needed.
Monitor consumer lag on the Consumer Groups page. Check the Total Lag metric and drill into partition-level lag to identify uneven distribution.
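Per-partition lag is the difference between the topic's log end offset and the consumer group's committed offset. The sketch below shows how total and partition-level lag are derived; the offset values are illustrative:

```python
def lag_report(end_offsets, committed_offsets):
    """Compute per-partition and total consumer lag.
    Both arguments map partition id -> offset; a partition with no
    committed offset is treated as unconsumed from offset 0."""
    per_partition = {
        p: end_offsets[p] - committed_offsets.get(p, 0)
        for p in end_offsets
    }
    return per_partition, sum(per_partition.values())

# Partition 2 lags far behind the others -- exactly the kind of uneven
# distribution worth drilling into at the partition level.
per_p, total = lag_report(
    {0: 1000, 1: 1000, 2: 5000},   # log end offsets
    {0: 1000, 1: 990, 2: 200},     # committed offsets
)
assert per_p == {0: 0, 1: 10, 2: 4800}
assert total == 4810
```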

Topic Throughput

The Topics page displays volume, events, and error metrics for each topic. Use the metrics chart to track throughput over time and identify patterns:
  • Volume spikes: Correlate with snapshot activity or source database bulk operations.
  • Error spikes: Investigate DLQ topics for failed messages.
  • Latency increases: May indicate destination bottlenecks or Kafka broker pressure.

Pipeline Metrics

The Pipelines page provides pipeline-level latency and lag metrics:
  • Latency: Time for data to flow from source through the pipeline to the destination. Values under 1 second indicate healthy performance.
  • Lag: Number of records waiting to be processed. Zero means the pipeline is fully caught up.
Increased latency and lag are expected when snapshots are running. Backfilling produces a much higher load than normal CDC streaming, but the load is temporary while backfills are in progress.

Common Performance Issues

Increasing Consumer Lag

Symptoms: Consumer lag is consistently increasing over time and not reducing.

Possible causes and remediation:
  1. Destination bottleneck: The destination cannot write fast enough to keep up with the source.
    • Increase the number of Tasks on the destination connector.
    • Increase Maximum poll records in the destination’s Advanced settings (try 25000, 50000, or 80000).
    • Increase topic partition count to at least 5 to enable greater parallelism.
  2. Under-provisioned destination: The target system lacks sufficient compute or I/O capacity.
    • For data warehouses, increase warehouse/cluster size.
    • For JDBC databases, check CPU, memory, and connection pool utilization.
    • For ClickHouse, verify the service tier provides adequate resources for inserts and merges.
  3. Insufficient parallelism: The number of tasks is lower than the number of partitions, or there is only one partition.
    • Increase partition count, then increase tasks to match.
The maximum useful number of tasks equals the number of topic partitions. Additional tasks beyond the partition count will remain idle.
Slow Snapshots

Symptoms: Snapshot is taking much longer than expected to complete.

Possible causes and remediation:
  1. Large tables: Tables with hundreds of millions of rows will naturally take hours to snapshot. This is expected behavior.
  2. Source database under heavy load: Competing queries slow down snapshot reads.
    • Schedule snapshots during off-peak hours.
    • Use filtered (partial) snapshots to process smaller data ranges.
  3. Missing or fragmented indexes: Snapshots read data in primary key order. Poorly indexed tables are slower to read.
    • Ensure primary keys are well-indexed and consider running ANALYZE or REINDEX on the source database.
  4. Network latency: Cross-region connections between Streamkap and the source database add round-trip time per chunk.
See Snapshots & Backfilling for detailed guidance on planning and estimating snapshot duration.
Destination Write Failures

Symptoms: Pipeline shows errors or broken status. Logs indicate write timeouts or connection failures to the destination.

Possible causes and remediation:
  1. Destination overloaded: The target system cannot handle the write volume.
    • Reduce the number of tasks temporarily to lower concurrent write pressure.
    • For data warehouses, increase warehouse/cluster capacity.
  2. Network issues: Transient connectivity problems between Streamkap and the destination.
    • Check firewall rules and network policies.
    • Verify the destination is accessible from Streamkap IP addresses.
  3. Large records: Individual records that are very large (e.g., containing large text or blob fields) may exceed timeout thresholds.
    • Consider excluding large columns from replication if they are not needed at the destination.
Uneven Partition Lag

Symptoms: Some partitions have high lag while others are at zero. Overall throughput is limited by the slowest partition.

Possible causes and remediation:
  1. Partition count lower than task count: Some tasks are idle while others are overloaded.
  2. Hot partitions: Certain keys produce disproportionate traffic, creating uneven load across partitions.
    • Monitor partition-level lag on the Consumer Groups page to identify imbalanced partitions.
  3. Recent partition increase: After increasing partitions, new data goes to new partitions but existing lag remains on old partitions until fully processed.
Increasing partitions only affects new data. Existing lag on old partitions will persist until those messages are fully consumed.

Tuning Workflow

Follow this workflow when optimizing pipeline performance:

1. Identify the bottleneck

Check Consumer Groups for lag, Topics for throughput metrics, and Pipelines for latency. Determine whether the bottleneck is on the source side, Kafka layer, or destination side.

2. Increase maximum poll records

In the destination connector’s Advanced settings, increase Maximum poll records to 25000, 50000, or 80000 depending on your record sizes. This is the lowest-risk change and often provides immediate improvement.

3. Increase topic partitions

On the Topics page, increase the partition count to at least 5 for high-throughput topics. Follow the safe partition increase procedure to avoid data consistency issues.

4. Increase tasks

In the destination connector settings, increase Tasks to match the partition count. More tasks enable more parallel consumers and writers.

5. Monitor and iterate

After each change, monitor consumer lag, latency, and destination health for a meaningful period before making further adjustments.

6. Contact Streamkap support

If you have exhausted the tuning options above and performance is still not meeting your requirements, contact Streamkap support for advanced optimization assistance. Include your pipeline ID, current settings, and the metrics you are observing.

Sizing by Use Case

Different workloads have different performance characteristics. Use the recommendations below as starting points and adjust based on monitoring.

High Throughput

Goal: Maximize the number of records processed per second.
  • Increase Maximum poll records to 50000 or 80000
  • Increase Topic partition count to 5 or higher
  • Increase Tasks to match the partition count
  • Set fetch.min.bytes to 100000 (100 KB) or higher to allow the broker to batch more data per fetch
  • Set fetch.max.wait.ms to 1000 ms or higher to give the broker time to accumulate larger batches
  • Use a destination tier with sufficient write capacity (larger warehouse, higher IOPS database, etc.)