Change Data Capture (CDC) pipeline performance depends on three stages: how fast the source produces change events, how topic partitioning distributes load across parallel consumers, and how quickly the destination writes data. Tuning each stage helps you achieve lower latency and higher throughput across your pipelines. This guide covers the key parameters, source-side optimizations, topic partitioning, destination-side optimizations, monitoring strategies, and common performance issues.

Key Tuning Parameters

The following parameters have the most impact on pipeline throughput. Adjust them incrementally and monitor the results before making further changes.
| Parameter | Description | Default | Where to Configure |
| --- | --- | --- | --- |
| Tasks | Number of parallel tasks for the destination connector. More tasks enable concurrent writes across partitions. | Varies by destination | Destination connector settings |
| Maximum poll records | Number of records fetched per poll cycle from Kafka. Higher values improve throughput for high-volume topics. | Varies by destination | Destination Advanced settings |
| Topic partition count | Number of partitions per topic. More partitions enable greater parallelism for both producers and consumers. | Configured at source | Topics page, Settings tab |
| Poll interval (poll.interval.ms) | Time between poll cycles. Lower values reduce latency but increase CPU usage. Higher values batch more records per poll. | Varies by destination | Destination Advanced settings |
| Fetch minimum bytes (fetch.min.bytes) | Minimum amount of data the consumer fetches per request from Kafka. Higher values allow the broker to accumulate more data before responding, improving throughput at the cost of latency. | 1 byte | Destination Advanced settings |
| Fetch maximum wait (fetch.max.wait.ms) | Maximum time the broker waits to accumulate fetch.min.bytes before responding. Lower values reduce latency; higher values allow more batching. | 500 ms | Destination Advanced settings |
Make performance tuning changes one at a time and monitor the impact for a meaningful period before making additional adjustments. Changing multiple parameters simultaneously makes it difficult to attribute improvements or regressions to a specific change.
Setting Maximum poll records to very high values (e.g., 100,000+) can increase memory consumption on the connector. If you observe out-of-memory errors or connector instability after increasing this value, reduce it incrementally until stable.
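As a rough sketch, the poll and fetch parameters above correspond to standard Kafka consumer properties. The dictionary below is illustrative only — the values are example starting points, not Streamkap defaults, and the helper function is hypothetical:

```python
# Illustrative Kafka consumer properties corresponding to the tuning
# parameters above. Values are example starting points, not defaults.
consumer_config = {
    "max.poll.records": 25000,    # records fetched per poll cycle
    "fetch.min.bytes": 100_000,   # broker accumulates ~100 KB before responding
    "fetch.max.wait.ms": 1000,    # broker waits up to 1 s to fill the batch
}

def worst_case_batching_delay_ms(cfg):
    """Hypothetical helper: when traffic is light, the broker may hold a
    fetch response for up to fetch.max.wait.ms before replying, so that
    value bounds the extra latency introduced by batching."""
    return cfg["fetch.max.wait.ms"]
```

This makes the trade-off explicit: raising fetch.min.bytes and fetch.max.wait.ms improves throughput, but each fetch can be delayed by up to the configured wait time.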

Source-Side Optimization

Source connectors read change events from your database’s transaction log (Write-Ahead Log / WAL, binlog, redo log, or change stream). The following factors influence how efficiently the source produces events.

Snapshot Performance

Initial snapshots and backfills are typically the most resource-intensive operations for a source connector. The duration and load depend on several factors:
| Factor | Impact | Guidance |
| --- | --- | --- |
| Table size (row count) | Larger tables take proportionally longer | For tables with hundreds of millions of rows, expect snapshots to run for hours |
| Row width | Wide rows with large text/blob columns increase per-chunk processing time | Narrow tables snapshot faster than tables with many columns or large payloads |
| Source database load | High concurrent query load slows snapshot reads | Schedule snapshots during off-peak hours when possible |
| Index availability | Snapshots read data in primary key order | Ensure primary keys are well-indexed on source tables |
| Number of tables | Tables are snapshotted sequentially | Prioritize critical tables using table-level snapshots |
For very large tables, consider using Filtered (Partial) snapshots to backfill data in manageable ranges rather than snapshotting the entire table at once. This reduces load on your source database and gives you more control over the process. See Snapshots & Backfilling for details.
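The chunking idea behind filtered (partial) snapshots can be sketched as follows. `chunk_ranges` is a hypothetical helper, not a Streamkap API — it simply illustrates how a large primary-key range breaks down into slices you could backfill one at a time:

```python
def chunk_ranges(min_id, max_id, chunk_size):
    """Split an inclusive primary-key range [min_id, max_id] into
    sub-ranges of at most chunk_size rows, so each filtered snapshot
    backfills one manageable slice instead of the whole table."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# A 1,000,000-row table backfilled in 250k-row slices yields 4 ranges:
assert chunk_ranges(1, 1_000_000, 250_000) == [
    (1, 250_000), (250_001, 500_000),
    (500_001, 750_000), (750_001, 1_000_000),
]
```

Running one slice at a time keeps read pressure on the source bounded and lets you pause between slices during peak hours.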

Heartbeat Configuration

For databases with low traffic or intermittent activity, configure heartbeats to keep the connector’s offset position fresh. Without heartbeats, connectors tracking quiet tables may lose their place in the transaction log when the log rotates or expires. Streamkap provides two layers of heartbeat protection:
  1. Connector heartbeats (enabled by default) — The connector periodically emits heartbeat messages to an internal topic, even when no data changes are detected.
  2. Source database heartbeats (recommended for all deployments) — Regular updates to a dedicated heartbeat table in the source database simulate activity and maintain log progress.
See Heartbeat Configuration for setup instructions specific to your source type.
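The source-database heartbeat pattern is simply a periodic write to a dedicated table. Below is a minimal sketch using sqlite3 as a stand-in for your source database; the table name and schema are illustrative, and your actual setup should follow the Heartbeat Configuration guide for your source type:

```python
import sqlite3
import time

# Stand-in for the source database; in production this would be a
# connection to PostgreSQL, MySQL, Oracle, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streamkap_heartbeat (id INTEGER PRIMARY KEY, ts REAL)")
conn.execute("INSERT INTO streamkap_heartbeat (id, ts) VALUES (1, ?)", (time.time(),))
conn.commit()

def beat(conn):
    """Touch the single heartbeat row so the transaction log keeps
    advancing even when application tables are quiet."""
    conn.execute("UPDATE streamkap_heartbeat SET ts = ? WHERE id = 1", (time.time(),))
    conn.commit()

# In a real deployment this runs on a schedule (e.g. every 60 seconds).
beat(conn)
```

Each UPDATE generates a log entry, which is exactly what keeps the connector's offset position fresh on an otherwise idle database.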

WAL / Binlog Retention

Ensure your source database retains its transaction log long enough for the connector to read it. If the log is rotated or purged before the connector processes it, the connector will lose its position and require a new snapshot.
  • PostgreSQL: Set wal_level = logical and configure a sufficient max_slot_wal_keep_size or replication slot retention.
  • MySQL / MariaDB: Configure binlog_expire_logs_seconds (or expire_logs_days) to retain binlogs for at least 3-7 days.
  • Oracle: Configure redo log and archive log retention to cover at least several days of changes.
If your source connector falls behind and the transaction log is purged, you will need to trigger a new snapshot to recover. Monitor connector lag and source log retention to prevent this situation.
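A simple way to reason about this risk is to compare the connector's lag against the source's log retention window. The function below is a hypothetical monitoring check (not a Streamkap feature), with both inputs expressed in seconds:

```python
def retention_at_risk(connector_lag_s, log_retention_s, safety_factor=0.5):
    """Return True when the connector's lag has consumed more than
    safety_factor of the retention window -- i.e. the oldest unread
    log entry is at risk of being purged before it is processed."""
    return connector_lag_s > log_retention_s * safety_factor

# Binlogs kept for 3 days (259,200 s); a connector 2 days behind is
# well past half the window and should be treated as at risk.
assert retention_at_risk(172_800, 259_200) is True
assert retention_at_risk(3_600, 259_200) is False
```

Alerting at half the retention window (rather than at expiry) leaves time to scale the destination or extend retention before a re-snapshot becomes unavoidable.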

Topic Partitioning

Topic partitions control how data is distributed across parallel consumers. Increasing the partition count is the primary way to scale throughput for both streaming changes and snapshots/backfills.

How Partitions Affect Throughput

Each topic partition can be consumed by one task at a time. The maximum effective parallelism for a destination connector equals the number of partitions in the topic.
  • 1 partition = 1 task can consume, all data processed serially
  • 5 partitions = up to 5 tasks consuming in parallel
  • Tasks beyond the partition count remain idle
When you increase partitions, you must also increase destination Tasks to match — otherwise the additional partitions will not be consumed in parallel.
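The relationship above can be expressed directly: effective parallelism is the minimum of the partition count and the task count, and any surplus tasks sit idle. A quick sketch:

```python
def effective_parallelism(partitions, tasks):
    """Each partition is consumed by at most one task at a time, so the
    number of concurrently consuming tasks is bounded by both counts.
    Returns (concurrent consumers, idle tasks)."""
    consuming = min(partitions, tasks)
    idle_tasks = max(0, tasks - partitions)
    return consuming, idle_tasks

assert effective_parallelism(1, 4) == (1, 3)  # 3 tasks idle
assert effective_parallelism(5, 5) == (5, 0)  # fully parallel
assert effective_parallelism(8, 5) == (5, 0)  # 5-way parallelism; some tasks own multiple partitions
```

This is why partitions and tasks must be increased together: raising either one alone leaves the other as the bottleneck.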

When to Increase Partitions

  • High consumer lag: The destination cannot keep up with a single partition’s throughput
  • Scaling destination writes: Enable multiple tasks to write in parallel
  • Faster snapshots and backfills: More partitions allow the destination to ingest snapshot data in parallel, significantly reducing backfill duration for large tables

Partition Increase Procedure

  1. Increase topic partitions on the Topics page Settings tab (e.g., to 5, 8, 16, or 32 depending on throughput needs). Follow the safe partition increase procedure.
  2. Increase destination tasks to match the new partition count in the destination connector settings.
  3. Monitor consumer lag on the Consumer Groups page to confirm the additional parallelism is effective.
Increasing partitions alone is not sufficient — you must also increase the destination Tasks to match. Without matching tasks, the additional partitions will not be consumed in parallel.
For snapshots specifically, partition and task increases should be made before triggering the snapshot. Snapshot read speed is also bounded by the source database — if your source is under heavy load or has slow disk I/O, increasing partitions and tasks on the Streamkap side will not help beyond the source’s throughput limit.
See Topics — Partition Management for the full safe procedure and considerations.

Destination-Side Optimization

Destination connectors consume messages from Kafka and write them to your target system. The primary levers for improving destination throughput are parallelism (tasks), polling efficiency (maximum poll records), and partition count. All destination-side tuning applies to both streaming CDC changes and snapshot/backfill ingestion.
Applies to: Snowflake, BigQuery, Redshift, Databricks

Data warehouses are optimized for batch operations. Key tuning considerations:
  • Tasks: Increase the number of tasks to enable parallel writes across partitions. Each task handles one or more partitions independently.
  • Maximum poll records: Increase to 25000, 50000, or 80000 depending on record sizes. Larger poll batches reduce the overhead of frequent, small writes.
  • Topic partitions: Increase partition count to at least 5 for high-throughput topics to enable greater parallelism.
Warehouse-specific considerations:
  • Snowflake (Append mode): Uses Snowpipe Streaming for ingestion. Warehouse compute is only needed for Dynamic Tables, QA, and tasks. Set AUTO_SUSPEND to 60 seconds or higher for active CDC pipelines to avoid frequent suspend/resume cycles.
  • Snowflake (Upsert mode): Requires a running warehouse for periodic MERGE INTO operations. Size the warehouse to match your ingestion throughput.
  • BigQuery: Ensure the service account has sufficient quota for streaming inserts or load jobs in your target dataset region.
  • Redshift: Match task count to your cluster’s write capacity. Monitor WLM queue depth and query throughput.
  • Databricks: Data is staged as Parquet files and loaded via COPY. Ensure the tmp directory on DBFS has sufficient space and the cluster is sized for your ingestion rate.

Monitoring Performance

Effective performance tuning requires ongoing monitoring. Use the following Streamkap features to track pipeline health and identify bottlenecks.

Consumer Group Lag

Consumer lag is the most important metric for identifying throughput issues. It represents how far behind the destination connector is from the latest message in a topic.
  • Zero lag: The destination is fully caught up with the source.
  • Steady, low lag: Normal operating state for active pipelines.
  • Increasing lag: The destination cannot keep up with the source’s event production rate. This is the primary signal that tuning is needed.
Monitor consumer lag on the Consumer Groups page. Check the Total Lag metric and drill into partition-level lag to identify uneven distribution.
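Per-partition lag is the difference between the topic's log end offset and the consumer group's committed offset. The sketch below shows how total and partition-level lag are derived; the offset values are illustrative:

```python
def lag_report(end_offsets, committed_offsets):
    """Compute per-partition and total consumer lag.
    Both arguments map partition id -> offset; a partition with no
    committed offset is treated as unconsumed from offset 0."""
    per_partition = {
        p: end_offsets[p] - committed_offsets.get(p, 0)
        for p in end_offsets
    }
    return per_partition, sum(per_partition.values())

# Partition 2 lags far behind the others -- exactly the kind of uneven
# distribution worth drilling into at the partition level.
per_p, total = lag_report(
    {0: 1000, 1: 1000, 2: 5000},   # log end offsets
    {0: 1000, 1: 990, 2: 200},     # committed offsets
)
assert per_p == {0: 0, 1: 10, 2: 4800}
assert total == 4810
```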

Topic Throughput

The Topics page displays volume, events, and error metrics for each topic. Use the metrics chart to track throughput over time and identify patterns:
  • Volume spikes: Correlate with snapshot activity or source database bulk operations.
  • Error spikes: Investigate DLQ topics for failed messages.
  • Latency increases: May indicate destination bottlenecks or Kafka broker pressure.

Pipeline Metrics

The Pipelines page provides pipeline-level latency and lag metrics:
  • Latency: Time for data to flow from source through the pipeline to the destination. Values under 1 second indicate healthy performance.
  • Lag: Number of records waiting to be processed. Zero means the pipeline is fully caught up.
Increased latency and lag are expected when snapshots are running. Backfilling produces a much higher load than normal CDC streaming, but the load is temporary while backfills are in progress.

Common Performance Issues

Increasing Consumer Lag

Symptoms: Consumer lag is consistently increasing over time and not reducing.

Possible causes and remediation:
  1. Destination bottleneck: The destination cannot write fast enough to keep up with the source.
    • Increase the number of Tasks on the destination connector.
    • Increase Maximum poll records in the destination’s Advanced settings (try 25000, 50000, or 80000).
    • Increase topic partition count to at least 5 to enable greater parallelism.
  2. Under-provisioned destination: The target system lacks sufficient compute or I/O capacity.
    • For data warehouses, increase warehouse/cluster size.
    • For JDBC databases, check CPU, memory, and connection pool utilization.
    • For ClickHouse, verify the service tier provides adequate resources for inserts and merges.
  3. Insufficient parallelism: The number of tasks is lower than the number of partitions, or there is only one partition.
    • Increase partition count, then increase tasks to match.
The maximum useful number of tasks equals the number of topic partitions. Additional tasks beyond the partition count will remain idle.
Slow Snapshots

Symptoms: Snapshot is taking much longer than expected to complete.

Possible causes and remediation:
  1. Large tables: Tables with hundreds of millions of rows will naturally take hours to snapshot. This is expected behavior.
  2. Source database under heavy load: Competing queries slow down snapshot reads.
    • Schedule snapshots during off-peak hours.
    • Use filtered (partial) snapshots to process smaller data ranges.
  3. Missing or fragmented indexes: Snapshots read data in primary key order. Poorly indexed tables are slower to read.
    • Ensure primary keys are well-indexed and consider running ANALYZE or REINDEX on the source database.
  4. Network latency: Cross-region connections between Streamkap and the source database add round-trip time per chunk.
See Snapshots & Backfilling for detailed guidance on planning and estimating snapshot duration.
Destination Write Failures

Symptoms: Pipeline shows errors or broken status. Logs indicate write timeouts or connection failures to the destination.

Possible causes and remediation:
  1. Destination overloaded: The target system cannot handle the write volume.
    • Reduce the number of tasks temporarily to lower concurrent write pressure.
    • For data warehouses, increase warehouse/cluster capacity.
  2. Network issues: Transient connectivity problems between Streamkap and the destination.
    • Check firewall rules and network policies.
    • Verify the destination is accessible from Streamkap IP addresses.
  3. Large records: Individual records that are very large (e.g., containing large text or blob fields) may exceed timeout thresholds.
    • Consider excluding large columns from replication if they are not needed at the destination.
Uneven Partition Lag

Symptoms: Some partitions have high lag while others are at zero. Overall throughput is limited by the slowest partition.

Possible causes and remediation:
  1. Partition count lower than task count: Some tasks are idle while others are overloaded.
  2. Hot partitions: Certain keys produce disproportionate traffic, creating uneven load across partitions.
    • Monitor partition-level lag on the Consumer Groups page to identify imbalanced partitions.
  3. Recent partition increase: After increasing partitions, new data goes to new partitions but existing lag remains on old partitions until fully processed.
Increasing partitions only affects new data. Existing lag on old partitions will persist until those messages are fully consumed.

Tuning Workflow

Follow this workflow when optimizing pipeline performance:

1. Identify the bottleneck

Check Consumer Groups for lag, Topics for throughput metrics, and Pipelines for latency. Determine whether the bottleneck is on the source side, Kafka layer, or destination side.

2. Increase maximum poll records

In the destination connector’s Advanced settings, increase Maximum poll records to 25000, 50000, or 80000 depending on your record sizes. This is the lowest-risk change and often provides immediate improvement.

3. Increase topic partitions

On the Topics page, increase the partition count to at least 5 for high-throughput topics. Follow the safe partition increase procedure to avoid data consistency issues.

4. Increase tasks

In the destination connector settings, increase Tasks to match the partition count. More tasks enable more parallel consumers and writers.

5. Monitor and iterate

After each change, monitor consumer lag, latency, and destination health for a meaningful period before making further adjustments.

6. Contact Streamkap support

If you have exhausted the tuning options above and performance is still not meeting your requirements, contact Streamkap support for advanced optimization assistance. Include your pipeline ID, current settings, and the metrics you are observing.

Sizing by Use Case

Different workloads have different performance characteristics. Use the recommendations below as starting points and adjust based on monitoring.

High Throughput

Goal: Maximize the number of records processed per second.
  • Increase Maximum poll records to 50000 or 80000
  • Increase Topic partition count to 5 or higher
  • Increase Tasks to match the partition count
  • Set fetch.min.bytes to 100000 (100 KB) or higher to allow the broker to batch more data per fetch
  • Set fetch.max.wait.ms to 1000 ms or higher to give the broker time to accumulate larger batches
  • Use a destination tier with sufficient write capacity (larger warehouse, higher IOPS database, etc.)