Key Tuning Parameters
The following parameters have the most impact on pipeline throughput. Adjust them incrementally and monitor the results before making further changes.

| Parameter | Description | Default | Where to Configure |
|---|---|---|---|
| Tasks | Number of parallel tasks for the destination connector. More tasks enable concurrent writes across partitions. | Varies by destination | Destination connector settings |
| Maximum poll records | Number of records fetched per poll cycle from Kafka. Higher values improve throughput for high-volume topics. | Varies by destination | Destination Advanced settings |
| Topic partition count | Number of partitions per topic. More partitions enable greater parallelism for both producers and consumers. | Configured at source | Topics page, Settings tab |
| Poll interval (`poll.interval.ms`) | Time between poll cycles. Lower values reduce latency but increase CPU usage. Higher values batch more records per poll. | Varies by destination | Destination Advanced settings |
| Fetch minimum bytes (`fetch.min.bytes`) | Minimum amount of data the broker returns per fetch request. Higher values let the broker accumulate more data before responding, improving throughput at the cost of latency. | 1 byte | Destination Advanced settings |
| Fetch maximum wait (`fetch.max.wait.ms`) | Maximum time the broker waits to accumulate `fetch.min.bytes` before responding. Lower values reduce latency; higher values allow more batching. | 500 ms | Destination Advanced settings |
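The poll and fetch parameters above correspond to standard Kafka consumer properties. As an illustrative sketch only (in Streamkap you set the equivalent values through the destination's Advanced settings rather than a properties file), a throughput-leaning configuration might look like:

```properties
# Throughput-leaning consumer overrides; values are illustrative starting points.
# Fetch more records per poll cycle:
max.poll.records=25000
# Let the broker accumulate ~100 KB before responding to a fetch:
fetch.min.bytes=100000
# Wait up to 1 second for fetch.min.bytes to accumulate:
fetch.max.wait.ms=1000
```

Raising `fetch.min.bytes` and `fetch.max.wait.ms` together trades a bounded amount of latency for larger, more efficient batches.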
Make performance tuning changes one at a time and monitor the impact for a meaningful period before making additional adjustments. Changing multiple parameters simultaneously makes it difficult to attribute improvements or regressions to a specific change.
Source-Side Optimization
Source connectors read change events from your database’s transaction log (Write-Ahead Log / WAL, binlog, redo log, or change stream). The following factors influence how efficiently the source produces events.

Snapshot Performance
Initial snapshots and backfills are typically the most resource-intensive operations for a source connector. The duration and load depend on several factors:

| Factor | Impact | Guidance |
|---|---|---|
| Table size (row count) | Larger tables take proportionally longer | For tables with hundreds of millions of rows, expect snapshots to run for hours |
| Row width | Wide rows with large text/blob columns increase per-chunk processing time | Narrow tables snapshot faster than tables with many columns or large payloads |
| Source database load | High concurrent query load slows snapshot reads | Schedule snapshots during off-peak hours when possible |
| Index availability | Snapshots read data in primary key order | Ensure primary keys are well-indexed on source tables |
| Number of tables | Tables are snapshotted sequentially | Prioritize critical tables using table-level snapshots |
Heartbeat Configuration
For databases with low traffic or intermittent activity, configure heartbeats to keep the connector’s offset position fresh. Without heartbeats, connectors tracking quiet tables may lose their place in the transaction log when the log rotates or expires. Streamkap provides two layers of heartbeat protection:

- Connector heartbeats (enabled by default) — The connector periodically emits heartbeat messages to an internal topic, even when no data changes are detected.
- Source database heartbeats (recommended for all deployments) — Regular updates to a dedicated heartbeat table in the source database simulate activity and maintain log progress.
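The source database heartbeat pattern can be sketched as follows. The table and column names here are hypothetical, not a schema Streamkap requires:

```sql
-- Illustrative heartbeat table; names are hypothetical.
CREATE TABLE streamkap_heartbeat (
    id             INT PRIMARY KEY,
    last_heartbeat TIMESTAMP
);
INSERT INTO streamkap_heartbeat (id, last_heartbeat) VALUES (1, NOW());

-- Run on a schedule (e.g., every minute via cron or pg_cron) so the
-- transaction log keeps advancing even when the database is quiet:
UPDATE streamkap_heartbeat SET last_heartbeat = NOW() WHERE id = 1;
```

Because each scheduled `UPDATE` is a logged write, the connector's log position advances regularly even when no application tables change.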
WAL / Binlog Retention
Ensure your source database retains its transaction log long enough for the connector to read it. If the log is rotated or purged before the connector processes it, the connector will lose its position and require a new snapshot.

- PostgreSQL: Set `wal_level = logical` and configure a sufficient `max_slot_wal_keep_size` or replication slot retention.
- MySQL / MariaDB: Configure `binlog_expire_logs_seconds` (or `expire_logs_days`) to retain binlogs for at least 3-7 days.
- Oracle: Configure redo log and archive log retention to cover at least several days of changes.
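As a sketch of the PostgreSQL and MySQL retention settings named above (the size and duration values are placeholders to adjust for your change volume):

```sql
-- PostgreSQL: enable logical decoding and cap WAL retained per slot.
-- wal_level requires a server restart to take effect.
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';

-- MySQL 8.0+: retain binlogs for 7 days (604800 seconds).
SET GLOBAL binlog_expire_logs_seconds = 604800;
```

Size `max_slot_wal_keep_size` so a connector outage of a day or two cannot exhaust retention and invalidate the replication slot.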
Topic Partitioning
Topic partitions control how data is distributed across parallel consumers. Increasing the partition count is the primary way to scale throughput for both streaming changes and snapshots/backfills.

How Partitions Affect Throughput

Each topic partition can be consumed by one task at a time. The maximum effective parallelism for a destination connector equals the number of partitions in the topic.

- 1 partition = 1 task can consume, all data processed serially
- 5 partitions = up to 5 tasks consuming in parallel
- Tasks beyond the partition count remain idle
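The rule above reduces to a simple formula, sketched here for illustration:

```python
def effective_parallelism(partitions: int, tasks: int) -> int:
    """Effective consumer parallelism is capped by the partition count:
    tasks beyond it sit idle, and partitions beyond the task count queue
    behind shared tasks."""
    return min(partitions, tasks)

# 8 tasks on a 5-partition topic: only 5 consume, 3 are idle.
print(effective_parallelism(partitions=5, tasks=8))  # 5
```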
When to Increase Partitions
- High consumer lag: The destination cannot keep up with a single partition’s throughput
- Scaling destination writes: Enable multiple tasks to write in parallel
- Faster snapshots and backfills: More partitions allow the destination to ingest snapshot data in parallel, significantly reducing backfill duration for large tables
Partition Increase Procedure
- Increase topic partitions on the Topics page Settings tab (e.g., to `5`, `8`, `16`, or `32` depending on throughput needs). Follow the safe partition increase procedure.
- Increase destination tasks to match the new partition count in the destination connector settings.
- Monitor consumer lag on the Consumer Groups page to confirm the additional parallelism is effective.
For snapshots specifically, partition and task increases should be made before triggering the snapshot. Snapshot read speed is also bounded by the source database — if your source is under heavy load or has slow disk I/O, increasing partitions and tasks on the Streamkap side will not help beyond the source’s throughput limit.
Destination-Side Optimization
Destination connectors consume messages from Kafka and write them to your target system. The primary levers for improving destination throughput are parallelism (tasks), polling efficiency (maximum poll records), and partition count. All destination-side tuning applies to both streaming CDC changes and snapshot/backfill ingestion.

- Data Warehouses
- JDBC Databases
- Object Storage
- ClickHouse
- Iceberg
Applies to: Snowflake, BigQuery, Redshift, Databricks

Data warehouses are optimized for batch operations. Key tuning considerations:
- Tasks: Increase the number of tasks to enable parallel writes across partitions. Each task handles one or more partitions independently.
- Maximum poll records: Increase to `25000`, `50000`, or `80000` depending on record sizes. Larger poll batches reduce the overhead of frequent, small writes.
- Topic partitions: Increase partition count to at least `5` for high-throughput topics to enable greater parallelism.
- Snowflake (Append mode): Uses Snowpipe Streaming for ingestion. Warehouse compute is only needed for Dynamic Tables, QA, and tasks. Set `AUTO_SUSPEND` to 60 seconds or higher for active CDC pipelines to avoid frequent suspend/resume cycles.
- Snowflake (Upsert mode): Requires a running warehouse for periodic `MERGE INTO` operations. Size the warehouse to match your ingestion throughput.
- BigQuery: Ensure the service account has sufficient quota for streaming inserts or load jobs in your target dataset region.
- Redshift: Match task count to your cluster’s write capacity. Monitor WLM queue depth and query throughput.
- Databricks: Data is staged as Parquet files and loaded via `COPY`. Ensure the `tmp` directory on DBFS has sufficient space and the cluster is sized for your ingestion rate.
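The Snowflake `AUTO_SUSPEND` recommendation can be applied with a statement like the following. The warehouse name is hypothetical; substitute the warehouse your Streamkap destination uses:

```sql
-- Illustrative: keep the warehouse warm for active CDC pipelines.
-- AUTO_SUSPEND is in seconds; 60 avoids rapid suspend/resume churn.
ALTER WAREHOUSE streamkap_wh SET
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE;
```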
Monitoring Performance
Effective performance tuning requires ongoing monitoring. Use the following Streamkap features to track pipeline health and identify bottlenecks.

Consumer Group Lag

Consumer lag is the most important metric for identifying throughput issues. It represents how far behind the destination connector is from the latest message in a topic.

- Zero lag: The destination is fully caught up with the source.
- Steady, low lag: Normal operating state for active pipelines.
- Increasing lag: The destination cannot keep up with the source’s event production rate. This is the primary signal that tuning is needed.
Topic Throughput
The Topics page displays volume, events, and error metrics for each topic. Use the metrics chart to track throughput over time and identify patterns:

- Volume spikes: Correlate with snapshot activity or source database bulk operations.
- Error spikes: Investigate DLQ topics for failed messages.
- Latency increases: May indicate destination bottlenecks or Kafka broker pressure.
Pipeline Metrics
The Pipelines page provides pipeline-level latency and lag metrics:

- Latency: Time for data to flow from source through the pipeline to the destination. Values under 1 second indicate healthy performance.
- Lag: Number of records waiting to be processed. Zero means the pipeline is fully caught up.
Common Performance Issues
High consumer lag
Symptoms: Consumer lag is consistently increasing over time and not decreasing.

Possible causes and remediation:
- Destination bottleneck: The destination cannot write fast enough to keep up with the source.
- Increase the number of Tasks on the destination connector.
- Increase Maximum poll records in the destination’s Advanced settings (try `25000`, `50000`, or `80000`).
- Increase topic partition count to at least `5` to enable greater parallelism.
- Under-provisioned destination: The target system lacks sufficient compute or I/O capacity.
- For data warehouses, increase warehouse/cluster size.
- For JDBC databases, check CPU, memory, and connection pool utilization.
- For ClickHouse, verify the service tier provides adequate resources for inserts and merges.
- Insufficient parallelism: The number of tasks is lower than the number of partitions, or there is only one partition.
- Increase partition count, then increase tasks to match.
The maximum useful number of tasks equals the number of topic partitions. Additional tasks beyond the partition count will remain idle.
Slow initial snapshot
Symptoms: Snapshot is taking much longer than expected to complete.

Possible causes and remediation:
- Large tables: Tables with hundreds of millions of rows will naturally take hours to snapshot. This is expected behavior.
- Source database under heavy load: Competing queries slow down snapshot reads.
- Schedule snapshots during off-peak hours.
- Use filtered (partial) snapshots to process smaller data ranges.
- Missing or fragmented indexes: Snapshots read data in primary key order. Poorly indexed tables are slower to read.
- Ensure primary keys are well-indexed and consider running `ANALYZE` or `REINDEX` on the source database.
- Network latency: Cross-region connections between Streamkap and the source database add round-trip time per chunk.
Destination write timeouts
Symptoms: Pipeline shows errors or broken status. Logs indicate write timeouts or connection failures to the destination.

Possible causes and remediation:
- Destination overloaded: The target system cannot handle the write volume.
- Reduce the number of tasks temporarily to lower concurrent write pressure.
- For data warehouses, increase warehouse/cluster capacity.
- Network issues: Transient connectivity problems between Streamkap and the destination.
- Check firewall rules and network policies.
- Verify the destination is accessible from Streamkap IP addresses.
- Large records: Individual records that are very large (e.g., containing large text or blob fields) may exceed timeout thresholds.
- Consider excluding large columns from replication if they are not needed at the destination.
Uneven partition distribution
Symptoms: Some partitions have high lag while others are at zero. Overall throughput is limited by the slowest partition.

Possible causes and remediation:
- Partition count lower than task count: Some tasks are idle while others are overloaded.
- Increase the partition count to match or exceed the task count. Follow the safe partition increase procedure.
- Hot partitions: Certain keys produce disproportionate traffic, creating uneven load across partitions.
- Monitor partition-level lag on the Consumer Groups page to identify imbalanced partitions.
- Recent partition increase: After increasing partitions, new data goes to new partitions but existing lag remains on old partitions until fully processed.
Increasing partitions only affects new data. Existing lag on old partitions will persist until those messages are fully consumed.
Tuning Workflow
Follow this workflow when optimizing pipeline performance:

Identify the bottleneck
Check Consumer Groups for lag, Topics for throughput metrics, and Pipelines for latency. Determine whether the bottleneck is on the source side, Kafka layer, or destination side.
Increase maximum poll records
In the destination connector’s Advanced settings, increase Maximum poll records to `25000`, `50000`, or `80000` depending on your record sizes. This is the lowest-risk change and often provides immediate improvement.

Increase topic partitions

On the Topics page, increase the partition count to at least `5` for high-throughput topics. Follow the safe partition increase procedure to avoid data consistency issues.

Increase tasks
In the destination connector settings, increase Tasks to match the partition count. More tasks enable more parallel consumers and writers.
Monitor and iterate
After each change, monitor consumer lag, latency, and destination health for a meaningful period before making further adjustments.
Contact Streamkap support
If you have exhausted the tuning options above and performance is still not meeting your requirements, contact Streamkap support for advanced optimization assistance. Include your pipeline ID, current settings, and the metrics you are observing.
Sizing by Use Case
Different workloads have different performance characteristics. Use the recommendations below as starting points and adjust based on monitoring.

- High Throughput
- Low Latency
- Large Records
- Small / Frequent Records
Goal: Maximize the number of records processed per second.
- Increase Maximum poll records to `50000` or `80000`
- Increase Topic partition count to `5` or higher
- Increase Tasks to match the partition count
- Set `fetch.min.bytes` to `100000` (100 KB) or higher to allow the broker to batch more data per fetch
- Set `fetch.max.wait.ms` to `1000` ms or higher to give the broker time to accumulate larger batches
- Use a destination tier with sufficient write capacity (larger warehouse, higher IOPS database, etc.)
Related Documentation
- Topics - Manage Kafka topics, partitions, and view throughput metrics
- Consumer Groups - Monitor consumer lag and manage offset positions
- Pipelines - Monitor pipeline health, latency, and lag
- Snapshots & Backfilling - Plan and optimize snapshot operations