What is a MySQL source in Streamkap?
A MySQL source in Streamkap enables real-time Change Data Capture (CDC) from MySQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It reads the binary log (binlog) to stream changes to destinations, supporting snapshots for initial loads, schema evolution, and handling of schema changes. Streamkap offers a serverless setup via UI or API.
What MySQL versions are supported as sources?
Streamkap supports MySQL 5.7+ for basic CDC, 8.0+ for advanced features such as GTIDs and query log events, and 8.4+ for the latest enhancements. MySQL 5.6.5+ is compatible in limited modes.
What MySQL deployments are supported?
Streamkap supports:
- Self-hosted/Generic (on-prem/VM)
- AWS RDS MySQL (including Aurora)
- Google Cloud SQL for MySQL
- Azure Database for MySQL
- Any other MySQL-compatible platform
What are the key features of MySQL sources in Streamkap?
- CDC: Binlog-based for inserts/updates/deletes; captures schema changes
- Snapshots: Ad-hoc/initial using global/table locks; incremental or blocking
- Schema Evolution: Automatic DDL tracking; emits change events
- Data Types: Standard mappings (numerics, temporals, strings, binary configurable, JSON, ENUM/SET); temporal/decimal/binary handling modes
- Ingestion Modes: Inserts (append) or upserts
- Security: SSL, authentication
- Monitoring: Latency, lag, binlog metrics
- Heartbeats: For low-traffic databases to advance binlog position via a heartbeat table
- Extras: Streamkap adds transaction metadata, row filtering, and original SQL query capture when enabled
How does CDC work for MySQL sources?
Streamkap reads MySQL's binary log (binlog) to capture committed operations and emits each change as an event. This requires binlog_format=ROW and binlog_row_image=FULL. GTID mode is recommended for reliable position tracking, particularly in multi-primary setups.
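A quick, hedged way to verify these prerequisites on the source server before connecting (the variable names are standard MySQL; managed services expose them through parameter groups or flags):

```sql
-- Verify the binlog prerequisites described above
SHOW VARIABLES LIKE 'log_bin';           -- should be ON
SHOW VARIABLES LIKE 'binlog_format';     -- should be ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- should be FULL
SHOW VARIABLES LIKE 'gtid_mode';         -- ON is recommended
```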
What is binlog_row_image and why does it matter?
binlog_row_image controls how much row data is written to the binlog for each change:
- MINIMAL: Logs the primary key only, so delete events carry partial records
- FULL: Logs all columns, producing complete records (recommended)
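On self-managed MySQL 8.0+ the setting can be changed at runtime and persisted; on managed services (RDS, Cloud SQL, Azure) it is usually set via a parameter group or flag instead. A minimal sketch:

```sql
-- Persist the recommended row image across restarts (MySQL 8.0+)
SET PERSIST binlog_row_image = 'FULL';

-- Confirm the current value
SELECT @@global.binlog_row_image;
```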
How do snapshots work for MySQL sources?
Snapshots can be triggered ad hoc at the source or table level.
- Methods: Global read lock or table-level; incremental (chunked by PK, default 1024 rows; see the sketch after this list) or blocking
- Modes: initial (default), always, initial_only, no_data, when_needed, configuration_based, custom
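For intuition, an incremental snapshot walks each table in primary-key order, one chunk at a time, while the binlog keeps streaming. Conceptually each chunk resembles the query below; the connector manages chunk boundaries and de-duplication itself, and the orders table and id column are hypothetical:

```sql
-- One incremental-snapshot chunk: primary-key ordered, bounded by the previous chunk
SELECT *
FROM orders
WHERE id > 102400   -- upper bound of the previous chunk
ORDER BY id
LIMIT 1024;         -- default chunk size
```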
What data types are supported?
- Basics: Booleans (BOOLEAN), integers (INT8/16/32/64), floats (FLOAT32/64), strings (STRING), dates/timestamps (adaptive/connect modes)
- Advanced: Binary (BYTES/base64/hex), decimals (precise/double/string modes), JSON (STRING with io.debezium.data.Json), ENUM/SET (STRING with io.debezium.data.Enum/Set)
- Character: CHAR/VARCHAR/TEXT (STRING)
- Unsupported or limited: non-UTF-8 character sets; spatial types (GEOMETRY); MyISAM tables; events lost to binlog expiration
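A small hypothetical table illustrating how the mappings above apply to common column types (names are illustrative only):

```sql
CREATE TABLE customer_events (
  id         BIGINT PRIMARY KEY,          -- INT64
  is_active  BOOLEAN,                     -- BOOLEAN
  score      DOUBLE,                      -- FLOAT64
  balance    DECIMAL(10,2),               -- precise/double/string, per decimal handling mode
  created_at DATETIME(3),                 -- adaptive or connect temporal mode
  payload    JSON,                        -- STRING with io.debezium.data.Json
  status     ENUM('new','active','done')  -- STRING with io.debezium.data.Enum
);
```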
How do heartbeats work for MySQL sources?
Heartbeats prevent binlog buildup in low-traffic DBs by periodically updating a heartbeat table, advancing the binlog position.
- Setup: Create the heartbeat table in the source database; the connector updates it
- How it Works: Generates change events to keep binlog moving
- Verification: Check table updates; troubleshoot permissions
- Limitations: Manual setup required for low-traffic scenarios
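A minimal sketch of the setup, assuming a single-row heartbeat table named streamkap_heartbeat (check the Streamkap setup guide for the exact table name and schema your connector expects):

```sql
-- Single-row table the connector touches periodically to keep the binlog position advancing
CREATE TABLE streamkap_heartbeat (
  id        INT PRIMARY KEY,
  last_beat TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
INSERT INTO streamkap_heartbeat (id) VALUES (1);

-- The heartbeat action then issues something equivalent to:
UPDATE streamkap_heartbeat SET last_beat = NOW() WHERE id = 1;
```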
How to monitor binlog for MySQL sources?
Use queries like SHOW BINARY LOGS for size and position, and tools like Datadog or New Relic for lag.
Best practices: Retain binlogs for 3–5 days; alert on expiration and rapid growth.
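A few standard statements for these checks (retention variable names vary slightly across versions and managed services):

```sql
SHOW BINARY LOGS;                                  -- per-file sizes; watch total growth
SHOW MASTER STATUS;                                -- current file/position (SHOW BINARY LOG STATUS on 8.4+)
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';  -- automatic retention window (8.0+)
```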
What are common limitations?
- MyISAM unsupported
- Binlog expiration can lose events
- PK changes require manual handling
- Row images are incomplete without binlog_row_image=FULL
- High binlog growth without monitoring
- Schema changes during snapshots unsupported
- Snapshots from read replicas are limited in some cloud setups
How to handle deletes?
- Captures deletes as events with before images if binlog_row_image=FULL
- Supports soft deletes
What security features are available?
Encrypted connections (SSL), IP allowlisting, role-based access (dedicated user recommended).
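A minimal sketch of creating a dedicated CDC user over SSL with the privileges typically required for binlog-based CDC (some snapshot methods also need LOCK TABLES; adjust host, password, and scope to your environment):

```sql
CREATE USER 'streamkap'@'%' IDENTIFIED BY '********' REQUIRE SSL;
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'streamkap'@'%';
```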
Troubleshooting common issues
- Binlog Buildup: Enable heartbeats; monitor retention; drop unused logs
- Data Loss on Upgrade: Stop writes before upgrading, let the connector capture all remaining events, recreate binlog positions after the upgrade, then re-snapshot
- Missing Events: Ensure binlog enabled and tables included; check REPLICATION privileges
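For the binlog-buildup case above, a hedged set of checks before purging anything (make sure the connector has already consumed the older files):

```sql
SHOW BINARY LOGS;                                  -- how much space binlogs currently consume
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';  -- automatic retention window
-- Only once the connector has read past the older files:
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY);
```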
Best practices for MySQL sources
- Use dedicated replication user (not root)
- Enable GTID mode for better failover
- Limit databases/tables to reduce binlog size
- Set binlog retention to 3–5 days so expired logs are purged automatically (see the retention sketch after this list)
- For deletes, use binlog_row_image=FULL
- Test snapshots in staging; monitor lag via tools
- For cloud: Use provider monitoring; enable heartbeats for low traffic
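A sketch of the 3–5 day retention recommendation above; the variable applies to self-hosted MySQL 8.0+, and the stored procedure is the AWS RDS/Aurora equivalent:

```sql
-- Self-hosted MySQL 8.0+: keep binlogs for four days (value in seconds)
SET PERSIST binlog_expire_logs_seconds = 345600;

-- AWS RDS / Aurora MySQL: retention is set through a stored procedure (value in hours)
CALL mysql.rds_set_configuration('binlog retention hours', 96);
```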