What is a MySQL source in Streamkap?
What is a MySQL source in Streamkap?
A MySQL source in streamkap enables real-time Change Data Capture (CDC) from MySQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses the binary log (binlog) to stream changes to destinations, supporting snapshots for initial loads, schema evolution, and handling for schema changes. Streamkap offers a serverless setup via UI or API.What MySQL versions are supported as sources?Streamkap: MySQL 5.7+ for basic CDC; 8.0+ for advanced features like GTIDs and query log events; 8.4+ for latest enhancements. Compatible with MySQL 5.6.5+ in limited modes.
What MySQL deployments are supported?
What MySQL deployments are supported?
Streamkap supports:
- Self-hosted/Generic (on-prem/VM)
- AWS RDS MySQL (including Aurora)
- Google Cloud SQL for MySQL
- Azure MySQL Database
- Any other MySQL compatible platform
What are the key features of MySQL sources in Streamkap?
What are the key features of MySQL sources in Streamkap?
- CDC: Binlog-based for inserts/updates/deletes; captures schema changes
- Snapshots: Ad-hoc/initial using global/table locks; incremental or blocking
- Schema Evolution: Automatic DDL tracking; emits change events
- Data Types: Standard mappings (numerics, temporals, strings, binary configurable, JSON, ENUM/SET); temporal/decimal/binary handling modes
- Ingestion Modes: Inserts (append) or upserts
- Security: SSL, authentication
- Monitoring: Latency, lag, binlog metrics
- Heartbeats: For low-traffic databases to advance binlog position via a heartbeat table
- Streamkap adds transaction metadata, row filtering, and original SQL queries if enabled
How does CDC work for MySQL sources?
How does CDC work for MySQL sources?
Streamkap reads MySQL’s binary log (binlog) to capture committed operations, emitting changes as events.Requires
Uses GTID mode for position tracking in multi-primary setups, recommended for reliability.
binlog_format=ROW and binlog_row_image=FULL.Uses GTID mode for position tracking in multi-primary setups, recommended for reliability.
What is binlog_row_image and why does it matter?
What is binlog_row_image and why does it matter?
binlog_row_image controls logged data for changes:- MINIMAL: Logs PK only (partial deletes)
- FULL: Logs all columns (complete records, recommended)
How do snapshots work for MySQL sources?
How do snapshots work for MySQL sources?
Trigger ad-hoc at source/table level.
- Methods: Global read lock or table-level; incremental (chunked by PK, default 1024 rows) or blocking
- Modes:
initial(default),always,initial_only,no_data,when_needed,configuration_based,custom
What data types are supported?
What data types are supported?
- Basics: Booleans (BOOLEAN), integers (INT8/16/32/64), floats (FLOAT32/64), strings (STRING), dates/timestamps (adaptive/connect modes)
- Advanced: Binary (BYTES/base64/hex), decimals (precise/double/string modes), JSON (
STRING/io.debezium.data.Json), ENUM/SET (STRING/io.debezium.data.Enum/Set) - Character: CHAR/VARCHAR/TEXT (STRING)
- Unsupported: Non-UTF8; spatial (GEOMETRY); MyISAM tables; binlog expiration losses
How do heartbeats work for MySQL sources?
How do heartbeats work for MySQL sources?
Heartbeats prevent binlog buildup in low-traffic DBs by periodically updating a heartbeat table, advancing the binlog position.
- Setup: Create the heartbeat table in the source database; the connector updates it
- How it Works: Generates change events to keep binlog moving
- Verification: Check table updates; troubleshoot permissions
- Limitations: Manual setup required for low-traffic scenarios
How to monitor binlog for MySQL sources?
How to monitor binlog for MySQL sources?
Use queries like
SHOW BINARY LOGS for size/position; tools like Datadog/New Relic for lag.Best Practices: Retain 3–5 days; alert on expiration/growth.What are common limitations?
What are common limitations?
- MyISAM unsupported
- Binlog expiration can lose events
- PK changes require manual handling
- TOASTed equivalents incomplete without FULL image
- High binlog growth without monitoring
- Schema changes during snapshots unsupported
- Read replicas for snapshots in some cloud setups
How to handle deletes?
How to handle deletes?
- Captures deletes as events with before images if
binlog_row_image=FULL - Supports soft deletes
What security features are available?
What security features are available?
Encrypted connections (SSL), IP allowlisting, role-based access (dedicated user recommended).
Troubleshooting common issues
Troubleshooting common issues
- Binlog Buildup: Enable heartbeats; monitor retention; drop unused logs
- Data Loss on Upgrade: Stop writes, capture all events, recreate positions post-upgrade, resnapshot
- Missing Events: Ensure binlog enabled and tables included; check REPLICATION privileges
Can CDC capture database Views, Materialized Views, and other virtual objects?
Can CDC capture database Views, Materialized Views, and other virtual objects?
No, CDC cannot capture Views or most virtual database objects.Why Views cannot be captured:
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables—they don’t store data or generate transaction log entries. When you query a view, the database engine executes the underlying SELECT statement against the base tables. Since views don’t store data, they don’t generate transaction log entries.What cannot be captured:
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.MySQL-specific notes:
If you have a view
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables—they don’t store data or generate transaction log entries. When you query a view, the database engine executes the underlying SELECT statement against the base tables. Since views don’t store data, they don’t generate transaction log entries.What cannot be captured:
- Views: Virtual tables with no physical storage or binlog entries
- Temporary Tables: Session-scoped, not logged persistently in binlog
- MEMORY Tables: In-memory storage engine, data lost on restart, not in binlog
- BLACKHOLE Tables: Discard all writes, no data to capture
- System Tables (information_schema, performance_schema, mysql): Metadata representations, not user data
- CTEs (Common Table Expressions): Query-time constructs with no persistent storage
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.MySQL-specific notes:
- MyISAM tables: Not supported for CDC as they don’t participate in transactions or generate complete binlog entries. Convert to InnoDB.
- Views created with
ALGORITHM=TEMPTABLE: Still not capturable; capture the base tables instead
If you have a view
customer_orders_summary that queries tables customers and orders, capture the customers and orders tables instead, then recreate the view logic downstream.Best practices for MySQL sources
Best practices for MySQL sources
- Use dedicated replication user (not root)
- Enable GTID mode for better failover
- Limit databases/tables to reduce binlog size
- Enable auto-vacuum equivalents; set binlog retention to 3-5 days
- For deletes, use FULL
binlog_row_image - Test snapshots in staging; monitor lag via tools
- For cloud: Use provider monitoring; enable heartbeats for low traffic