MySQL Source FAQ

This FAQ covers using MySQL as a source in Streamkap, including self-hosted setups and cloud variants (AWS RDS/Aurora, Google Cloud SQL, Azure Database for MySQL). Streamkap's MySQL connector provides real-time CDC with managed features such as automatic scaling, UI-based setup, and ETL transformations.

What is a MySQL source in Streamkap?

A MySQL source in Streamkap enables real-time Change Data Capture (CDC) from MySQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It reads the binary log (binlog) to stream changes to destinations, and supports snapshots for initial loads and automatic schema evolution. Streamkap offers serverless setup via UI or API.
What MySQL versions are supported as sources?

Streamkap supports MySQL 5.7+ for basic CDC, 8.0+ for advanced features such as GTIDs and query log events, and 8.4+ for the latest enhancements. MySQL 5.6.5+ is compatible in limited modes.

What MySQL deployments are supported?

Streamkap supports:

  • Self-hosted/Generic (on-prem/VM).
  • AWS RDS MySQL (including Aurora).
  • Google Cloud SQL for MySQL.
  • Azure Database for MySQL.
  • Any other MySQL-compatible platform.
  • Topologies: standalone, primary/replica, high-availability clusters, and multi-primary.

What are the key features of MySQL sources in Streamkap?

  • CDC: Binlog-based for inserts/updates/deletes; captures schema changes.
  • Snapshots: Ad-hoc/initial using global/table locks; incremental or blocking.
  • Schema Evolution: Automatic DDL tracking; emits change events.
  • Data Types: Standard mappings (numerics, temporals, strings, binary, JSON, ENUM/SET); configurable handling modes for temporal, decimal, and binary values.
  • Ingestion Modes: Inserts (append) or upserts.
  • Security: SSL, authentication.
  • Monitoring: Latency, lag, binlog metrics.
  • Heartbeats: For low-traffic databases to advance binlog position via a heartbeat table.
  • Streamkap adds transaction metadata, row filtering, and original SQL queries if enabled.

How does CDC work for MySQL sources?

Streamkap reads MySQL's binary log (binlog) to capture committed operations and emits each change as an event. CDC requires binlog_format=ROW and binlog_row_image=FULL. GTID mode is used for position tracking in multi-primary setups and is recommended generally for reliability.
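
As a sketch, on a self-managed MySQL 8.0+ server the settings above can be checked and applied as follows (on managed services such as RDS, Cloud SQL, or Azure, set the equivalents via the parameter group or database flags instead):

```sql
-- Verify the current CDC-relevant settings.
SELECT @@binlog_format, @@binlog_row_image, @@gtid_mode;

-- Required for row-based CDC (SET PERSIST survives restarts; 8.0+ only).
SET PERSIST binlog_format = 'ROW';

-- Recommended: GTIDs for reliable position tracking. Note that gtid_mode
-- must be stepped one value at a time
-- (OFF -> OFF_PERMISSIVE -> ON_PERMISSIVE -> ON) on a live server.
SET PERSIST enforce_gtid_consistency = ON;
SET PERSIST gtid_mode = ON;
```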

What is binlog_row_image and why does it matter?

binlog_row_image controls how much row data is written to the binlog for each change:

  • MINIMAL: Logs only the primary key (plus changed columns), so delete events carry incomplete before-images.
  • FULL: Logs all columns, producing complete before/after records (recommended).

Set binlog_row_image to FULL so delete events include the full row.
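
A minimal check-and-set sketch for a self-managed server (managed cloud services expose this through a parameter group or flag instead):

```sql
-- Check the effective row image mode.
SHOW VARIABLES LIKE 'binlog_row_image';

-- Set FULL so every change event carries the complete row
-- (MySQL 8.0+; SET PERSIST keeps the value across restarts).
SET PERSIST binlog_row_image = 'FULL';
```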

How do snapshots work for MySQL sources?

Snapshots can be triggered ad hoc at the source or table level.

  • Methods: Global read lock or table-level; incremental (chunked by PK, default 1024 rows) or blocking.
  • Modes: initial (default), always, initial_only, no_data, when_needed, configuration_based, custom.
  • Streamkap simplifies triggering via UI.
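
The incremental mode above can be illustrated conceptually: the table is read in primary-key order in fixed-size chunks (default 1024 rows), so no long-held lock is needed. The table `orders` and column `id` here are hypothetical:

```sql
-- Conceptual sketch of one incremental-snapshot chunk. The connector
-- remembers the highest key it has read and asks for the next chunk;
-- replace 0 with the previous chunk's maximum id.
SELECT *
FROM orders
WHERE id > 0          -- lower bound: last chunk's max primary key
ORDER BY id
LIMIT 1024;           -- default chunk size
```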

What data types are supported?

  • Basics: Booleans (BOOLEAN), integers (INT8/16/32/64), floats (FLOAT32/64), strings (STRING), dates/timestamps (adaptive/connect modes).
  • Advanced: Binary (BYTES/base64/hex), decimals (precise/double/string modes), JSON (STRING/io.debezium.data.Json), ENUM/SET (STRING/io.debezium.data.Enum/Set).
  • Character: CHAR/VARCHAR/TEXT (STRING).
  • Unsupported: Non-UTF-8 character sets; spatial types (GEOMETRY); MyISAM tables. Events that age out of the binlog before capture are also lost.

How do heartbeats work for MySQL sources?

Heartbeats prevent binlog buildup in low-traffic DBs by periodically updating a heartbeat table, advancing the binlog position.

  • Setup: Create the heartbeat table in the source database; the connector updates it.
  • How it Works: Generates change events to keep binlog moving.
  • Verification: Check table updates; troubleshoot permissions.
  • Limitations: Manual setup required for low-traffic scenarios.
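
A sketch of the manual setup, assuming a heartbeat table named streamkap_heartbeat (the name and schema are illustrative; match whatever your connector configuration expects):

```sql
-- Heartbeat table in the source database. The connector updates it on an
-- interval, which writes a row event to the binlog and lets the connector
-- advance its position even when the monitored tables are idle.
CREATE TABLE IF NOT EXISTS streamkap_heartbeat (
  id INT PRIMARY KEY,
  ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
       ON UPDATE CURRENT_TIMESTAMP
);

-- The kind of statement the connector issues periodically.
INSERT INTO streamkap_heartbeat (id, ts) VALUES (1, NOW())
  ON DUPLICATE KEY UPDATE ts = NOW();
```

The connector's user needs INSERT/UPDATE privileges on this table; a permissions error here is the first thing to check if the heartbeat is not advancing.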

How to monitor binlog for MySQL sources?

Run queries such as SHOW BINARY LOGS to check binlog file sizes and positions, and use tools like Datadog or New Relic to track lag. Retain binlogs for 3–5 days and alert on expiration or unexpected growth.
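
A few concrete monitoring and retention statements (the retention values are illustrative; tune them to your recovery window):

```sql
-- List binlog files and their sizes.
SHOW BINARY LOGS;

-- Current write position. MySQL 8.4+ renames this to
-- SHOW BINARY LOG STATUS.
SHOW MASTER STATUS;

-- Self-hosted MySQL 8.0+: keep roughly 4 days of binlogs (in seconds).
SET PERSIST binlog_expire_logs_seconds = 345600;

-- AWS RDS/Aurora equivalent: retention is set via a stored procedure.
CALL mysql.rds_set_configuration('binlog retention hours', 96);
```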

What are common limitations?

  • MyISAM tables are unsupported; binlog expiration can lose events; primary-key changes require manual handling; large-column values (the MySQL analogue of PostgreSQL's TOASTed values) are incomplete without a FULL row image.
  • Binlog growth can be high without monitoring; schema changes during snapshots are unsupported; snapshots from read replicas are restricted in some cloud setups.

How to handle deletes?

  • Deletes are captured as events, with full before-images when binlog_row_image=FULL; soft deletes are also supported.

What security features are available?

Encrypted connections (SSL), IP allowlisting, role-based access (dedicated user recommended).
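
A sketch of provisioning such a dedicated user; the user name and the privilege list are the typical minimum for binlog-based capture plus snapshots, but confirm against your connector's documentation:

```sql
-- Dedicated CDC user (name is illustrative), forced to connect over SSL.
CREATE USER 'streamkap_user'@'%'
  IDENTIFIED BY '<strong-password>'
  REQUIRE SSL;

-- SELECT for snapshots; REPLICATION SLAVE / REPLICATION CLIENT for
-- reading the binlog and checking positions; RELOAD and SHOW DATABASES
-- for consistent snapshot setup.
GRANT SELECT, RELOAD, SHOW DATABASES,
      REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'streamkap_user'@'%';
```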

Troubleshooting common issues

  • Binlog Buildup: Enable heartbeats; monitor retention; drop unused logs.
  • Data Loss on Upgrade: Stop writes, capture all events, recreate positions post-upgrade, resnapshot.
  • Missing Events: Ensure binlog enabled and tables included; check REPLICATION privileges.
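
Two statements that help with the checks above (the user name is illustrative; adjust the cutoff to your retention policy):

```sql
-- Missing events: confirm the connector's user has the REPLICATION grants.
SHOW GRANTS FOR 'streamkap_user'@'%';

-- Binlog buildup: drop logs older than a cutoff, but ONLY after confirming
-- the connector has consumed past that point; purging unread logs loses data.
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 3 DAY;
```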

Best practices for MySQL sources

  • Use dedicated replication user (not root).
  • Enable GTID mode for better failover.
  • Limit databases/tables to reduce binlog size.
  • Set binlog retention to 3–5 days so expired logs are purged automatically.
  • For deletes, use FULL binlog_row_image.
  • Test snapshots in staging; monitor lag via tools.
  • For cloud: Use provider monitoring; enable heartbeats for low traffic.