Amazon RDS Aurora MySQL

MySQL Change Data Capture Setup on Amazon RDS Aurora with Streamkap

Prerequisites

  • MySQL version ≥ 5.7
  • MySQL binlog enabled
  • Streamkap user and role

Granting Privileges

It's recommended to create a separate user and role for Streamkap to access your MySQL database. Below is an example script that does that.

-- Replace { ... } placeholders as required

-- Identify version
SHOW VARIABLES LIKE 'VERSION';

-- On MySQL versions before 8.0
CREATE USER 'streamkap_user'@'%' IDENTIFIED BY '{password}';

-- On MySQL version 8.0+ (run this instead of the statement above)
CREATE USER 'streamkap_user'@'%' IDENTIFIED WITH mysql_native_password BY '{password}';

-- Grant permissions
GRANT REPLICATION CLIENT, RELOAD, SHOW DATABASES, REPLICATION SLAVE ON *.* TO 'streamkap_user'@'%';

-- Grant SELECT on all schemas needed
GRANT SELECT ON {schema}.* TO 'streamkap_user'@'%';
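To confirm the account and its privileges were applied, a check along these lines can be run as an administrative user. This is a minimal sketch that assumes the user was created as 'streamkap_user'@'%', as in the script above.

```sql
-- Confirm the account exists and see which authentication plugin it uses
SELECT user, host, plugin
FROM mysql.user
WHERE user = 'streamkap_user';

-- List every privilege currently granted to the Streamkap user
SHOW GRANTS FOR 'streamkap_user'@'%';
```

The grants output should include the global replication privileges and a SELECT grant for each schema you intend to capture.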

Enable Snapshots

You can perform ad-hoc snapshots of all or some of your tables in the Streamkap app. See Snapshots & Backfilling for more information.

To enable this feature, there are 2 methods available for MySQL databases.

Method 1: Enable GTID (Recommended)

👍

This method is recommended if you cannot create and grant Streamkap read/write privileges on a 'signal' table (method 2) for any reason. It's the equivalent of a 'read only' connection.

🚧

GTIDs are only available in MySQL version ≥ 5.6.5

Global transaction identifiers (GTIDs) uniquely identify transactions that occur on a server within a cluster. Though not required, using GTIDs simplifies replication and enables you to more easily confirm if primary and replica servers are consistent as well as carry out incremental snapshots.

Set up GTIDs by following the instructions in MySQL Replication GTID - Configuring Aurora. Ensure you follow the guide for your MySQL version and that GTID mode is ON.
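Once the parameter changes have taken effect, the GTID settings can be checked from any SQL client. The queries below are a small sketch using standard MySQL system variables; both are expected to report ON.

```sql
-- GTID mode should be ON (not OFF, OFF_PERMISSIVE or ON_PERMISSIVE)
SHOW GLOBAL VARIABLES LIKE 'gtid_mode';

-- GTID consistency enforcement must also be ON for gtid_mode = ON
SHOW GLOBAL VARIABLES LIKE 'enforce_gtid_consistency';
```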

Method 2: Create a table in the source database

If you cannot enable GTID mode, you will need to create a signal table and grant the streamkap_user permissions on it. Streamkap will use this table for managing snapshots.

❗️

Please create the signal table with the name streamkap_signal in a new schema called streamkap. It will not be recognised if given another name.

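-- Create the schema
CREATE SCHEMA streamkap;

-- Create the signal table inside the streamkap schema
CREATE TABLE streamkap.streamkap_signal (
  id VARCHAR(255) PRIMARY KEY,
  type VARCHAR(32) NOT NULL,
  data VARCHAR(2000) NULL
);

-- Allow Streamkap to read from and write to the signal table
GRANT SELECT, UPDATE, INSERT ON streamkap.streamkap_signal TO 'streamkap_user'@'%';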
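Because the table is only recognised under its exact schema and name, it can be worth verifying where it was created. The following is a sketch, assuming the streamkap schema and 'streamkap_user'@'%' account described above.

```sql
-- Should return exactly one row: streamkap / streamkap_signal
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'streamkap'
  AND table_name = 'streamkap_signal';

-- Should include SELECT, INSERT, UPDATE on streamkap.streamkap_signal
SHOW GRANTS FOR 'streamkap_user'@'%';
```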

Configure binary logging

  1. Open the Amazon RDS console at https://console.aws.amazon.com/rds/
  2. In the navigation pane, choose Parameter groups
  3. Choose the parameter group used by the DB instance you want to modify
  4. You can't modify a default parameter group. If the DB instance is using a default parameter group, create a new parameter group and associate it with the DB instance
  5. From Parameter group actions, choose Edit
  6. Set the binlog_format parameter to ROW
  7. Set the binlog_row_image parameter to FULL
  8. Choose Save changes to save the updates to the DB parameter group
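After saving, the effective values can be read back from the database. This is a minimal sketch using standard MySQL system variables; if the old values still appear, the parameter change may not have been applied to the instance yet.

```sql
-- Expected value: ROW
SHOW GLOBAL VARIABLES LIKE 'binlog_format';

-- Expected value: FULL
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';
```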

Set binary log retention period

  1. Connect to your master database with your SQL tool.
  2. View current settings with CALL mysql.rds_show_configuration;
  3. If binlog retention hours is less than 24 or NULL, run CALL mysql.rds_set_configuration('binlog retention hours', 72);
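The steps above can be run as one short script. It uses only the two RDS stored procedures named in the steps, and the value 72 is the one suggested there.

```sql
-- Show the current configuration, including 'binlog retention hours'
CALL mysql.rds_show_configuration;

-- Keep binary logs for 72 hours
CALL mysql.rds_set_configuration('binlog retention hours', 72);

-- Confirm the new value was stored
CALL mysql.rds_show_configuration;
```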

Verify binary logs are enabled

You can either:

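  • Check the parameter group for the DB instance and confirm that the log_bin parameter is ON
  • Run the following SQL query on the DB instance: SHOW VARIABLES LIKE '%log_bin%'; The log_bin value should be ON
  • Run SHOW BINARY LOGS; It lists the binary log files when binary logging is enabled and returns an error when it is not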
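The two SQL checks can be combined as follows. This sketch matches the exact variable name rather than the '%log_bin%' pattern, so only the log_bin row is returned.

```sql
-- Expected value: ON
SHOW VARIABLES LIKE 'log_bin';

-- Lists the available binary log files and their sizes
SHOW BINARY LOGS;
```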

Consider Access Restrictions

Set Up MySQL Connector in Streamkap

  • Go to Sources and click Create New
  • Input
    • Name for your Connector
    • Hostname
    • 📘

      Aurora endpoints and binary logs

      Binary logs are accessible only from the primary DB instance, not from the replicas.

      Please use the cluster endpoint rather than reader or instance endpoints. In the event of failure, the Connector can then fail over to a new primary DB instance.

      See Amazon connection management for more about the types of endpoints.

    • Port (Default 3306)
    • Username (Username you chose earlier, our scripts use streamkap_user)
    • Password
    • Heartbeat - Required for low volume connectors. See MySQL Heartbeats
    • Connection Timezone - The timezone of your database
    • 📘

      Timezone conversion

      MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. By default, the current time zone for each connection is the database server's time zone but this option allows you to override that.

      As long as the time zones remain the same, you get back the same value you store.

      We recommend using the default SERVER option which attempts to detect the session time zone from the values configured on the MySQL server session variables 'time_zone' or 'system_time_zone'. It also reduces the chance of problems with daylight savings adjustment 'fall back' and 'spring forward'.

      If either time zone changes, an ad-hoc snapshot is recommended so your source and destination timestamps are consistent.

    • Use GTID
      • If your database is using GTID, leave this as 'Yes'. See Enable GTID for more information.
      • If 'No', please ensure you create the signal table as described in Method 2: Create a table in the source database
        • Signal Table Database: Streamkap will use a table in this database to manage snapshots e.g. streamkap. See Enable Snapshots for more information
    • Connect via SSH Tunnel. See SSH Tunnel
    • Advanced Parameters
      • Represent Binary Data As (Default bytes)
    • Add Schemas/Tables. Can also bulk upload here. The format is a simple list of each schema or table per row saved in csv format without a header.
    • Click Save
      The connector will take approximately 1 minute to start processing data.
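To see which time zone values the SERVER option described under Connection Timezone would detect, and how TIMESTAMP conversion behaves, a sketch like the following can be run in a SQL client. The temporary table tz_demo and the variable @original_tz are hypothetical names used only for illustration, and the session needs permission to create temporary tables.

```sql
-- Values the SERVER option relies on
SELECT @@global.time_zone, @@session.time_zone, @@system_time_zone;

-- Illustration of TIMESTAMP conversion between session time zones
SET @original_tz = @@session.time_zone;
CREATE TEMPORARY TABLE tz_demo (ts TIMESTAMP);

SET time_zone = '+00:00';
INSERT INTO tz_demo VALUES ('2024-01-01 12:00:00');  -- stored internally as UTC

SET time_zone = '+05:00';
SELECT ts FROM tz_demo;  -- returns 2024-01-01 17:00:00

-- Clean up and restore the session time zone
DROP TEMPORARY TABLE tz_demo;
SET time_zone = @original_tz;
```

The same stored value reads back differently once the session time zone changes, which is why a snapshot is suggested after a time zone change.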