Azure PostgreSQL

PostgreSQL Change Data Capture Setup on Azure with Streamkap

Prerequisites

  • PostgreSQL version ≥ 10.x
  • PostgreSQL logical replication enabled on the primary database
  • Connection details
  • Streamkap user and role

Granting Privileges

It's recommended to create a separate user and role for Streamkap to access your PostgreSQL database. Below is an example script that does that. You can modify it to suit your security policies.

-- Replace { ... } placeholders as required
CREATE USER streamkap_user PASSWORD '{password}';

-- Create a role for Streamkap
CREATE ROLE streamkap_role REPLICATION LOGIN;
GRANT streamkap_role TO streamkap_user;

-- Grant Streamkap permissions on the database, schema and all tables to capture
GRANT CREATE ON DATABASE "{database}" TO streamkap_role;
GRANT CREATE, USAGE ON SCHEMA "{schema}" TO streamkap_role;
GRANT SELECT ON ALL TABLES IN SCHEMA "{schema}" TO streamkap_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA "{schema}" GRANT SELECT ON TABLES TO streamkap_role;

Enable Snapshots

You will need to create the table in the source database and give permissions to the streamkap_role. Streamkap will use this collection for managing snapshots.

This table can exist in a different schema (on the same database) to the schema Streamkap captures data from.

❗️

Please create the signal table with the name streamkap_signal. It will not be recognised if given another name.

CREATE TABLE streamkap_signal (
  id VARCHAR(255) PRIMARY KEY, 
  type VARCHAR(32) NOT NULL, 
  data VARCHAR(2000) NULL
);

GRANT SELECT, UPDATE, INSERT ON streamkap_signal TO streamkap_role;

Enabling Logical Replication

🚧

WAL logs and database storage

Ensure that your database server has ample free space for logical replication to maintain the logs or you may face an interruption to your database. We delete the logs as we process but you should ensure you have ample space

Set the Azure replication support to logical. You can use the Azure CLI or the Azure Portal to configure this. Here's an example script to do that.

# Replace { ... } placeholders as required

az postgres server configuration set --resource-group {group} --server-name {server} --name azure.replication_support --value logical
az postgres server restart --resource-group {group} --name {server}

Consider Access Restrictions

Setup PostgreSQL Connector in Streamkap

  • Go to Sources and click Create New
  • Input
    • Name for your Connector
    • Hostname
    • Port (Default 5432)
    • Username (Username you chose earlier, our scripts use streamkap_user)
    • Password
    • Signal Table Schema - Streamkap will use a collection in this schema to manage snapshots e.g. public. See Enable Snapshots for more information
    • Database Name
    • Replication Slot Name (Default streamkap_pgoutput_slot)
    • Publication Name (Default streamkap_pub)
    • Advanced Parameters
      • Snapshot Mode (Default When Needed) See PostgreSQL Snapshot Modes for more information
      • Represent Binary Data As (Default bytes)
      • Snapshot Chunk Size (Default 1024) - This is the number of rows read at a time when snapshotting. This is a low safe value. As a guide, if you have 100m + rows of data you may want to move this to 5120. If you have 1bn then a higher number still will allow you to backfill faster.
      • Max Batch Size (Default 2048) - A value that specifies the maximum size of each batch of events that the connector processes. Only increase if experiencing lag
    • Add Schemas/Tables. Can also bulk upload here. The format is a simple list of each schema or table per row saved in csv format without a header.
    • Click Save
      The connector will take approximately 1 minute to start processing data.