Azure PostgreSQL
PostgreSQL Change Data Capture Setup on Azure with Streamkap
Prerequisites
- PostgreSQL version ≥ 10.x
- PostgreSQL logical replication enabled on the primary database or Standby version ≥ 16.1
- Connection details
- An admin database user
WITH REPLICATION
privileges
Granting Privileges
It's recommended to create a separate user for Streamkap to access your PostgreSQL database. Below is an example script that does that. You can modify it to suit your security policies.
-- Replace { ... } placeholders as required
CREATE USER streamkap_user PASSWORD '{password}';
-- Grant the user permissions to manage replication slots and publications
ALTER USER streamkap_user WITH REPLICATION;
-- Grant Streamkap permissions on the database, schema and all tables to capture
-- Create on database is used to create the publication. Not needed if you create yourself
GRANT CONNECT ON DATABASE "{database}" TO streamkap_role;
GRANT CREATE, USAGE ON SCHEMA "{schema}" TO streamkap_role;
GRANT SELECT ON ALL TABLES IN SCHEMA "{schema}" TO streamkap_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA "{schema}" GRANT SELECT ON TABLES TO streamkap_role;
Enable Snapshots
You will need to create the table in the source database and give permissions to the streamkap_user
. Streamkap will use this collection for managing snapshots.
Please create the signal table with the name
streamkap_signal
. It will not be recognised if given another name.
-- Create schema
CREATE SCHEMA streamkap;
-- Switch to the newly created schema
SET search_path TO streamkap;
-- Create the table
CREATE TABLE streamkap_signal (
id VARCHAR(255) PRIMARY KEY,
type VARCHAR(32) NOT NULL,
data VARCHAR(2000) NULL
);
-- Grant necessary privileges on the table to the role
GRANT SELECT, UPDATE, INSERT ON TABLE streamkap_signal TO streamkap_user;
Enabling Logical Replication
WAL and database storage
Ensure that your database server has ample free space for logical replication to maintain the logs or you may face an interruption to your database. Enable sufficient space with auto growth on the database, monitor the WAL and configure the publication slot to contain only tables you need
- Go to your Azure database's server parameters page on the Azure portal
- Set the parameter
wal_level
toLOGICAL
- Search for
shared_preload_libraries
, andazure.extensions
parameters, and selectPGLOGICAL
from the dropdown list box - Update
max_worker_processes
parameter value to at least16
. Otherwise, you might encounter issues likeWARNING: out of background worker slots
- Save the changes and restart the server to apply them
Create Publication(s) & Slot
Publications contain a set of change events for the tables you include.
Log in to a PostgreSQL console (using tools such as a SQL workbench or psql) as an admin database user.
Create a publication for your tables. You can create a publication for all tables or be selective.
Snapshot required
You cannot alter
FOR ALL TABLES
publications to include/exclude tables.If you set up a
FOR ALL TABLES
publication and later decide to change that, you have to drop the publication and create another to include specific tables e.g.CREATE PUBLICATION ... TABLE table1, table2, table3, ...
.However, any change events that occur before the new publication's created will not be included in it, so a snapshot's required to ensure they are not missed by your Streamkap pipelines.
You should also stop the Source before changing the publication.
-- Note: You must have table ownership privileges on the table(s)
-- Create a publication for all tables or specific table(s) to replicate
-- Option 1: All Tables
CREATE PUBLICATION streamkap_pub FOR ALL TABLES;
-- Option 2: Specific Tables
CREATE PUBLICATION streamkap_pub FOR TABLE table1, table2, table3, ...;
-- If you didn't add ALL TABLES, add our signal table to the publication
ALTER PUBLICATION streamkap_pub ADD TABLE streamkap_signal;
-- If you need to drop a table
-- ALTER PUBLICATION streamkap_pub DROP TABLE table5;
-- PostgreSQL 13 or later, enable the adding of partitioned tables
-- CREATE PUBLICATION streamkap_pub FOR ALL TABLES WITH (publish_via_partition_root=true);
-- Create a logical replication slot
SELECT pg_create_logical_replication_slot('streamkap_pgoutput_slot', 'pgoutput');
-- Verify the table(s) to replicate were added to the publication
SELECT * FROM pg_publication_tables;
Log in as the Streamkap user and verify it can read the replication slot by running the following command:
SELECT count(*) FROM pg_logical_slot_peek_binary_changes('streamkap_pgoutput_slot', null, null, 'proto_version', '1', 'publication_names', 'streamkap_pub');
Consider Access Restrictions
- Visit Connection Options to ensure Streamkap can reach your database
Setup PostgreSQL Connector in Streamkap
- Go to Sources and click Create New
- Input
- Name for your Connector
- Hostname
- Port (Default
5432
) - Username (Username you chose earlier, our scripts use
streamkap_user
) - Password
- Heartbeat - Required for low volume connectors. See PostgreSQL Heartbeats
- Signal Table Schema - Streamkap will use a collection in this schema to manage snapshots e.g.
streamkap
. See Enable Snapshots for more information - Database Name
- Replication Slot Name (Default
streamkap_pgoutput_slot
) - Publication Name (Default
streamkap_pub
) - Connect via SSH Tunnel. See SSH Tunnel
- Advanced Parameters
- Represent Binary Data As (Default
bytes
)
- Represent Binary Data As (Default
- Add Schemas/Tables. Can also bulk upload here. The format is a simple list of each schema or table per row saved in csv format without a header.
- Click Save
The connector will take approximately 1 minute to start processing data.
Updated about 2 months ago