Neon PostgreSQL
PostgreSQL Change Data Capture Setup on Neon with Streamkap
Prerequisites
- PostgreSQL version ≥ 10
- A database user with sufficient privileges to configure the database, including enabling logical replication and creating users
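You can check the version requirement, and confirm that logical replication is on, by running the queries below in the Neon SQL Editor or psql. This is a minimal sketch. It assumes logical replication has already been enabled for your Neon project, in which case `wal_level` reports `logical`. If it reports `replica`, change data capture will not work until logical replication is enabled.

```sql
-- Server version; must be 10 or newer
SHOW server_version;

-- Must return 'logical' for change data capture
SHOW wal_level;

-- Privileges of the user you are running the setup scripts as
SELECT rolname, rolcreaterole, rolreplication
FROM pg_roles
WHERE rolname = current_user;
```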
PostgreSQL Setup
1. Grant Database Access
- Configure one of the Connection Options to ensure Streamkap can reach your database.
2. Create Database User
We recommend creating a separate user and role for Streamkap to access your PostgreSQL database. Below is an example script that does that.
-- Replace { ... } placeholders as required
CREATE USER streamkap_user PASSWORD '{password}';
-- Create a role for Streamkap
CREATE ROLE streamkap_role NOLOGIN;
GRANT streamkap_role TO streamkap_user;
-- The REPLICATION attribute is granted directly to streamkap_user at the end of this script
-- Grant Streamkap permissions on the database, schema and all tables to capture
GRANT CONNECT ON DATABASE "{database}" TO streamkap_role;
GRANT CREATE, USAGE ON SCHEMA "{schema}" TO streamkap_role;
GRANT SELECT ON ALL TABLES IN SCHEMA "{schema}" TO streamkap_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA "{schema}" GRANT SELECT ON TABLES TO streamkap_role;
-- Grant the REPLICATION attribute to the user (required for logical replication)
ALTER USER streamkap_user WITH REPLICATION;
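A quick way to confirm that the script above took effect is to query the system catalogs. This sketch assumes the default names streamkap_user and streamkap_role, and uses public as a stand-in for your {schema}.

```sql
-- streamkap_user should be able to log in and have the REPLICATION attribute
SELECT rolname, rolcanlogin, rolreplication
FROM pg_roles
WHERE rolname IN ('streamkap_user', 'streamkap_role');

-- streamkap_user should be a member of streamkap_role
SELECT pg_has_role('streamkap_user', 'streamkap_role', 'MEMBER') AS is_member;

-- streamkap_user should be able to read every table in the capture schema
-- ('public' is an assumption; use your own schema name)
SELECT table_schema,
       table_name,
       has_table_privilege('streamkap_user',
                           format('%I.%I', table_schema, table_name),
                           'SELECT') AS can_select
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_type = 'BASE TABLE';
```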
3. Enable Snapshots
To backfill your data, the Connector needs to be able to perform snapshots. See Snapshots & Backfilling for more information.
To enable this feature, there are 2 methods available:
Method 1: Enable read only connection
This method is recommended if you cannot create a table in the source database and grant the Connector read/write privileges on it.
- Set Read only to Yes during Streamkap Setup. No other configuration should be necessary.
Method 2: Create a table in the source database
Not supported on read replicas. Please use Method 1 instead.
You will need to create the table and give the necessary permissions to streamkap_user. The Connector will use this table for managing snapshots. Below is an example script that does that.
Please create the signal table with the name streamkap_signal. It will not be recognised if given another name.
-- Create the schema
CREATE SCHEMA streamkap;
-- Switch to the newly created schema
SET search_path TO streamkap;
-- Create the table
CREATE TABLE streamkap_signal (
id VARCHAR(255) PRIMARY KEY,
type VARCHAR(32) NOT NULL,
data VARCHAR(2000) NULL
);
-- Grant necessary privileges on the table to the role
GRANT CREATE, USAGE ON SCHEMA streamkap TO streamkap_role;
GRANT SELECT ON ALL TABLES IN SCHEMA streamkap TO streamkap_role;
GRANT SELECT, UPDATE, INSERT ON TABLE streamkap_signal TO streamkap_role;
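Before moving on, you can check that streamkap_user reaches the signal table through streamkap_role. Each column in this sketch should return true. It assumes the schema and table names from the script above.

```sql
-- All four columns should be true once the grants above are in place
SELECT has_schema_privilege('streamkap_user', 'streamkap', 'USAGE')                  AS schema_usage,
       has_table_privilege('streamkap_user', 'streamkap.streamkap_signal', 'SELECT') AS can_select,
       has_table_privilege('streamkap_user', 'streamkap.streamkap_signal', 'INSERT') AS can_insert,
       has_table_privilege('streamkap_user', 'streamkap.streamkap_signal', 'UPDATE') AS can_update;
```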
Publications and signal tables
When you create the PostgreSQL publication in the next step, if you choose to specify tables for capture instead of all tables, you must include the streamkap_signal table.
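For instance, a publication that captures only two tables lists the signal table alongside them. In this sketch, public.orders and public.customers are hypothetical table names; substitute your own.

```sql
-- Hypothetical capture tables plus the required signal table
CREATE PUBLICATION streamkap_pub
  FOR TABLE public.orders, public.customers, streamkap.streamkap_signal;
```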
4. Create Publication & Slot
REPLICA IDENTITY and deleted records (PostgreSQL 13 and newer)
The REPLICA IDENTITY table setting controls what data is logged for row updates and deletes. By default, only the primary key and Streamkap metadata column values are retained for deleted records. All other columns will be empty, which leaves you with an incomplete record.
If you require all column values for deleted records (for auditing and historical tracking purposes), or if your deletion strategy for your destination is 'soft deletes' (retain the deleted record with a deletion flag), you must set REPLICA IDENTITY to FULL for all captured tables:
ALTER TABLE {table} REPLICA IDENTITY FULL;
This ensures complete data retention.
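If you have many tables, running ALTER TABLE one table at a time is tedious. The sketch below loops over every ordinary table (including partitions) in one schema. It assumes the schema is public and that you run it as the tables' owner, because ALTER TABLE requires ownership. The second query shows the current setting in pg_class.relreplident: d = default (primary key), f = full, n = nothing, i = index.

```sql
-- Set REPLICA IDENTITY FULL on every ordinary table in one schema
-- ('public' is an assumption; change it to your capture schema)
DO $$
DECLARE
  t record;
BEGIN
  FOR t IN
    SELECT n.nspname AS schema_name, c.relname AS table_name
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public'
      AND c.relkind = 'r'   -- ordinary tables, including partitions
  LOOP
    EXECUTE format('ALTER TABLE %I.%I REPLICA IDENTITY FULL', t.schema_name, t.table_name);
  END LOOP;
END
$$;

-- Check the result: relreplident should be 'f' (FULL) for each table
SELECT n.nspname AS schema_name, c.relname AS table_name, c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
ORDER BY c.relname;
```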
A publication defines the set of tables whose change events the Connector will capture.
- Create a publication for your tables. You can create a publication for all tables or selected tables.
-- Create a publication for all tables to capture
CREATE PUBLICATION streamkap_pub FOR ALL TABLES;
-- Create a publication for specific tables to capture
CREATE PUBLICATION streamkap_pub FOR TABLE table1, table2, table3, ...;
-- Verify the tables to capture were added to the publication
SELECT * FROM pg_publication_tables where pubname = 'streamkap_pub';
Altering publications
You cannot alter FOR ALL TABLES publications to include or exclude tables. If you set up a FOR ALL TABLES publication and later decide to change that, you have to drop the publication and create another one that includes specific tables, e.g. CREATE PUBLICATION ... FOR TABLE table1, table2, table3, ...
However, any change events that occur before the new publication is created will not be included in it, so a snapshot is required to ensure they are not missed by your Streamkap pipelines.
You should also stop the Source before changing the publication.
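To see which kind of publication you have, check puballtables in pg_publication. A publication created FOR TABLE can be changed in place with ALTER PUBLICATION. Only FOR ALL TABLES publications need the drop-and-recreate approach described above. In this sketch the table names are hypothetical, and the two options are alternatives: run only the one that matches your publication. A table added in place may still need a snapshot for changes made before it was added.

```sql
-- puballtables = true means the publication was created FOR ALL TABLES
SELECT pubname, puballtables
FROM pg_publication
WHERE pubname = 'streamkap_pub';

-- Option A (FOR TABLE publication): add or remove tables in place
ALTER PUBLICATION streamkap_pub ADD TABLE public.orders;
ALTER PUBLICATION streamkap_pub DROP TABLE public.customers;

-- Option B (FOR ALL TABLES publication): stop the Source, then drop and recreate
-- with an explicit table list (include the signal table if you use one)
DROP PUBLICATION streamkap_pub;
CREATE PUBLICATION streamkap_pub
  FOR TABLE public.orders, streamkap.streamkap_signal;
```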
A replication slot represents a stream of change events the Connector reads from.
- Create a replication slot.
-- Create a logical replication slot
SELECT pg_create_logical_replication_slot('streamkap_pgoutput_slot', 'pgoutput');
-- Verify the replication slot is working (this may take a few moments to return the count)
SELECT count(*) FROM pg_logical_slot_peek_binary_changes('streamkap_pgoutput_slot', null, null, 'proto_version', '1', 'publication_names', 'streamkap_pub');
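A replication slot makes PostgreSQL keep WAL until the slot's consumer confirms it, so a slot that is not being read keeps consuming storage. The sketch below shows how much WAL the slot is holding back (from restart_lsn) and how far behind the consumer is (from confirmed_flush_lsn). The drop statement is commented out on purpose: it is only for when you permanently remove the Connector.

```sql
-- WAL retained by the slot and consumer lag, in human-readable units
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))         AS retained_wal,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS consumer_lag
FROM pg_replication_slots
WHERE slot_name = 'streamkap_pgoutput_slot';

-- Only when permanently removing the Connector: drop the slot so WAL is released
-- SELECT pg_drop_replication_slot('streamkap_pgoutput_slot');
```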
Streamkap Setup
Follow these steps to configure your new connector:
1. Create the Source
- Navigate to Add Connectors.
- Choose PostgreSQL.
2. Connection Settings
- Name: Enter a name for your connector.
- Hostname: Specify the hostname.
PgBouncer and pooled connections
Neon uses PgBouncer to support connection pooling via pooler hostnames like ep-cool-darkness-123456-pooler.us-east-2.aws.neon.tech (notice the -pooler suffix). However, PgBouncer has very limited support for PostgreSQL startup parameters, and Streamkap depends on one it does not support: replication. In this scenario the Connector will fail with a PgBouncer unsupported startup parameter error.
Because of that, Neon's connection pooling cannot be used, so please remove the -pooler suffix from the hostname if present (for example, ep-cool-darkness-123456.us-east-2.aws.neon.tech). This means connections from Streamkap will be unpooled, and Neon has limits on unpooled connections that you should be aware of.
- Port: Default is 5432.
- Connect via SSH Tunnel: The Connector will connect to an SSH server in your network which has access to your database. This is necessary if the Connector cannot connect directly to your database.
  - See SSH Tunnel for setup instructions.
- Username: Username to access the database. By default, Streamkap scripts use streamkap_user.
- Password: Password to access the database.
- Database: Specify the database to stream data from.
- Read only: Whether or not to use a read-only connection.
  - When connecting to a read replica, set this to Yes to support Streamkap snapshots.
- Heartbeats: Crucial for low and intermittent traffic databases.
  - Heartbeat Table Schema: The Connector will use a table in this schema to send heartbeats.
  - See PostgreSQL Heartbeats for setup instructions.
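If you are unsure whether the endpoint in Hostname is a read replica, and so how to set Read only, PostgreSQL can tell you directly. pg_is_in_recovery() returns true on a standby (read replica) and false on a primary. This is a convenience check, not a Streamkap requirement.

```sql
-- true: read replica (set Read only to Yes); false: primary
SELECT pg_is_in_recovery();
```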
3. Snapshot Settings
If you set Read only to No, you will need to create a snapshot signal table and give permissions to the streamkap_user. See Enable Snapshots for setup instructions.
- Signal Table Schema: The Connector will use a table in this schema to manage snapshots.
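If you are not sure which schema the signal table was created in, you can look it up in the catalog. This sketch returns the value to enter as Signal Table Schema; with the Enable Snapshots script, that is streamkap. No rows means the table has not been created yet.

```sql
-- Schema(s) containing a table named streamkap_signal
SELECT n.nspname AS signal_table_schema
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'streamkap_signal'
  AND c.relkind = 'r';
```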
4. Replication Settings
- Replication Slot Name: The name of the replication slot for the connector to use. Default is streamkap_pgoutput_slot.
- Publication Name: The name of the publication for the connector to use. Default is streamkap_pub.
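The names entered here must match objects that already exist in the database. This sketch checks both at once, assuming the default names. You should get one slot row (plugin pgoutput) and one publication row.

```sql
-- Both rows should be present; names must match the Streamkap settings
SELECT 'slot' AS object_type, slot_name::text AS name, plugin::text AS detail
FROM pg_replication_slots
WHERE slot_name = 'streamkap_pgoutput_slot'
UNION ALL
SELECT 'publication', pubname::text,
       CASE WHEN puballtables THEN 'all tables' ELSE 'selected tables' END
FROM pg_publication
WHERE pubname = 'streamkap_pub';
```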
5. Advanced Parameters
- SSL mode: Whether to use an encrypted connection to the PostgreSQL server. By default, it's required.
- Prefix with Database Name?: Changes the format of topics to DatabaseName_TopicName.
- Represent binary data as: Specifies how the data in binary columns (bytea in PostgreSQL) should be interpreted. Your destination for this data can affect which option you choose. Default is bytes.
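Two optional checks relate to the settings above. The first reports whether your own current session is encrypted; it describes your connection, not Streamkap's, so treat it as an indication only. The second lists the bytea columns that the Represent binary data as setting will affect. Both are sketches using standard PostgreSQL catalog views.

```sql
-- Is the current session using SSL, and with which protocol and cipher?
SELECT ssl, version, cipher
FROM pg_stat_ssl
WHERE pid = pg_backend_pid();

-- Columns holding binary data (bytea), excluding system schemas
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE data_type = 'bytea'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, column_name;
```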
Click Next.
6. Schema and Table Capture
- Add Schemas/Tables: Specify the schema(s) and table(s) for capture
  - You can bulk upload here. The format is a simple list of schemas and tables, with each entry on a new row. Save as a .csv file without a header.
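To build the bulk-upload file, you can list candidate tables from the catalog and export the result (in psql, \copy writes query output to a CSV file, without a header by default). This sketch returns one schema and table per row; the exact column layout the upload expects is an assumption here, so adjust the output to match it.

```sql
-- Candidate tables for capture, one per row (system schemas excluded)
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;
```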
Click Save.