PostgreSQL Sources FAQ for Streamkap
This FAQ focuses on using PostgreSQL as a source in Streamkap, including general self-hosted setups and cloud variants (AWS RDS/Aurora, Azure, Google Cloud SQL, Neon). Streamkap's PostgreSQL connector provides real-time CDC with managed features like automatic scaling, UI setup, and ETL transformations.
What is a PostgreSQL source in Streamkap?
A PostgreSQL source in Streamkap enables real-time Change Data Capture (CDC) from PostgreSQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses logical replication to stream changes to destinations, supporting snapshots for initial loads, schema evolution, and heartbeats for low-traffic DBs. Streamkap abstracts complexity, offering a serverless setup via UI or API.
What PostgreSQL versions are supported as sources?
- PostgreSQL 10+ for basic CDC; 13+ for advanced features such as read-only snapshots and partitioned tables with `publish_via_partition_root`. Compatible with PostgreSQL 15–17+ in cloud deployments.
- Feature milestones: 9.4+ for logical decoding; 10+ for the `pgoutput` plugin; 13+ for read-only incremental snapshots; 16+ for logical replication slots on replica servers; 17+ for failover-enabled slots.
What PostgreSQL deployments are supported?
Streamkap supports:
- Self-hosted (on-prem/VM).
- AWS RDS PostgreSQL (including Aurora and Serverless).
- Azure Database for PostgreSQL.
- Google Cloud SQL for PostgreSQL.
- Neon PostgreSQL.
CrunchyBridge is also supported with a similar configuration.
What are the key features of PostgreSQL sources in Streamkap?
- CDC: Log-based via the `pgoutput` decoder; captures inserts/updates/deletes.
- Snapshots: Ad-hoc/initial backfills using read-only (PG 13+) or signal table methods; watermarking for minimal impact.
- Schema Evolution: Automatic handling of adds/drops/type changes.
- Heartbeats: Enabled by default to prevent WAL buildup in low-traffic DBs via a `streamkap_heartbeat` table.
- Data Types: Standard mappings (numerics, temporals, strings, binary as bytes/base64/hex, arrays/JSON/hstore); supports pgvector/VECTOR types in the latest Streamkap.
- Ingestion Modes: Inserts (append) or upserts.
- Security: SSL, SSH/VPN, IP allowlisting.
- Monitoring: Latency, lag, WAL metrics in-app.
- Streamkap adds transaction metadata, ad-hoc snapshots with filters, and custom types like PostGIS/GEOMETRY.
How does CDC work for PostgreSQL sources?
Streamkap reads PostgreSQL's write-ahead log (WAL) via logical decoding, emitting changes as events. It uses PostgreSQL's native `pgoutput` plugin, so no custom decoder needs to be installed. REPLICA IDENTITY FULL ensures complete before/after images for updates/deletes.
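As a rough sketch, the server-side prerequisites for log-based CDC look like this (the names `streamkap_pub` and `streamkap_slot` are illustrative, not confirmed Streamkap defaults):

```sql
-- Enable logical decoding (requires a restart; on managed services
-- this is set via parameter groups / server flags instead).
ALTER SYSTEM SET wal_level = 'logical';

-- Publish the tables whose changes should be captured.
CREATE PUBLICATION streamkap_pub FOR TABLE public.orders, public.customers;

-- A logical replication slot using the native pgoutput plugin;
-- connectors typically create this themselves on first start.
SELECT pg_create_logical_replication_slot('streamkap_slot', 'pgoutput');
```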
What is REPLICA IDENTITY and why does it matter?
REPLICA IDENTITY controls logged data for changes:
- DEFAULT: Logs primary-key columns only, so delete events carry just the PK.
- FULL: Logs all columns (complete records, recommended for audits/soft deletes).
Set via `ALTER TABLE ... REPLICA IDENTITY FULL;`. Required for complete delete handling.
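For example, setting and then verifying the identity on a table (`public.orders` is a placeholder):

```sql
-- Log all columns in the before-image of updates and deletes.
ALTER TABLE public.orders REPLICA IDENTITY FULL;

-- Verify: relreplident is 'f' for FULL, 'd' for DEFAULT.
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';
```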
How do snapshots work for PostgreSQL sources?
- Trigger ad-hoc snapshots at the source or table level. Methods: read-only (PG 13+, no signal table) or a `streamkap_signal` table. Uses watermarking for incremental loads.
- Modes include `initial`, `always`, and `when_needed`; supports incremental (chunked, parallel) and read-only (PG 13+) snapshots; ad-hoc via signaling. Streamkap simplifies triggering via the UI.
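Streamkap normally triggers snapshots for you from the UI; under the hood, Debezium-style signaling works roughly as below. The table layout and JSON payload are assumptions based on that convention, not a confirmed Streamkap schema:

```sql
-- Assumed Debezium-style signal table layout.
CREATE TABLE streamkap_signal (
  id   VARCHAR(64) PRIMARY KEY,
  type VARCHAR(32) NOT NULL,
  data TEXT
);

-- Request an ad-hoc incremental snapshot of one table.
INSERT INTO streamkap_signal (id, type, data)
VALUES ('snap-2024-01', 'execute-snapshot',
        '{"data-collections": ["public.orders"], "type": "incremental"}');
```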
What are heartbeats and how do they work?
Heartbeats prevent WAL buildup in low-traffic DBs by periodically updating a `streamkap_heartbeat` table, advancing the replication slot.
- Setup: Create schema/table, grant permissions, insert initial row, add to publication if needed.
- How it Works: Connector updates table periodically, generating change events.
- Verification: Check table timestamps; troubleshoot permissions/publication inclusion.
- Limitations: Requires manual addition to publications that list specific tables. The heartbeat interval is configurable via `heartbeat.interval.ms`.
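The setup steps above might be sketched as follows (the schema, user, and publication names are placeholders):

```sql
-- Dedicated schema and heartbeat table with an initial row.
CREATE SCHEMA IF NOT EXISTS streamkap;
CREATE TABLE streamkap.streamkap_heartbeat (
  id         INT PRIMARY KEY,
  updated_at TIMESTAMPTZ
);
INSERT INTO streamkap.streamkap_heartbeat VALUES (1, now());

-- Let the connector user update it.
GRANT USAGE ON SCHEMA streamkap TO streamkap_user;
GRANT INSERT, UPDATE ON streamkap.streamkap_heartbeat TO streamkap_user;

-- If the publication lists specific tables, add the heartbeat table too.
ALTER PUBLICATION streamkap_pub ADD TABLE streamkap.streamkap_heartbeat;
```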
What data types are supported?
- Basics: Integers, floats, strings, dates/timestamps (micro/nano precision modes).
- Advanced: Arrays, JSON, binary (bytes/hex), decimals (precise/double/string modes).
- Custom: Domain types, network addresses, PostGIS (GEOMETRY/GEOGRAPHY), pgvector (VECTOR/HALFVEC/SPARSEVEC).
- Unsupported: Non-UTF8 encodings; some spatial/custom without config.
How to monitor WAL for PostgreSQL sources?
- Use queries such as `SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) FROM pg_replication_slots WHERE slot_name = 'streamkap_slot';` to check size/lag.
- Tools: Datadog (`wal_bytes`), New Relic/Grafana (custom queries).
- Best Practices: Retain 3–5 days WAL; alert on growth; drop inactive slots.
- Streamkap: Monitor via JMX; WAL issues from inactive slots common.
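Expanding on the query above, these checks cover per-slot WAL retention and inactive slots (the slot name is illustrative):

```sql
-- WAL retained per replication slot, largest first.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

-- Inactive slots retain WAL indefinitely; drop once confirmed unused.
SELECT pg_drop_replication_slot('old_unused_slot');
```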
What are common limitations?
- Non-UTF8 unsupported; data loss risk on upgrades (recreate slots); read replicas limit snapshots; no DDL events.
- No DDL events; PK changes need manual handling; TOASTed values may be incomplete; consistency issues before commit; generated columns are missing from `pgoutput` output.
- WAL buildup without heartbeats; schema changes during snapshots unsupported.
How to handle deletes?
Use REPLICA IDENTITY FULL for complete records. Supports soft deletes. In upserts, deletes propagate as events.
Troubleshooting common issues
- WAL Buildup: Enable heartbeats; monitor/drop slots; retain 3–5 days.
- Upgrade Data Loss: Stop writes, capture events, recreate slot, resnapshot.
- Connection Failures: Verify pg_hba.conf, firewalls, SSL.
- Missing Events: Ensure publication includes tables; check REPLICA IDENTITY.
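For the missing-events case, these queries confirm publication membership and slot health (the publication name is a placeholder):

```sql
-- Are the tables you expect actually in the publication?
SELECT * FROM pg_publication_tables WHERE pubname = 'streamkap_pub';

-- Is the slot active, and where did it last restart?
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```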
Best practices for PostgreSQL sources
- Use dedicated replication user.
- Limit publications to needed tables.
- Enable auto-vacuum; set WAL retention.
- Test snapshots in staging.
- For cloud: Monitor provider tools; use heartbeats for low traffic.
- Use PG 17+ for failover resilience.
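A dedicated replication user might be created like this (names, password, and the table list are placeholders; managed services such as RDS grant replication via a provider role instead):

```sql
CREATE USER streamkap_user WITH REPLICATION PASSWORD 'change-me';
GRANT USAGE ON SCHEMA public TO streamkap_user;
GRANT SELECT ON public.orders, public.customers TO streamkap_user;
```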