Skip to main content

PostgreSQL Sources FAQ for Streamkap

This FAQ focuses on using PostgreSQL as a source in Streamkap, including general self-hosted setups and cloud variants (AWS RDS/Aurora, Azure, Google Cloud SQL, Neon). Streamkap’s PostgreSQL connector provides real-time CDC with managed features like automatic scaling, UI setup, and ETL transformations.
A PostgreSQL source in Streamkap enables real-time Change Data Capture (CDC) from PostgreSQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses logical replication to stream changes to destinations, supporting snapshots for initial loads, schema evolution, and heartbeats for low-traffic DBs. Streamkap abstracts complexity, offering a serverless setup via UI or API.
  • PostgreSQL 10+ for basic CDC; 13+ for advanced features like read-only snapshots and partitioned tables with publish_via_partition_root. Compatible with PostgreSQL 15–17+ in cloud deployments.
  • PostgreSQL 9.4+ for logical decoding; 10+ for pgoutput plugin; 13+ for read-only incremental snapshots; 16+ for replica server slots; 17+ for failover-enabled slots.
Streamkap supports:
  • Self-hosted (on-prem/VM)
  • AWS RDS PostgreSQL (including Aurora and Serverless)
  • Azure Database for PostgreSQL
  • Google Cloud SQL for PostgreSQL
  • Neon PostgreSQL
Streamkap also supports these, plus CrunchyBridge and Amazon RDS, with similar configs.
  • CDC: Log-based via pgoutput decoder; captures inserts/updates/deletes
  • Snapshots: Ad-hoc/initial backfills using read-only (PG 13+) or signal table methods; watermarking for minimal impact
  • Schema Evolution: Automatic handling of adds/drops/type changes
  • Heartbeats: Enabled by default to prevent WAL buildup in low-traffic DBs via a streamkap_heartbeat table
  • Data Types: Standard mappings (numerics, temporals, strings, binary as bytes/base64/hex, arrays/JSON/hstore); supports pgvector/VECTOR types in latest Streamkap
  • Ingestion Modes: Inserts (append) or upserts
  • Security: SSL, SSH/VPN, IP allowlisting
  • Monitoring: Latency, lag, WAL metrics in-app
  • Streamkap adds transaction metadata, ad-hoc snapshots with filters, and custom types like PostGIS/GEOMETRY
Streamkap reads PostgreSQL’s write-ahead log (WAL) via logical decoding, emitting changes as events. It uses Streamkap’s pgoutput plugin for native support. REPLICA IDENTITY FULL ensures complete before/after images for updates/deletes.
REPLICA IDENTITY controls logged data for changes:
  • DEFAULT: Logs PK only (partial deletes)
  • FULL: Logs all columns (complete records, recommended for audits/soft deletes).
Set via ALTER TABLE ... REPLICA IDENTITY FULL;. Required for full delete handling.
  • Trigger ad-hoc at source/table level.
    Methods: Read-only (PG 13+, no signal table) or streamkap_signal table. Uses watermarking for incremental loads.
  • Modes like initial, always, when_needed; supports incremental (chunked, parallel) and read-only (PG 13+); ad-hoc via signaling. Streamkap simplifies triggering via UI.
Heartbeats prevent WAL buildup in low-traffic DBs by periodically updating a streamkap_heartbeat table, advancing the replication slot.
  • Setup: Create schema/table, grant permissions, insert initial row, add to publication if needed
  • How it Works: Connector updates table periodically, generating change events
  • Verification: Check table timestamps; troubleshoot permissions/publication inclusion
  • Limitations: Requires manual addition to specific-table publications. Streamkap has configurable heartbeat.interval.ms for similar events
  • Basics: Integers, floats, strings, dates/timestamps (micro/nano precision modes)
  • Advanced: Arrays, JSON, binary (bytes/hex), decimals (precise/double/string modes)
  • Custom: Domain types, network addresses, PostGIS (GEOMETRY/GEOGRAPHY), pgvector (VECTOR/HALFVEC/SPARSEVEC)
  • Unsupported: Non-UTF8 encodings; some spatial/custom without config
  • Use queries like:
    SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
    FROM pg_replication_slots
    WHERE slot_name = 'streamkap_slot';
    
  • Tools: Datadog (wal_bytes), New Relic/Grafana (custom queries)
  • Best Practices: Retain 3–5 days WAL; alert on growth; drop inactive slots
  • Streamkap: Monitor via JMX; WAL issues from inactive slots common
  • Non-UTF8 unsupported
  • Data loss risk on upgrades (recreate slots)
  • Read replicas limit snapshots
  • No DDL events
  • PK changes need manual handling
  • TOASTed values may be incomplete
  • Consistency issues pre-commit
  • Generated columns missing in pgoutput
  • WAL buildup without heartbeats
  • Schema changes during snapshots unsupported
Use REPLICA IDENTITY FULL for complete records. Supports soft deletes. In upserts, deletes propagate as events.
  • WAL Buildup: Enable heartbeats; monitor/drop slots; retain 3–5 days
  • Upgrade Data Loss: Stop writes, capture events, recreate slot, resnapshot
  • Connection Failures: Verify pg_hba.conf, firewalls, SSL
  • Missing Events: Ensure publication includes tables; check REPLICA IDENTITY
  • Use dedicated replication user
  • Limit publications to needed tables
  • Enable auto-vacuum; set WAL retention
  • Test snapshots in staging
  • For cloud: Monitor provider tools; use heartbeats for low traffic
  • Use PG 17+ for failover resilience