> ## Documentation Index > Fetch the complete documentation index at: https://docs.streamkap.com/llms.txt > Use this file to discover all available pages before exploring further. # PostgreSQL Source FAQ > Answers to common PostgreSQL source connector questions in Streamkap, covering replication slots, WAL retention, publications, generated columns, and more. ## PostgreSQL Sources FAQ for Streamkap This FAQ focuses on using PostgreSQL as a source in Streamkap, including general self-hosted setups and cloud variants (AWS RDS/Aurora, Azure, Google Cloud SQL, Neon). Streamkap's PostgreSQL connector provides real-time CDC with managed features like automatic scaling, UI setup, and ETL transformations. A PostgreSQL source in Streamkap enables real-time Change Data Capture (CDC) from PostgreSQL databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses logical replication to stream changes to destinations, supporting snapshots for initial loads, schema evolution, and heartbeats for low-traffic DBs. Streamkap abstracts complexity, offering a serverless setup via UI or API. * PostgreSQL 10+ for basic CDC; 13+ for advanced features like read-only snapshots and partitioned tables with `publish_via_partition_root`. Compatible with PostgreSQL 15–17+ in cloud deployments. * PostgreSQL 9.4+ for logical decoding; 10+ for `pgoutput` plugin; 13+ for read-only incremental snapshots; 16+ for replica server slots; 17+ for failover-enabled slots. Streamkap supports: * Self-hosted (on-prem/VM) * AWS RDS PostgreSQL (including Aurora and Serverless) * Azure Database for PostgreSQL * Google Cloud SQL for PostgreSQL * Neon PostgreSQL Streamkap also supports these, plus CrunchyBridge and Amazon RDS, with similar configs. * **CDC**: Log-based via `pgoutput` decoder; captures inserts/updates/deletes * **Snapshots**: Ad-hoc/initial backfills using read-only (PG 13+) or signal table methods; watermarking for minimal impact * **Schema Evolution**: Automatic handling of adds/drops/type changes * **Heartbeats**: Enabled by default to prevent WAL buildup in low-traffic DBs via a `streamkap_heartbeat` table * **Data Types**: Standard mappings (numerics, temporals, strings, binary as bytes/base64/hex, arrays/JSON/hstore); supports pgvector/VECTOR types in latest Streamkap * **Ingestion Modes**: Inserts (append) or upserts * **Security**: SSL, SSH/VPN, IP allowlisting * **Monitoring**: Latency, lag, WAL metrics in-app * Streamkap adds transaction metadata, ad-hoc snapshots with filters, and custom types like PostGIS/GEOMETRY Streamkap reads PostgreSQL's write-ahead log (WAL) via logical decoding, emitting changes as events. It uses Streamkap's `pgoutput` plugin for native support. REPLICA IDENTITY FULL ensures complete before/after images for updates/deletes. REPLICA IDENTITY controls logged data for changes: * **DEFAULT**: Logs PK only (partial deletes) * **FULL**: Logs all columns (complete records, recommended for audits/soft deletes). Set via `ALTER TABLE ... REPLICA IDENTITY FULL;`. Required for full delete handling. * Trigger ad-hoc at source/table level.\ Methods: Read-only (PG 13+, no signal table) or `streamkap_signal` table. Uses watermarking for incremental loads. * Modes like `initial`, `always`, `when_needed`; supports incremental (chunked, parallel) and read-only (PG 13+); ad-hoc via signaling. Streamkap simplifies triggering via UI. Connectors use "offsets"—like bookmarks—to track their position in the database's log or change stream. When no changes occur for long periods, these offsets may become outdated, and the Connector might lose its place or stop capturing changes. Heartbeats ensure the Connector stays active and continues capturing changes. There are two layers of heartbeat protection: #### Layer 1: Connector heartbeats (enabled by default) The Connector periodically emits heartbeat messages to an internal topic, even when no actual data changes are detected. This keeps offsets fresh and prevents staleness. No configuration is necessary for this layer; it is automatically enabled. We recommend keeping this layer enabled for all deployments. #### Layer 2: Source database heartbeats (recommended) **Why we recommend configuring Layer 2** While Layer 2 is **crucial** for low-traffic or intermittent databases, we recommend configuring it for all deployments. It provides additional resilience and helps prevent issues during periods of inactivity. You can configure regular updates to a dedicated heartbeat table in the source database. This simulates activity, ensuring change events are generated consistently, maintaining log progress and providing additional resilience. How this layer is configured depends on the connection type (if supported by the Source): * **Read-write connections** (when **Read only** is **No** during Streamkap Setup): The Connector updates the heartbeat table directly. * **Read-only connections** (when **Read only** is **Yes** during Streamkap Setup): A scheduled job on the **primary** database updates the heartbeat table, and these changes replicate to the read replica for the Connector to consume. This layer requires you to set up a heartbeat table—and for read-only connections, a scheduled job (e.g., `pg_cron` for PostgreSQL, `event_scheduler` for MySQL)—on your source database. For read-write connections (when **Read only** is **No** during Streamkap Setup), the Connector writes to the heartbeat table directly. ```SQL SQL theme={null} theme={null} theme={null} theme={null} theme={null} theme={null} -- Create the streamkap schema CREATE SCHEMA IF NOT EXISTS streamkap; -- Switch to the streamkap schema SET search_path TO streamkap; -- Create the heartbeat table with id, text, and last_update fields CREATE TABLE streamkap_heartbeat ( id SERIAL PRIMARY KEY, text TEXT, last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Grant permission to the Streamkap user GRANT USAGE ON SCHEMA streamkap TO streamkap_user; GRANT SELECT, UPDATE, INSERT, DELETE ON TABLE streamkap_heartbeat TO streamkap_user; -- Insert the first row into the heartbeat table INSERT INTO streamkap_heartbeat (text) VALUES ('test_heartbeat'); ``` **Heartbeat tables and PostgreSQL publications** If the `streamkap_pub` publication created during PostgreSQL Setup was for specific tables e.g. `CREATE PUBLICATION streamkap_pub FOR TABLE table1, table2, table3, ...;` instead of `FOR ALL TABLES;`, you **must** add the heartbeat table to the publication: `ALTER PUBLICATION streamkap_pub ADD TABLE streamkap.streamkap_heartbeat;`. * **Basics**: Integers, floats, strings, dates/timestamps (micro/nano precision modes) * **Advanced**: Arrays, JSON, binary (bytes/hex), decimals (precise/double/string modes) * **Custom**: Domain types, network addresses, PostGIS (GEOMETRY/GEOGRAPHY), pgvector (VECTOR/HALFVEC/SPARSEVEC) * **Unsupported**: Non-UTF8 encodings; some spatial/custom without config * Use queries like: ```sql theme={null} SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) FROM pg_replication_slots WHERE slot_name = 'streamkap_slot'; ``` * Tools: Datadog (`wal_bytes`), New Relic/Grafana (custom queries) * Best Practices: Retain 3–5 days WAL; alert on growth; drop inactive slots * Streamkap: Monitor via JMX; WAL issues from inactive slots common * Non-UTF8 unsupported * Data loss risk on upgrades (recreate slots) * Read replicas limit snapshots * No DDL events * PK changes need manual handling * TOASTed values may be incomplete * Consistency issues pre-commit * Generated columns not streamed — exclude them from capture (incremental snapshots fail otherwise) * WAL buildup without heartbeats * Schema changes during snapshots unsupported Use REPLICA IDENTITY FULL for complete records. Supports soft deletes. In upserts, deletes propagate as events. * **WAL Buildup**: Enable heartbeats; monitor/drop slots; retain 3–5 days * **Upgrade Data Loss**: Stop writes, capture events, recreate slot, resnapshot * **Connection Failures**: Verify `pg_hba.conf`, firewalls, SSL * **Missing Events**: Ensure publication includes tables; check REPLICA IDENTITY **No, CDC cannot capture Views or most virtual database objects.** **Why Views cannot be captured:**\ CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables—they don't store data or generate transaction log entries. When you query a view, the database engine executes the underlying SELECT statement against the base tables. Since views don't store data, they don't generate transaction log entries. **What cannot be captured:** * **Views**: Virtual tables with no physical storage or WAL entries * **Materialized Views**: Special case—can be captured if they have `REPLICA IDENTITY` configured, but refresh operations may not generate standard change events. Better to capture the source tables. * **Temporary Tables**: Session-scoped, not logged persistently in WAL * **Unlogged Tables**: Explicitly excluded from WAL by design * **Foreign Tables**: Reference external data sources, not local storage * **System/Catalog Tables** (information\_schema, pg\_catalog): Metadata representations, not user data * **CTEs (Common Table Expressions)**: Query-time constructs with no persistent storage **Solution:**\ Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer. **Example:**\ If you have a view `sales_summary` that queries tables `orders` and `customers`, capture the `orders` and `customers` tables instead, then recreate the view logic downstream. PostgreSQL stored generated columns (`GENERATED ALWAYS AS (...) STORED`) are **not** included in the replication stream, so streaming cannot keep them up to date. We recommend **excluding generated columns from capture** — in the Source's **Settings → Advanced** section, set **Column Selection Mode** to **Exclusion** and add the column to the **Column Exclusion List** — and recomputing the value at your destination or in a [transform](/transforms) if you need it. An incremental snapshot (the **Filtered** and **Full** options) of a table with a generated column will **fail** unless the column is excluded. **Blocking** and **Parallel** snapshots are not affected. PostgreSQL 18 and later can optionally stream stored generated columns when the publication is created with `publish_generated_columns = stored`; earlier versions cannot. PostgreSQL rejects null bytes (`0x00`) in text columns. If your source data contains null characters, they can cause errors during replication. **Resolution:** Clean null characters from affected columns at the source: ```sql theme={null} UPDATE my_table SET field = REPLACE(field, chr(0), '') WHERE position(chr(0) in field) > 0; ``` Alternatively, add a transform to strip null bytes before delivery to the destination. Snapshots acquire a brief lock during initialization. For very large tables under heavy concurrent write load, the lock acquisition may time out (default 15 minutes). **Resolution:** 1. Schedule snapshots during off-peak hours to reduce lock contention 2. Use [Filtered (Partial) snapshots](/snapshots#snapshot-options) to process smaller data ranges 3. If timeouts persist, contact [Streamkap support](mailto:support@streamkap.com) for assistance * Use dedicated replication user * Limit publications to needed tables * Enable auto-vacuum; set WAL retention * Test snapshots in staging * For cloud: Monitor provider tools; use heartbeats for low traffic * Use PG 17+ for failover resilience Database version upgrades require careful planning to avoid data loss and minimize downtime for your CDC pipelines. See our [Database Upgrade Guide](/database-upgrade-guide) for step-by-step instructions.