SQL Server Sources FAQ for Streamkap
This FAQ focuses on using SQL Server as a source in Streamkap, including self-hosted setups and cloud variants (AWS RDS SQL Server, Azure SQL Database, Google Cloud SQL for SQL Server). Streamkap’s SQL Server connector provides real-time CDC with managed features like automatic scaling, UI setup, and ETL transformations.What is a SQL Server source in Streamkap?
What is a SQL Server source in Streamkap?
A SQL Server source in Streamkap enables real-time Change Data Capture (CDC) from SQL Server databases, capturing row-level inserts, updates, and deletes with sub-second latency. It reads CDC change tables to stream changes to destinations, supporting snapshots for initial loads, schema evolution, and heartbeats for low-traffic databases. Streamkap abstracts complexity, offering a serverless setup via UI or API.
What SQL Server versions are supported as sources?
What SQL Server versions are supported as sources?
- SQL Server 2016+ for basic CDC; 2017+ for enhanced features and Linux support; 2019+ for advanced data types and performance.
- Compatible with SQL Server 2014+ in limited modes.
- Azure SQL Database (all service tiers with CDC support)
What SQL Server deployments are supported?
What SQL Server deployments are supported?
Streamkap supports:
- Self-hosted (on-prem/VM)
- AWS RDS SQL Server (Standard, Enterprise, Web editions)
- Azure SQL Database (including Managed Instance)
- Google Cloud SQL for SQL Server
What are the key features of SQL Server sources in Streamkap?
What are the key features of SQL Server sources in Streamkap?
- CDC: Change table-based via SQL Server CDC; captures inserts/updates/deletes
- Snapshots: Ad-hoc/initial backfills using signal table methods; locking or non-locking modes
- Schema Evolution: Requires manual refresh of CDC change tables when source table structure changes
- Heartbeats: Enabled by default to prevent offset staleness in low-traffic databases
- Data Types: Standard mappings (numerics, temporals, strings, binary as bytes/base64/hex, XML, hierarchyid)
- Ingestion Modes: Inserts (append) or upserts
- Security: SSL, Windows/SQL authentication, IP allowlisting
- Monitoring: Latency, lag, CDC cleanup metrics in-app
- Streamkap adds transaction metadata and ad-hoc snapshots with filters
How does CDC work for SQL Server sources?
How does CDC work for SQL Server sources?
Streamkap reads SQL Server’s CDC change tables, which are automatically populated by the database engine when CDC is enabled on tables. The SQL Server Agent must be running to process CDC data. Change events are emitted as they’re read from these change tables.
What is CDC cleanup and why does it matter?
What is CDC cleanup and why does it matter?
SQL Server CDC generates change tables that grow over time. The SQL Server Agent runs cleanup jobs to purge old data based on retention settings.
- Default retention: 3 days
- Monitor cleanup: Ensure SQL Server Agent is running and cleanup jobs are scheduled
- Storage impact: Change tables can consume significant disk space if cleanup fails
How do snapshots work for SQL Server sources?
How do snapshots work for SQL Server sources?
- Trigger ad-hoc at source/table level
- Requires a signal table (
streamkap_signal) with CDC enabled - Methods: Blocking (locks tables briefly) or incremental (chunked reads)
- Modes:
initial(default),always,initial_only,no_data,when_needed,configuration_based,custom
What are heartbeats and how do they work?
What are heartbeats and how do they work?
Heartbeats prevent offset staleness in low-traffic databases by periodically updating a heartbeat table, generating change events to keep the connector active.
- Setup: Heartbeat functionality is built into Streamkap’s connector
- How it Works: Connector updates the heartbeat table periodically
- Verification: Check connector metrics for heartbeat activity
- Limitations: Requires heartbeat table to be created and CDC-enabled
heartbeat.interval.ms for heartbeat frequency.What data types are supported?
What data types are supported?
- Basics: Integers (TINYINT, SMALLINT, INT, BIGINT), floats (REAL, FLOAT), strings (CHAR, VARCHAR, NCHAR, NVARCHAR, TEXT), dates/timestamps (DATE, DATETIME, DATETIME2, DATETIMEOFFSET, TIME)
- Advanced: Binary (BINARY, VARBINARY, IMAGE as bytes/hex), decimals (DECIMAL, NUMERIC, MONEY), XML, uniqueidentifier (GUID)
- Custom: Hierarchyid, spatial types (GEOMETRY, GEOGRAPHY) with limitations
- Unsupported: Non-UTF8/UTF16 encodings, deprecated types (TEXT, NTEXT, IMAGE in older modes), sql_variant without explicit casting
How to handle table schema changes?
How to handle table schema changes?
When you modify a table structure (add/remove columns, change data types), the CDC change table does not automatically update.You must manually refresh the change table:
- Create a new capture instance using
sys.sp_cdc_enable_tablewith a new@capture_instancename - Wait for Streamkap to switch to the new capture instance
- Disable the old capture instance using
sys.sp_cdc_disable_table
How to monitor CDC for SQL Server sources?
How to monitor CDC for SQL Server sources?
- Check SQL Server Agent status:
EXEC master.dbo.xp_servicecontrol N'QUERYSTATE', N'SQLSERVERAGENT' - Monitor change table sizes:
SELECT * FROM sys.dm_cdc_log_scan_sessions - Check cleanup job status:
SELECT * FROM msdb.dbo.cdc_jobs - Monitor lag via Streamkap in-app metrics
Can CDC capture database Views, Materialized Views, and other virtual objects?
Can CDC capture database Views, Materialized Views, and other virtual objects?
No, CDC cannot capture Views or most virtual database objects.Why Views cannot be captured:
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables—they don’t store data or generate transaction log entries. When you query a view, the database engine executes the underlying SELECT statement against the base tables. Since views don’t store data, they don’t generate transaction log entries.What cannot be captured:
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.SQL Server-specific notes:
If you have a view
CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables—they don’t store data or generate transaction log entries. When you query a view, the database engine executes the underlying SELECT statement against the base tables. Since views don’t store data, they don’t generate transaction log entries.What cannot be captured:
- Views: Virtual tables with no physical storage or change tables
- Indexed Views (Materialized Views): While they have physical storage, they don’t generate CDC events. Capture the base tables instead.
- Table Variables: Session-scoped, not logged in transactions
- Temporary Tables (#temp, ##temp): Session or connection-scoped, excluded from CDC
- Memory-Optimized Tables (In-Memory OLTP): Extremely limited CDC support. While technically possible with SQL Server 2016 SP2+ (compatibility level 130+), memory-optimized tables use In-Memory OLTP engine which bypasses traditional transaction logging. CDC can be enabled but with significant performance penalties that negate the benefits of memory optimization. Not recommended for CDC; capture from traditional disk-based tables instead or use alternative change tracking approaches.
- System Tables (sys.*, information_schema): Metadata representations, not user data
- CTEs (Common Table Expressions): Query-time constructs with no persistent storage
- External Tables: Reference external data sources (PolyBase), not local storage
Configure CDC on the underlying base tables that power your views. The view logic can be recreated in your destination or transformation layer.SQL Server-specific notes:
- Indexed Views: Even though they store data physically, changes to indexed views are captured through the base table CDC, not the view itself
- Partitioned Views: Capture each underlying base table separately
- Memory-Optimized Tables (In-Memory OLTP): While CDC can technically be enabled on memory-optimized tables in SQL Server 2016 SP2+ with database compatibility level 130+, doing so introduces significant performance overhead. Memory-optimized tables use In-Memory OLTP engine designed for lock-free, log-optimized operations. Enabling CDC forces traditional transaction logging which defeats the performance benefits. Recommendation: Use traditional disk-based tables for CDC capture, or consider Change Tracking (lighter weight alternative) or Temporal Tables for memory-optimized scenarios.
If you have a view
vw_sales_summary that queries tables Orders and Customers, enable CDC on the Orders and Customers tables, then recreate the view logic downstream.What are common limitations?
What are common limitations?
- SQL Server Agent must be running for CDC to function
- Maximum 2 capture instances per table (impacts schema evolution)
- Change table cleanup requires active SQL Server Agent jobs
- High transaction volume can cause change table growth
- Schema changes require manual intervention
- Read replicas may have limitations for CDC
- Restores disable CDC (must re-enable after restore)
How to handle deletes?
How to handle deletes?
Captures deletes as events. SQL Server CDC records before-images for deletes by default, providing complete record information.
What security features are available?
What security features are available?
Encrypted connections (SSL/TLS), SQL Server or Windows authentication, role-based access, IP allowlisting.
Troubleshooting common issues
Troubleshooting common issues
- CDC Not Working: Verify SQL Server Agent is running; check CDC is enabled on database and tables
- Change Table Growth: Monitor cleanup job execution; adjust retention settings
- Missing Events: Ensure capture instance is active; verify role membership for CDC access
- After Database Restore: CDC is disabled; must re-enable on database and tables
- Schema Changes Not Reflected: Must manually refresh change tables (see setup docs)
Best practices for SQL Server sources
Best practices for SQL Server sources
- Use dedicated CDC user with minimal privileges
- Ensure SQL Server Agent is always running and monitored
- Set appropriate CDC cleanup retention (default 3 days)
- Limit capture to needed tables only to reduce change table size
- Test schema evolution procedures in staging
- Monitor disk space for change tables
- For cloud: Use provider monitoring; enable heartbeats for low traffic
- Document schema change procedures for your team