Overview
Some Streamkap Sources (MySQL, MariaDB, Oracle, SQL Server, Db2) maintain a schema history topic to track Data Definition Language (DDL) changes over time. This is necessary because their transaction logs don’t include complete schema metadata—These Sources need to track schema history separately to properly decode change events.PostgreSQL and MongoDB don’t use schema historyPostgreSQL’s logical decoding provides sufficient schema metadata in each change event, so no separate schema history tracking is needed. Similarly, MongoDB is schema-flexible and doesn’t require separate DDL tracking.
The Problem
When you have large database instances, schema history can cause several issues:- Slow Source startup and restarts - The Source must read and process the entire schema history topic on startup
- Large topic sizes - Recording DDL for hundreds or thousands of uncaptured tables unnecessarily grows the topic
- Timeout issues - Processing extensive schema history can cause timeouts
Optimization Settings
Two optional settings help optimize schema history performance for large database instances:Capture Only Captured Databases DDL
Capture Only Captured Databases DDL
When to enable: Your database instance contains many databases, but you’re only capturing a subset.What it does: Limits schema history recording to only the databases you’ve selected for capture, ignoring all others.Default: Disabled (records DDL from all databases)Example use case: You have a SQL Server instance with 50 databases but only capture 3 of them. Enable this to record schema history for only those 3 databases.
Capture Only Captured Tables DDL
Capture Only Captured Tables DDL
When to enable: Your captured databases contain many tables, but you’re only capturing specific tables.What it does: Limits schema history recording to only the tables you’ve selected for capture, ignoring all others in the captured databases.Default: Disabled (records DDL from all tables in captured databases)Example use case: Your database has 500 tables but you only capture 20 tables for your pipeline. Enable this to record schema history for only those 20 tables.
How to Configure
These settings are available in the Advanced section when creating or editing your Source:- Navigate to your Source configuration
- Expand the Advanced section
- Find the Capture Only Captured Databases DDL setting
- Find the Capture Only Captured Tables DDL setting
- Enable either or both as needed
Important Considerations
What these settings do NOT affectThese settings only affect what gets recorded in the schema history topic. They do not change which tables are captured—that’s still controlled by your schema/table inclusion lists in the Source configuration.
- Schema changes to captured databases/tables are still tracked normally
- Schema changes to non-captured databases/tables are ignored. When a new table is included for capture, its schema will attempt to be recorded from that point onward
- The Source continues to handle schema history for all captured objects
Backward Compatibility
If you enable these settings on an existing Source:- The Source will continue working normally
- Future schema changes will follow the new filtering rules-see above
- Historical schema history data remains unchanged in the topic