Skip to main content

Overview

Some Streamkap Sources (MySQL, MariaDB, Oracle, SQL Server, Db2) maintain a schema history topic to track Data Definition Language (DDL) changes over time. This is necessary because their transaction logs don’t include complete schema metadata—These Sources need to track schema history separately to properly decode change events.
PostgreSQL and MongoDB don’t use schema historyPostgreSQL’s logical decoding provides sufficient schema metadata in each change event, so no separate schema history tracking is needed. Similarly, MongoDB is schema-flexible and doesn’t require separate DDL tracking.
By default, the Source records schema structures from all databases and tables in your instance, even those you’re not capturing. For large database instances with many databases or tables, this can impact performance and potentially cause timeouts during startup or schema changes.

The Problem

When you have large database instances, schema history can cause several issues:
  • Slow Source startup and restarts - The Source must read and process the entire schema history topic on startup
  • Large topic sizes - Recording DDL for hundreds or thousands of uncaptured tables unnecessarily grows the topic
  • Timeout issues - Processing extensive schema history can cause timeouts

Optimization Settings

Best PracticeThese settings are optional optimizations. Only enable them if you experience performance issues. If you’re capturing most databases/tables in your instance, leaving these disabled (default) is fine.
Two optional settings help optimize schema history performance for large database instances:
When to enable: Your database instance contains many databases, but you’re only capturing a subset.What it does: Limits schema history recording to only the databases you’ve selected for capture, ignoring all others.Default: Disabled (records DDL from all databases)Example use case: You have a SQL Server instance with 50 databases but only capture 3 of them. Enable this to record schema history for only those 3 databases.
When to enable: Your captured databases contain many tables, but you’re only capturing specific tables.What it does: Limits schema history recording to only the tables you’ve selected for capture, ignoring all others in the captured databases.Default: Disabled (records DDL from all tables in captured databases)Example use case: Your database has 500 tables but you only capture 20 tables for your pipeline. Enable this to record schema history for only those 20 tables.
Additional optimization: Database user permissionsFor maximum performance, consider restricting the database user’s access to only the captured databases and tables by granting permissions explicitly. This prevents the Source from even seeing uncaptured objects, providing further optimization beyond these DDL filtering settings.

How to Configure

These settings are available in the Advanced section when creating or editing your Source:
  1. Navigate to your Source configuration
  2. Expand the Advanced section
  3. Find the Capture Only Captured Databases DDL setting
  4. Find the Capture Only Captured Tables DDL setting
  5. Enable either or both as needed

Important Considerations

What these settings do NOT affectThese settings only affect what gets recorded in the schema history topic. They do not change which tables are captured—that’s still controlled by your schema/table inclusion lists in the Source configuration.
When these settings are enabled:
  • Schema changes to captured databases/tables are still tracked normally
  • Schema changes to non-captured databases/tables are ignored. When a new table is included for capture, its schema will attempt to be recorded from that point onward
  • The Source continues to handle schema history for all captured objects

Backward Compatibility

If you enable these settings on an existing Source:
  • The Source will continue working normally
  • Future schema changes will follow the new filtering rules-see above
  • Historical schema history data remains unchanged in the topic