MongoDB Sources FAQ for Streamkap
This FAQ covers using MongoDB as a source in Streamkap, including self-hosted deployments and cloud variants (MongoDB Atlas, Amazon DocumentDB). Streamkap's MongoDB connector provides real-time CDC with managed features such as automatic scaling, UI-based setup, and ETL transformations.
What is a MongoDB source in streamkap?
A MongoDB source in Streamkap enables real-time Change Data Capture (CDC) from MongoDB databases, capturing document-level inserts, updates, and deletes with sub-second latency. It uses change streams to deliver changes to destinations, and supports snapshots for initial loads, schema evolution, and nested-data handling. Streamkap offers serverless setup via the UI or API.
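For illustration, this is the underlying change-stream mechanism in plain pymongo; the connection string, database, and collection names are placeholders, not Streamkap configuration:

```python
from pymongo import MongoClient

# Change streams require a replica set or sharded cluster.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
collection = client["shop"]["orders"]

# watch() opens a change stream; each event describes one insert,
# update, replace, or delete on the collection.
with collection.watch() as stream:
    for change in stream:
        print(change["operationType"], change["documentKey"])
```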
What MongoDB versions are supported as sources?
- MongoDB 4.0 and later for standard CDC; 6.0 and later for advanced features such as pre/post-images and full-document lookups. Some modes are compatible with MongoDB 3.6+, where change streams were first introduced.
What MongoDB deployments are supported?
Streamkap supports:
- Self-hosted (on-prem/VM).
- MongoDB Atlas.
- Amazon DocumentDB (MongoDB-compatible).
- Sharded clusters and replica sets, with automatic handling of shard additions/removals and replica-set membership changes (connection examples are sketched below).
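For reference, connection-string sketches for each deployment type; hosts, credentials, and options are placeholders, and DocumentDB typically requires TLS with the AWS CA bundle:

```python
from pymongo import MongoClient

# Self-hosted replica set
self_hosted = MongoClient(
    "mongodb://cdc_user:secret@mongo1:27017,mongo2:27017/?replicaSet=rs0"
)

# MongoDB Atlas (SRV connection string)
atlas = MongoClient(
    "mongodb+srv://cdc_user:secret@cluster0.example.mongodb.net/"
)

# Amazon DocumentDB (TLS required; retryable writes are unsupported)
docdb = MongoClient(
    "mongodb://cdc_user:secret@docdb.cluster-xyz.us-east-1.docdb.amazonaws.com:27017"
    "/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&retryWrites=false"
)
```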
What are the key features of MongoDB sources in streamkap?
- CDC: Change streams for inserts/updates/deletes; resume tokens (backed by the oplog) track position across restarts.
- Snapshots: Ad-hoc/initial backfills using incremental or blocking methods; phased chunking for minimal impact.
- Schema Evolution: Automatic handling of document changes; field renaming/exclusion.
- Data Types: Supports integers, floats, strings, dates, arrays, objects, binary (configurable), JSON; extended JSON for identifiers.
- Ingestion Modes: Inserts (append) or upserts.
- Security: SSL, authentication, access control.
- Monitoring: Latency, lag, queue sizes in-app; heartbeat messages.
- Streamkap also adds transaction metadata, filtering by collections/fields, and aggregation pipelines (see the sketch after this list).
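As a sketch of the filtering mechanism, a change stream can take an aggregation pipeline directly; the stages and field names below are illustrative, not Streamkap's exact configuration:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["shop"]

pipeline = [
    # Keep only inserts and updates on documents whose status is "paid".
    {"$match": {
        "operationType": {"$in": ["insert", "update"]},
        "fullDocument.status": "paid",
    }},
    # Drop a field the destination does not need.
    {"$unset": ["fullDocument.internal_notes"]},
]

with db["orders"].watch(pipeline, full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change.get("fullDocument"))
```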
How does CDC work for MongoDB sources?
Streamkap uses MongoDB change streams, which decode the oplog and emit changes as events. It resumes from the last recorded position, performing a snapshot first if needed, then streams from that oplog position onward. Full-document updates with pre/post-images are supported on MongoDB 6.0+.
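A minimal sketch of resume-token handling in pymongo, assuming a local replica set; a real connector persists the token durably between restarts:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
orders = client["shop"]["orders"]

# First pass: consume events and remember the resume token after each one.
resume_token = None
with orders.watch(full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change.get("fullDocument"))
        resume_token = stream.resume_token  # persist this durably in practice
        break  # stop after one event for the sake of the example

# After a restart: resume from the saved oplog position, not from scratch.
if resume_token is not None:
    with orders.watch(resume_after=resume_token) as stream:
        for change in stream:
            print("resumed at:", change["operationType"])
            break
```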
How do snapshots work for MongoDB sources?
- Trigger ad-hoc at the source or collection level. Methods: incremental (phased, chunked by _id, default 1,024 documents per chunk; sketched below) or blocking (pauses streaming temporarily). Watermarking tracks progress; partial snapshots are supported via filter conditions.
- Modes: initial (default), always, initial_only, no_data, when_needed, configuration_based, custom.
Streamkap simplifies triggering snapshots via the UI.
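A toy sketch of the chunked scan behind incremental snapshots, assuming _id ordering; the real connector interleaves chunks with live change events using watermarks:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
orders = client["shop"]["orders"]

CHUNK = 1024  # default chunk size noted above
last_id = None
while True:
    query = {} if last_id is None else {"_id": {"$gt": last_id}}
    chunk = list(orders.find(query).sort("_id", 1).limit(CHUNK))
    if not chunk:
        break
    # In the real connector each chunk is interleaved with live change
    # events via watermarking; here we just report progress.
    print(f"snapshotted {len(chunk)} documents up to _id={chunk[-1]['_id']}")
    last_id = chunk[-1]["_id"]
```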
What data types are supported?
- Basics: Integers (INT32/64), floats (FLOAT32/64), strings, dates/timestamps.
- Advanced: Arrays, objects (STRUCT/Tuple), binary (BYTES/base64/hex), decimals, JSON (STRING/io.debezium.data.Json).
- Identifiers: Integer, float, string, document, ObjectId, binary (extended JSON strict mode; serialization sketched after this list).
Unsupported: inconsistent nested structures without preprocessing; non-UTF-8 data; oversized BSON documents (handling strategies: fail/skip/split, available in 6.0.9+).
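To see how BSON-specific types serialize, pymongo's json_util produces MongoDB Extended JSON (canonical mode shown); the values are illustrative:

```python
from bson import Binary, Decimal128, ObjectId
from bson.json_util import CANONICAL_JSON_OPTIONS, dumps

doc = {
    "_id": ObjectId("64b8f0c2e4b0a1a2b3c4d5e6"),
    "price": Decimal128("19.99"),
    "payload": Binary(b"\x00\x01"),
}
print(dumps(doc, json_options=CANONICAL_JSON_OPTIONS))
# {"_id": {"$oid": "64b8f0c2e4b0a1a2b3c4d5e6"},
#  "price": {"$numberDecimal": "19.99"},
#  "payload": {"$binary": {"base64": "AAE=", "subType": "00"}}}
```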
How do I monitor MongoDB sources?
Query the oplog size and window directly (see the sketch below); use tools like Datadog or New Relic for lag and queue metrics. Best practices: retain 3–5 days of oplog; alert on oplog growth.
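A minimal sketch of the oplog-window check, reading the local.oplog.rs collection directly (requires read access to the local database):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
oplog = client["local"]["oplog.rs"]

first = oplog.find_one(sort=[("ts", 1)])   # oldest retained entry
last = oplog.find_one(sort=[("ts", -1)])   # newest entry
window_secs = last["ts"].time - first["ts"].time  # Timestamp.time is epoch seconds
print(f"oplog window: {window_secs / 3600:.1f} hours")
```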
What are common limitations?
- Standalone servers are unsupported (convert to a replica set; see the sketch under best practices); oplog purging during connector downtime can lose events; BSON size limits apply (fail/skip/split strategies); multi-document transactions (and their metadata) are unavailable in older MongoDB versions.
- General: sharded clusters need careful configuration; incremental snapshots require stable document keys (non-string _id values are recommended).
How are deletes handled?
Deletes are captured as delete events; with pre-images enabled (MongoDB 6.0+, sketched below), the event includes the full prior document as the before image.
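A sketch of enabling pre/post-images on MongoDB 6.0+ and requesting before-images on a stream; database and collection names are placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

# Enable pre/post-images on the collection (MongoDB 6.0+).
client["shop"].command(
    "collMod", "orders",
    changeStreamPreAndPostImages={"enabled": True},
)

# Then request before-images on the change stream.
with client["shop"]["orders"].watch(
    full_document_before_change="whenAvailable"
) as stream:
    for change in stream:
        print(change["operationType"], change.get("fullDocumentBeforeChange"))
```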
What security features are available?
Encrypted connections (SSL/TLS), keystore/truststore support, authentication, and role-based access control.
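A sketch of a least-privilege CDC user using standard MongoDB built-in roles; the exact roles Streamkap requires may differ, so treat this as illustrative:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://admin:secret@localhost:27017/?replicaSet=rs0")
client["admin"].command(
    "createUser", "streamkap_cdc",
    pwd="change-me",
    roles=[
        {"role": "read", "db": "shop"},    # source database
        {"role": "read", "db": "local"},   # oplog access, if required
        {"role": "read", "db": "config"},  # sharded-cluster metadata
    ],
)
```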
Troubleshooting common issues
- Oplog Buildup: Monitor retention; resume from last position.
- Connection Failures: Verify firewall, SSL, authentication.
- Missing Events: Check include/exclude lists; resnapshot.
- Streamkap-Specific: Check logs for resume token issues.
Best practices for MongoDB sources
- Use replica sets (minimum 3 members for production; see the sketch after this list).
- Enable pre/post-images for full updates.
- Limit collections to needed ones.
- Test snapshots in staging.
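A sketch of initiating the recommended 3-member replica set, assuming each mongod was started with --replSet rs0; hostnames are placeholders:

```python
from pymongo import MongoClient

# Connect directly to one member before the set is initiated.
client = MongoClient("mongodb://mongo1:27017/?directConnection=true")
client.admin.command("replSetInitiate", {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "mongo1:27017"},
        {"_id": 1, "host": "mongo2:27017"},
        {"_id": 2, "host": "mongo3:27017"},
    ],
})
```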