MongoDB Sources FAQ for Streamkap

This FAQ focuses on using MongoDB as a source in Streamkap, including general self-hosted setups and cloud variants (MongoDB Atlas, Amazon DocumentDB). Streamkap’s MongoDB connector provides real-time CDC with managed features like automatic scaling, UI-based setup, and ETL transformations.
A MongoDB source in Streamkap enables real-time Change Data Capture (CDC) from MongoDB databases, capturing row-level inserts, updates, and deletes with sub-second latency. It uses change streams to stream changes to destinations, and supports snapshots for initial loads, schema evolution, and nested data handling. Streamkap offers a serverless setup via UI or API.
Supported versions:
  • MongoDB 4.0+ for basic CDC; 6.0+ for advanced features like post-images and full-document lookups.
  • Some modes remain compatible with MongoDB 3.6+.
Streamkap supports:
  • Self-hosted (on-prem/VM).
  • MongoDB Atlas.
  • Amazon DocumentDB (MongoDB-compatible).
Streamkap also supports sharded clusters and replica sets, with automatic handling of shard additions/removals and membership changes.
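For illustration, a minimal pymongo connection sketch for these deployments (hostnames, credentials, and the rs0 set name are placeholders):

```python
from pymongo import MongoClient

# Self-hosted replica set: list the members and name the set so the driver
# can discover the primary (change streams require a replica set or sharded cluster).
self_hosted = MongoClient(
    "mongodb://streamkap_user:secret@mongo1:27017,mongo2:27017,mongo3:27017/"
    "?replicaSet=rs0&authSource=admin"
)

# MongoDB Atlas exposes an SRV connection string instead; DocumentDB uses a
# TLS-enabled endpoint with the same driver.
atlas = MongoClient("mongodb+srv://streamkap_user:secret@cluster0.example.mongodb.net/")

print(self_hosted.admin.command("ping"))
```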
Key features:
  • CDC: Change streams for inserts/updates/deletes; uses the oplog position for resume tracking.
  • Snapshots: Ad-hoc/initial backfills using incremental or blocking methods; phased chunking for minimal impact.
  • Schema Evolution: Automatic handling of document changes; field renaming/exclusion.
  • Data Types: Supports integers, floats, strings, dates, arrays, objects, binary (configurable), JSON; extended JSON for identifiers.
  • Ingestion Modes: Inserts (append) or upserts.
  • Security: SSL, authentication, access control.
  • Monitoring: Latency, lag, queue sizes in-app; heartbeat messages.
  • Streamkap adds transaction metadata, filtering by collections/fields, and aggregation pipelines (see the filtering sketch after this list).
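As an example of the filtering above, a change stream accepts an aggregation pipeline; in this pymongo sketch the appdb database, the two collections, and the debug_blob field are hypothetical:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["appdb"]  # hypothetical database

# $match keeps events only from the included collections; $project drops a noisy field.
pipeline = [
    {"$match": {"ns.coll": {"$in": ["orders", "customers"]}}},
    {"$project": {"fullDocument.debug_blob": 0}},
]

with db.watch(pipeline) as stream:
    for event in stream:
        print(event["operationType"], event["ns"])
```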
Streamkap uses MongoDB change streams to capture and decode oplog data, emitting changes as events. It starts from the last recorded transaction, performs a snapshot if needed, then streams from the recorded oplog position. Full-document updates with pre-/post-images are supported on MongoDB 6.0+.
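A minimal pymongo sketch of that loop (token persistence is reduced to a variable, and the namespace is hypothetical): fullDocument='updateLookup' asks the server to attach the current document to update events, and the saved resume token lets a restarted stream continue from the last recorded position.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client["appdb"]["orders"]  # hypothetical namespace

resume_token = None  # a real connector persists this durably

while True:
    # resume_after picks up exactly where the previous stream stopped.
    with coll.watch(full_document="updateLookup", resume_after=resume_token) as stream:
        for event in stream:
            print(event["operationType"], event.get("fullDocument"))
            resume_token = stream.resume_token  # checkpoint after each event
```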
Snapshots:
  • Trigger ad hoc at the source or table level.
    Methods: Incremental (phased, chunked by _id, default 1024 documents; see the chunking sketch after this list) or blocking (temporarily pauses streaming).
    Uses watermarking for progress; supports partial snapshots via conditions.
  • Modes: initial (default), always, initial_only, no_data, when_needed, configuration_based, custom.
    Streamkap simplifies triggering via UI.
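A simplified illustration of watermark-based incremental chunking (not Streamkap's actual implementation; the namespace and the emit() helper are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client["appdb"]["orders"]  # hypothetical namespace
CHUNK = 1024  # default chunk size mentioned above

def emit(doc):
    """Placeholder: a real connector would deduplicate against live stream events."""
    print(doc["_id"])

last_id = None  # watermark: highest _id emitted so far
while True:
    query = {"_id": {"$gt": last_id}} if last_id is not None else {}
    chunk = list(coll.find(query).sort("_id", 1).limit(CHUNK))
    if not chunk:
        break  # snapshot complete
    for doc in chunk:
        emit(doc)
    last_id = chunk[-1]["_id"]  # advance the watermark
```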
Data types:
  • Basics: Integers (INT32/64), floats (FLOAT32/64), strings, dates/timestamps.
  • Advanced: Arrays, objects (STRUCT/Tuple), binary (BYTES/base64/hex), decimals, JSON (STRING/io.debezium.data.Json).
  • Identifiers: Integer, float, string, document, ObjectId, binary (extended JSON strict mode; see the sketch after this list).
  • Unsupported: Inconsistent nested structures without preprocessing; non-UTF-8 data; oversized BSON (strategies: fail/skip/split in 6.0.9+).
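For instance, mixed identifier types round-trip cleanly through BSON extended JSON in strict/canonical mode, using the bson package bundled with pymongo:

```python
from bson import Binary, ObjectId
from bson.json_util import JSONMode, JSONOptions, dumps

doc = {
    "_id": ObjectId(),               # ObjectId identifier
    "alt_key": Binary(b"\x01\x02"),  # binary identifier
    "nested": {"n": 1.5, "tags": ["a", "b"]},
}

# CANONICAL mode is the strict, type-preserving extended JSON representation.
opts = JSONOptions(json_mode=JSONMode.CANONICAL)
print(dumps(doc, json_options=opts))
```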
Use queries to monitor oplog size, and tools like Datadog or New Relic for lag/queue metrics. Best practices: retain 3–5 days of oplog and alert on growth.
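As a concrete example, oplog size and the retained time window can be read with standard commands; a pymongo sketch against the local database:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
local = client["local"]

# Size and maximum size of the capped oplog collection, in bytes.
stats = local.command("collStats", "oplog.rs")
print("oplog size:", stats["size"], "max:", stats["maxSize"])

# Retained window = newest timestamp minus oldest timestamp in the oplog.
oplog = local["oplog.rs"]
first = oplog.find_one(sort=[("$natural", 1)])
last = oplog.find_one(sort=[("$natural", -1)])
print("retention window (hours):", (last["ts"].time - first["ts"].time) / 3600)
```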
Known limitations:
  • Standalone servers are unsupported (convert to a replica set; see the sketch after this list)
  • Oplog purging during downtime may lose events
  • BSON size limits (fail/skip/split)
  • No transactions in older versions
  • General: Sharded clusters need careful config; incremental snapshots require stable primary keys (non-strings recommended)
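For the standalone limitation, a development server started with --replSet rs0 can be initiated as a one-member replica set (a sketch only; production should use three members, as noted under best practices):

```python
from pymongo import MongoClient

# directConnection bypasses replica-set discovery, which would fail before initiation.
client = MongoClient("mongodb://localhost:27017/?directConnection=true")
client.admin.command(
    "replSetInitiate",
    {"_id": "rs0", "members": [{"_id": 0, "host": "localhost:27017"}]},
)
```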
Deletes are captured as events; with pre-images enabled (MongoDB 6.0+), the full record before deletion is included.
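On MongoDB 6.0+, pre-images must be enabled per collection before delete events can carry the prior document; a pymongo sketch with an assumed collection name:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["appdb"]

# Enable pre-/post-image recording for the collection (MongoDB 6.0+).
db.command({"collMod": "orders", "changeStreamPreAndPostImages": {"enabled": True}})

# fullDocumentBeforeChange attaches the pre-image to update/delete events.
with db["orders"].watch(full_document_before_change="whenAvailable") as stream:
    for event in stream:
        if event["operationType"] == "delete":
            print("deleted:", event.get("fullDocumentBeforeChange"))
```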
Security options include encrypted connections (SSL), keystore/truststore configuration, authentication, and role-based access control.
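Illustratively, the equivalent client-side TLS and auth settings in pymongo (paths and credentials are placeholders):

```python
from pymongo import MongoClient

client = MongoClient(
    "mongodb://streamkap_user:secret@mongo1:27017/?replicaSet=rs0",
    tls=True,
    tlsCAFile="/etc/ssl/mongo-ca.pem",            # truststore equivalent
    tlsCertificateKeyFile="/etc/ssl/client.pem",  # keystore equivalent
    authSource="admin",
)
print(client.admin.command("ping"))
```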
Troubleshooting:
  • Oplog Buildup: Monitor retention; resume from the last recorded position.
  • Connection Failures: Verify firewall, SSL, authentication.
  • Missing Events: Check include/exclude lists; resnapshot.
  • Streamkap-Specific: Check logs for resume token issues (see the recovery sketch below).
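A sketch of the recovery pattern for an expired resume token: if the token has aged out of the oplog, the server rejects the resume attempt with an OperationFailure, and the fallback restarts the stream from now, meaning missed events must be recovered via a resnapshot:

```python
from pymongo import MongoClient
from pymongo.errors import OperationFailure

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client["appdb"]["orders"]  # hypothetical namespace

def open_stream(token):
    try:
        return coll.watch(resume_after=token)
    except OperationFailure:
        # Token no longer in the oplog: events were purged while we were down.
        # Restart from now; missed changes must be recovered with a snapshot.
        return coll.watch()
```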
Best practices:
  • Use replica sets (min 3 members for production)
  • Enable pre/post-images for full updates
  • Limit collections to needed ones
  • Test snapshots in staging