# DocumentDB Source FAQ

## Amazon DocumentDB Sources FAQ for Streamkap

This FAQ covers using Amazon DocumentDB, the MongoDB-compatible document database hosted on AWS, as a source in Streamkap. Streamkap's DocumentDB connector provides real-time CDC with managed features such as automatic scaling, UI-based setup, and ETL transformations.

<AccordionGroup>
  <Accordion title="What is an Amazon DocumentDB source in Streamkap?">
    An Amazon DocumentDB source in Streamkap enables real-time Change Data Capture (CDC) from DocumentDB databases, capturing document-level inserts, updates, and deletes with sub-second latency. It uses MongoDB-compatible change streams to stream changes to destinations, and supports snapshots for initial loads, schema evolution, and nested JSON data. Streamkap offers a serverless setup via UI or API.
  </Accordion>

  <Accordion title="What Amazon DocumentDB versions are supported as sources?">
    * DocumentDB 4.0+ for basic CDC; 5.0+ for advanced features like enhanced change streams and array encoding options.
    * Works with clusters running in MongoDB 3.6 or 4.0 compatibility mode.
  </Accordion>

  <Accordion title="What Amazon DocumentDB deployments are supported?">
    * AWS-hosted DocumentDB clusters (single instance or replica sets).
    * Streamkap handles sharded setups and multi-region replicas with automatic shard/membership tracking.
  </Accordion>

  <Accordion title="What are the key features of Amazon DocumentDB sources in Streamkap?">
    * **CDC**: Change streams for inserts/updates/deletes; oplog-based resume tracking.
    * **Snapshots**: Ad-hoc/initial backfills using incremental or blocking methods; phased chunking for minimal impact.
    * **Schema Evolution**: Automatic handling of document structure changes; field renaming/exclusion.
    * **Data Types**: Supports integers, floats, strings, dates, arrays, objects, binary (configurable as bytes/base64/hex), JSON; extended JSON for identifiers.
    * **Ingestion Modes**: Inserts (append) or upserts.
    * **Security**: SSL, IAM authentication, access control.
    * **Monitoring**: Latency, lag, queue sizes in-app; heartbeat messages.
    * Streamkap adds transaction metadata, filtering by collections, and aggregation pipelines.
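
To make the two ingestion modes concrete, here is a minimal Python sketch of how a destination table might differ under each. The event shape (`op`, `key`, `after`) and both function names are hypothetical, and treating a delete as a row removal in upsert mode is an assumption rather than documented Streamkap behavior.

```python
from typing import Any

Event = dict[str, Any]


def apply_append(events: list[Event]) -> list[Event]:
    """Insert (append) mode: every change event becomes a new row."""
    return list(events)


def apply_upsert(events: list[Event]) -> dict[Any, Any]:
    """Upsert mode: keep only the latest state per document key."""
    table: dict[Any, Any] = {}
    for event in events:
        if event["op"] == "d":
            # Assumption: a delete removes the row for that key.
            table.pop(event["key"], None)
        else:
            table[event["key"]] = event["after"]
    return table
```

Append mode preserves the full change history, while upsert mode leaves one current row per document key.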
  </Accordion>

  <Accordion title="How does CDC work for Amazon DocumentDB sources?">
    Streamkap uses DocumentDB change streams to capture and decode oplog data, emitting each change as an event. It starts from the last recorded position, performs a snapshot if needed, and then streams from that oplog position. Full-document updates with pre/post-images are supported on DocumentDB 5.0+.
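
As a rough illustration of the event flow, the sketch below converts a MongoDB-style change stream event into a simplified CDC record. The input fields (`operationType`, `ns`, `documentKey`, `fullDocument`, `_id`) are standard change event fields; the output shape, the `c`/`u`/`d` op codes (borrowed from the Debezium convention), and the function name `to_cdc_record` are assumptions for illustration, not Streamkap's actual wire format.

```python
from typing import Any, Optional

# Map change stream operation types to single-letter CDC op codes.
OP_CODES = {"insert": "c", "update": "u", "replace": "u", "delete": "d"}


def to_cdc_record(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Convert one change stream event into a simplified CDC record."""
    op = OP_CODES.get(event.get("operationType"))
    if op is None:
        # e.g. "drop" or "invalidate": not a document-level change
        return None
    return {
        "op": op,
        "namespace": f'{event["ns"]["db"]}.{event["ns"]["coll"]}',
        "key": event["documentKey"],
        # Delete events carry no fullDocument, so "after" is None for them.
        "after": event.get("fullDocument"),
        # The event's _id is the resume token to persist once processed.
        "resume_token": event["_id"],
    }
```

Persisting `resume_token` after each processed event is what lets a consumer pick up from its last recorded position after a restart.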
  </Accordion>

  <Accordion title="How do snapshots work for Amazon DocumentDB sources?">
    * **Triggering**: Ad hoc, at the source or collection level.
    * **Methods**: Incremental (phased, chunked by `_id`, default 1024 documents) or blocking (pauses streaming).
    * **Progress**: Uses watermarking; supports partial snapshots via conditions.
    * **Modes**: `initial` (default), `always`, `initial_only`, `no_data`, `when_needed`, `configuration_based`, `custom`.
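
The idea of chunking by `_id` can be sketched in a few lines of Python. This is an in-memory simulation of keyset-style chunking under simplifying assumptions (integer `_id` values, no concurrent writes); the function name `snapshot_chunks` is hypothetical, and the real connector additionally interleaves chunks with streaming and uses watermarks to reconcile the two.

```python
from typing import Iterator

CHUNK_SIZE = 1024  # default chunk size mentioned above


def snapshot_chunks(ids: list[int], chunk_size: int = CHUNK_SIZE) -> Iterator[list[int]]:
    """Yield ascending _id chunks, each starting after the last _id emitted."""
    ordered = sorted(ids)
    last_seen = None  # highest _id already emitted
    while True:
        remaining = [i for i in ordered if last_seen is None or i > last_seen]
        chunk = remaining[:chunk_size]
        if not chunk:
            return
        yield chunk
        last_seen = chunk[-1]
```

Because each chunk is defined by "greater than the last `_id` seen", a snapshot can resume after an interruption without rereading earlier chunks, which is also why a stable `_id` matters.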
  </Accordion>

  <Accordion title="What data types are supported?">
    * **Basics**: Integers (INT32/64), floats (FLOAT32/64), strings, dates/timestamps.
    * **Advanced**: Arrays (configurable encoding: array, document, string), objects (STRUCT/Tuple), binary (BYTES/base64/hex), decimals, JSON (`STRING/io.debezium.data.Json`).
    * **Identifiers**: `_id` (ObjectId, string, etc.), binary (extended JSON strict mode).
    * **Unsupported**: Inconsistent nested structures without preprocessing; non-UTF8; oversized BSON (strategies: fail/skip/split).
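
The three binary representations mentioned above differ only in how the same bytes are rendered. The following Python sketch shows the difference; the function name `encode_binary` and the mode strings are illustrative assumptions, not the connector's configuration keys.

```python
import base64
from typing import Union


def encode_binary(value: bytes, mode: str = "bytes") -> Union[bytes, str]:
    """Render a binary field as raw bytes, a base64 string, or a hex string."""
    if mode == "bytes":
        return value
    if mode == "base64":
        return base64.b64encode(value).decode("ascii")
    if mode == "hex":
        return value.hex()
    raise ValueError(f"unknown binary mode: {mode}")
```

Base64 and hex produce plain strings that survive JSON serialization, at the cost of larger payloads than raw bytes.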
  </Accordion>

  <Accordion title="How to set up a general Amazon DocumentDB source?">
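    1. Ensure the DocumentDB cluster is in an active state with change streams enabled for the databases or collections you want to capture (Amazon DocumentDB leaves change streams disabled until you enable them, for example with the `modifyChangeStreams` admin command)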
    2. Create an IAM user with read permissions on the cluster and the `streamkap_signal` collection
    3. Create the `streamkap_signal` collection for snapshots (it can live in a different database on the same instance)
    4. In the Streamkap UI: add a source, then enter the connection string (e.g., `mongodb://<user>:<pass>@<host>:27017/?ssl=true&replicaSet=rs0`), databases/collections, snapshot mode, and array encoding
    5. Allow Streamkap IPs in the VPC security group
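
Credentials containing characters such as `@`, `:`, or `/` must be percent-encoded before they go into a MongoDB-style connection string. The Python sketch below builds a string in the format shown in step 4; the function name, the example host, and the default `rs0` replica set name are placeholders to adapt to your cluster.

```python
from urllib.parse import quote_plus


def build_connection_string(
    user: str,
    password: str,
    host: str,
    port: int = 27017,
    replica_set: str = "rs0",
) -> str:
    """Build a connection string with percent-encoded credentials."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?ssl=true&replicaSet={replica_set}"
    )


# Example with a placeholder host and a password that needs escaping.
print(build_connection_string("streamkap", "p@ss:word/1", "docdb.example.com"))
```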
  </Accordion>

  <Accordion title="How to monitor for Amazon DocumentDB sources?">
    Use AWS CloudWatch to track oplog size and lag, and the Streamkap app for queue metrics.

    **Best Practices**: Retain the oplog for 7 days (48 hours minimum) and alert on growth.
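
Retention settings are often expressed in seconds. To my understanding, Amazon DocumentDB's `change_stream_log_retention_duration` cluster parameter works this way, but verify the name and unit against the AWS documentation before relying on it. The helper below is a small hypothetical sketch that converts the recommended retention into seconds and rejects values under the 48-hour minimum.

```python
MIN_RETENTION_HOURS = 48   # minimum suggested above
TARGET_RETENTION_DAYS = 7  # recommended retention above


def retention_seconds(days: int = TARGET_RETENTION_DAYS) -> int:
    """Convert a retention period in days to seconds, enforcing the minimum."""
    seconds = days * 24 * 60 * 60
    if seconds < MIN_RETENTION_HOURS * 60 * 60:
        raise ValueError(f"retention must be at least {MIN_RETENTION_HOURS} hours")
    return seconds
```

With the defaults, `retention_seconds()` yields 604800, the number of seconds in 7 days.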
  </Accordion>

  <Accordion title="What are common limitations?">
    * Standalone instances unsupported (requires replica set)
    * Oplog purging during downtime may lose events
    * BSON size limits (fail/skip/split)
    * No transactions pre-4.0
    * Sharded clusters need config server access
    * Incremental snapshots require stable `_id` (non-strings preferred)
    * UTF-8 only
  </Accordion>

  <Accordion title="How to handle deletes?">
    Streamkap captures deletes as events with before images; full deleted records are available when pre-images are enabled (5.0+).
  </Accordion>

  <Accordion title="What security features are available?">
    Encrypted connections (SSL), IAM authentication, role-based access; VPC security groups.
  </Accordion>

  <Accordion title="Troubleshooting common issues">
    * **Oplog Buildup**: Monitor retention (AWS Console); resume from last position
    * **Connection Failures**: Verify VPC, SSL, IAM roles
    * **Missing Events**: Check include/exclude lists; ensure change streams enabled
    * **Streamkap-Specific**: Check logs for resume token issues; validate signal collection
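
When events seem to be missing, it can help to check whether a namespace actually matches your include or exclude list. The sketch below assumes Debezium-style semantics: comma-separated regular expressions matched against the full `database.collection` name, with only one of the two lists set. Confirm these semantics against your connector settings; the function name `is_captured` is hypothetical.

```python
import re
from typing import Optional


def is_captured(
    namespace: str,
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> bool:
    """Return True if `namespace` ("db.collection") would be captured."""

    def matches(patterns: str) -> bool:
        # Each pattern must match the entire namespace, not just a prefix.
        return any(re.fullmatch(p.strip(), namespace) for p in patterns.split(","))

    if include:
        return matches(include)
    if exclude:
        return not matches(exclude)
    return True  # no lists configured: everything is captured
```

A common pitfall under these assumptions is a pattern such as `shop\.orders` that silently skips `shop.orders_archive`, because the whole name has to match.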
  </Accordion>

  <Accordion title="Can CDC capture database Views and other virtual objects?">
    **No, CDC cannot capture Views or most virtual database objects.**

    **Why Views cannot be captured:**\
    CDC captures changes by reading the database transaction log (binlog, WAL, oplog, redo log, etc.). Views are query-time computations over base tables: when you query a view, the database engine executes the underlying SELECT statement against those tables. Because views store no data of their own, they never generate transaction log entries.

    **What cannot be captured:**

    * **Views**: Virtual collections defined by aggregation pipelines, no physical storage or oplog entries
    * **System Collections** (system.\*, admin.\*, config.\*): Metadata and internal state, not user data
    * **Time Series Collections**: Amazon DocumentDB does not natively support MongoDB 5.0+ time series collections. However, if using DocumentDB 5.0-compatible mode or custom implementations of time-stamped data with specialized storage, change streams may be limited or unavailable due to storage optimizations that don't maintain document-level change granularity in the oplog. **Solution**: Use regular collections with appropriate time-based indexes for CDC on time-stamped data.
    * **On-Demand Materialized Views** (`$merge`, `$out` results): Generated data, not original sources

    **Solution:**\
    Configure CDC on the underlying base collections that power your views. The view logic can be recreated in your destination or transformation layer.

    **DocumentDB-specific notes:**

    * **Aggregation pipelines on views**: Capture the source collections and apply the pipeline logic downstream
    * **Standalone instances**: Not supported for CDC—must use a replica set configuration

    **Example:**\
    If you have a view `order_summary` created from the `orders` collection with filters and projections, capture the `orders` collection instead, then apply the same aggregation logic in your destination.
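
A minimal Python sketch of that approach: capture `orders`, then apply the view's filter and projection to each captured document downstream. The field names (`status`, `customer_id`, `total`) and the filter condition are hypothetical stand-ins for whatever your view's pipeline actually does.

```python
from typing import Any, Optional


def order_summary(order: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Recreate a hypothetical `order_summary` view for one `orders` document."""
    # Filter stage (assumed): keep only completed orders.
    if order.get("status") != "completed":
        return None
    # Projection stage (assumed): keep a subset of fields.
    return {
        "_id": order["_id"],
        "customer_id": order["customer_id"],
        "total": order["total"],
    }


captured = [
    {"_id": 1, "customer_id": "c1", "total": 40, "status": "completed", "notes": "gift"},
    {"_id": 2, "customer_id": "c2", "total": 15, "status": "pending"},
]
summaries = [row for row in map(order_summary, captured) if row is not None]
```

Here `summaries` contains only the projected row for order `1`, mirroring what the view would return.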
  </Accordion>

  <Accordion title="Best practices for Amazon DocumentDB sources">
    * Use replica sets (min 3 nodes for production)
    * Enable pre/post-images for full updates (5.0+)
    * Limit collections to reduce load
    * Test snapshots in staging
    * Monitor via CloudWatch; set 7-day oplog retention
  </Accordion>
</AccordionGroup>
