# DocumentDB

> DocumentDB Change Data Capture Setup with Streamkap

## Prerequisites

* DocumentDB version ≥ 4.x
* A database user with sufficient privileges to create database users and collections

## DocumentDB Setup

### 1. Grant Database Access

* Configure one of the [Connection Options](/connection-options) to ensure Streamkap can reach your database.

### 2. Enable Change Streams

Change streams let applications access data changes in real time. The Connector reads these change streams to capture changes from your DocumentDB database.

* [Enable/Configure Change Streams](https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html#change_streams-enabling)
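DocumentDB turns change streams on with the `modifyChangeStreams` admin command covered in the AWS guide above. As a minimal sketch, the script below only prints that command so you can review it before running it. `change_stream_cmd` is a hypothetical helper name. Following the AWS examples, an empty database means all databases, and an empty collection means all collections in the chosen database.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print DocumentDB's modifyChangeStreams admin command.
# An empty database enables change streams for every database; an empty
# collection enables them for every collection in the given database.
change_stream_cmd() {
  local database="${1:-}" collection="${2:-}"
  printf 'db.adminCommand({modifyChangeStreams: 1, database: "%s", collection: "%s", enable: true})\n' \
    "$database" "$collection"
}

# Usage: ./enable_change_streams.sh [database] [collection]
change_stream_cmd "${1:-}" "${2:-}"
```

You could then run the output with `mongosh "$DOCDB_URI" --eval "$(./enable_change_streams.sh sales)"`, where `DOCDB_URI`, the script name, and `sales` are placeholders. Connect as a user that is allowed to modify change streams, such as the cluster's admin user.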

#### Set Change Stream Log Retention Policy

Change stream logs should be retained for a minimum of 48 hours. We recommend 7 days.

* [How to modify change stream log retention policy](https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html#change_streams-modifying_log_retention)
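Retention is controlled by the `change_stream_log_retention_duration` cluster parameter, which is set in seconds. The following minimal sketch converts a number of days to seconds and prints the corresponding AWS CLI call for review. `retention_seconds` and the `my-docdb-params` parameter group name are hypothetical. The sketch assumes your cluster uses a custom cluster parameter group, since the default group cannot be modified.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Convert a retention period in days to seconds, the unit used by
# change_stream_log_retention_duration.
retention_seconds() {
  local days="$1"
  echo $(( days * 24 * 60 * 60 ))
}

# Hypothetical custom cluster parameter group; replace with the one attached
# to your cluster.
PARAM_GROUP="${PARAM_GROUP:-my-docdb-params}"
RETENTION="$(retention_seconds 7)"   # the recommended 7 days

# Print (rather than run) the AWS CLI call so it can be reviewed first.
echo aws docdb modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name "$PARAM_GROUP" \
  --parameters "ParameterName=change_stream_log_retention_duration,ParameterValue=${RETENTION},ApplyMethod=immediate"
```

Remove `echo` to apply the change. The 48-hour minimum corresponds to `retention_seconds 2`, or 172800 seconds.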

### 3. Create Database User

We recommend creating a dedicated user for the Connector to access your DocumentDB database.

* Using the MongoDB Shell, connect to your cluster's primary instance (for example, through the cluster endpoint).
* Create a user for Streamkap using the script below, replacing `{password}` with a password of your choice.

<CodeGroup>
  ```bash Shell theme={null}
  use admin
  db.createUser({
    user: "streamkap_user",
    pwd: "{password}",
    roles: [ "readAnyDatabase", {role: "read", db: "local"} ]
  })
  ```
</CodeGroup>
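This password will later be embedded in the Streamkap connection string, where characters such as `@`, `/`, and `:` must be percent-encoded. One way to sidestep that is a password made only of hex characters, as in the sketch below. The helpers `gen_password` and `create_user_cmd` are hypothetical. The script prints the `createUser` call from this step with the password filled in, using `db.getSiblingDB("admin")` in place of `use admin` so the printed script does not depend on shell helpers.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Generate a random password of 48 lowercase hex characters (24 random bytes).
gen_password() {
  od -An -N24 -tx1 /dev/urandom | tr -d ' \n'
}

# Print the createUser call from this step with the password filled in.
create_user_cmd() {
  local password="$1"
  cat <<EOF
db.getSiblingDB("admin").createUser({
  user: "streamkap_user",
  pwd: "$password",
  roles: [ "readAnyDatabase", { role: "read", db: "local" } ]
})
EOF
}

password="$(gen_password)"
create_user_cmd "$password"
```

Store the generated password somewhere safe (for example, a secrets manager), because you will need it again for the connection string.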

### 4. Enable Snapshots

To backfill your data, the Connector needs to be able to perform snapshots. See [Snapshots & Backfilling](/snapshots) for more information.

You need to create a signal collection and grant `streamkap_user` permissions on it. The Connector uses this collection to manage snapshots.

This collection can live in a different database (on the same DocumentDB cluster) from the database Streamkap captures data from.

<Info>
  The examples below use `streamkap_signal` as the signal collection name, but you can choose any name. During [Streamkap Setup](#3-snapshot-settings), provide the full path to your signal collection in `database.collection` format (e.g., `streamkap.streamkap_signal`).
</Info>

<CodeGroup>
  ```bash Shell theme={null}
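  // Create the signal collection in the database that should hold it
  use {database}
  db.createCollection("streamkap_signal")

  // streamkap_user was created in the admin database, so grant roles from there
  use admin
  db.grantRolesToUser("streamkap_user", [
    { role: "read", db: "{database}" },
    { role: "readWrite", db: "{database}", collection: "streamkap_signal" }
  ])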
  ```
</CodeGroup>
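If you decide on the `database.collection` path first (the same value you will enter under Snapshot Settings), a small script can fill in the placeholders for you. The sketch below uses a hypothetical helper, `signal_setup_script`. It splits the path on its first dot, because database names cannot contain dots while collection names can. It then prints a mongosh script equivalent to this step, using `db.getSiblingDB()` so the collection lands in the intended database and the grant runs against `admin`, where `streamkap_user` was created.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print mongosh statements that create the signal collection and grant
# streamkap_user access to it, given a "database.collection" path.
signal_setup_script() {
  local path="$1"
  if [[ "$path" != *.* ]]; then
    echo "expected database.collection, got: $path" >&2
    return 1
  fi
  # Split on the FIRST dot: database names cannot contain dots,
  # but collection names can.
  local db="${path%%.*}" coll="${path#*.}"
  if [[ -z "$db" || -z "$coll" ]]; then
    echo "empty database or collection name in: $path" >&2
    return 1
  fi
  cat <<EOF
db.getSiblingDB("$db").createCollection("$coll")
db.getSiblingDB("admin").grantRolesToUser("streamkap_user", [
  { role: "read", db: "$db" },
  { role: "readWrite", db: "$db", collection: "$coll" }
])
EOF
}

signal_setup_script "${1:-streamkap.streamkap_signal}"
```

For example, `mongosh "$DOCDB_URI" --eval "$(./signal_setup.sh streamkap.streamkap_signal)"`, where the script name and `DOCDB_URI` are placeholders. Run it as a user that can create collections and grant roles, not as `streamkap_user`.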

***

## Streamkap Setup

Follow these steps to configure your new connector:

### 1. Create the Source

* Navigate to [Add Connectors](https://app.streamkap.com/connectors/add?tab=Sources).
* Choose **DocumentDB**.

### 2. Connection Settings

* **Name**: Enter a name for your connector.

* **Connection String**: The connection string for your DocumentDB cluster, including the `streamkap_user` credentials.

* **Connection Mode** (optional): Default is `replica_set`.

* **Array Encoding**: Specify how Streamkap should encode DocumentDB array types. `Array` is the optimal method but requires every element of an array to have the same type. Use `Document` or `String` if any of your DocumentDB arrays contain mixed types.

* **Include Schema?** (optional): If you plan on streaming data from this DocumentDB Source to Rockset, set this option to **No**.
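Special characters in the username or password must be percent-encoded inside a `mongodb://` URI. The following minimal sketch builds a connection string under several assumptions. The `urlencode` and `docdb_uri` helpers are hypothetical, the input is assumed to be ASCII, and the query options (`tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false`) are taken from the format AWS documents for connecting to a DocumentDB cluster as a replica set. Check which options your setup actually needs before using it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes ASCII input; multi-byte characters are not handled.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) out+="$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s' "$out"
}

# Build a DocumentDB connection string for the given user, password, and
# cluster endpoint (port 27017 assumed).
docdb_uri() {
  local user="$1" pass="$2" host="$3"
  printf 'mongodb://%s:%s@%s:27017/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false\n' \
    "$(urlencode "$user")" "$(urlencode "$pass")" "$host"
}
```

For example, `docdb_uri streamkap_user "$STREAMKAP_PASSWORD" my-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com`, where the variable and endpoint are placeholders.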

### 3. Snapshot Settings

* **Signal Collection**: Full path to the signal collection including database and collection name (e.g., `streamkap.streamkap_signal`). This collection is used for incremental snapshotting. See [Enable Snapshots](#4-enable-snapshots) for setup instructions.

### 4. Advanced Parameters

* **Represent binary data as**: Specifies how values in binary fields are represented. The best option depends on your destination. Default is `bytes`.

Click **Next**.

### 5. Database and Collection Capture

* **Add Database/Collections**: Specify the database(s) and collection(s) for capture.
  * To bulk upload, provide a simple list of databases and collections, one entry per row, saved as a `.csv` file without a header.

<Warning>
  **CDC only captures base collections, not Views**

  Change Data Capture reads DocumentDB's change streams, which only record changes to physical collections. Database Views are query-time aggregations with no physical storage, so they don't generate change stream events.

  **What you cannot capture:** Views (aggregation pipeline results) and system collections (`system.*`, `admin.*`, `config.*`).

  **Solution:** Specify only the underlying base collections that feed your views. You can recreate the view aggregation pipeline in your destination or transformation layer.
</Warning>

Click **Save**.

<Info>
  **Have questions?** See the [DocumentDB Source FAQ](/documentdb-source-faq) for answers to common questions about DocumentDB sources, troubleshooting, and best practices.
</Info>