
Prerequisites

  • DocumentDB version 4.0 or later
  • A database user with sufficient privileges to create database users and collections
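To confirm the version requirement before you start, you can run db.version() in mongosh against your cluster and compare the major version. The helper below is a minimal, hypothetical sketch (it is not part of Streamkap or DocumentDB). It assumes db.version() returns a MongoDB-compatible string such as "4.0.0".

```javascript
// Hypothetical helper: checks a version string (for example the value that
// db.version() prints in mongosh) against a minimum major version.
function meetsMinimumVersion(versionString, minMajor = 4) {
  // Use only the leading major component, e.g. "5.0.0" -> 5
  const major = Number.parseInt(String(versionString).split(".")[0], 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognised version string: ${versionString}`);
  }
  return major >= minMajor;
}

console.log(meetsMinimumVersion("4.0.0")); // true
console.log(meetsMinimumVersion("3.6.0")); // false
```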

DocumentDB Setup

1. Grant Database Access

2. Enable Change Streams

Change streams let applications access data changes in real time. The Connector relies on DocumentDB’s change streams to capture changes, so they must be enabled for the databases and collections you want to capture.
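In Amazon DocumentDB, change streams are off by default and are enabled with the modifyChangeStreams admin command. This is based on the AWS documentation, so verify it against your cluster version. The sketch below only builds that command document. An empty string for database or collection means "all". The database name "inventory" is a hypothetical example.

```javascript
// Sketch: build the modifyChangeStreams admin command for Amazon DocumentDB.
// An empty string for `database` or `collection` means "all".
function buildModifyChangeStreams({ database = "", collection = "", enable = true } = {}) {
  return { modifyChangeStreams: 1, database, collection, enable };
}

// In mongosh you would then run, for example:
//   db.adminCommand(buildModifyChangeStreams({ database: "inventory" }))
const cmd = buildModifyChangeStreams({ database: "inventory" });
console.log(JSON.stringify(cmd));
// {"modifyChangeStreams":1,"database":"inventory","collection":"","enable":true}
```

Scoping by database with an empty collection enables change streams for every collection in that database. This is usually simpler than enabling collections one at a time.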

Set Change Stream Log Retention Policy

Change stream logs should be retained for a minimum of 48 hours. We recommend 7 days.
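Retention is configured in seconds. In Amazon DocumentDB, it is set through the change_stream_log_retention_duration cluster parameter. That parameter name comes from the AWS documentation, so confirm it and its allowed range for your version. The sketch below converts the guidance above into seconds and classifies a current setting. The function names are hypothetical.

```javascript
// Sketch: express the retention guidance in seconds, the unit used by the
// (assumed) change_stream_log_retention_duration cluster parameter.
const SECONDS_PER_HOUR = 3600;

function retentionSeconds({ days = 0, hours = 0 } = {}) {
  return (days * 24 + hours) * SECONDS_PER_HOUR;
}

const MINIMUM = retentionSeconds({ hours: 48 });   // 172800
const RECOMMENDED = retentionSeconds({ days: 7 }); // 604800

// Classify a configured retention value against the guidance above
function classifyRetention(seconds) {
  if (seconds < MINIMUM) return "below minimum";
  if (seconds < RECOMMENDED) return "meets minimum";
  return "meets recommendation";
}

console.log(MINIMUM, RECOMMENDED);      // 172800 604800
console.log(classifyRetention(10800));  // below minimum
```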

3. Create Database User

It’s recommended to create a separate user for the Connector to access your DocumentDB database.
  • Using MongoDB Shell, connect to your primary node or replica set.
  • Create a user for Streamkap using the script below. Replace {password} with a password of your choice.
use admin
db.createUser({
  user: "streamkap_user",
  pwd: "{password}",
  roles: [ "readAnyDatabase", {role: "read", db: "local"} ]
})
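After creating the user, you can check its roles by running use admin followed by db.getUser("streamkap_user") in mongosh. The sketch below checks a document shaped like that output for the roles granted by the createUser script above. The { role, db } shape of the roles array matches MongoDB's getUser output and is assumed to match DocumentDB's. The helper name is hypothetical.

```javascript
// Sketch: report which required roles are missing from a user document shaped
// like mongosh's db.getUser() output (shape assumed for DocumentDB).
const REQUIRED_ROLES = [
  { role: "readAnyDatabase", db: "admin" }, // granted as a string while in admin
  { role: "read", db: "local" },
];

function missingRoles(userDoc, required = REQUIRED_ROLES) {
  const held = new Set((userDoc.roles || []).map((r) => `${r.role}@${r.db}`));
  return required.filter((r) => !held.has(`${r.role}@${r.db}`));
}

// Example input with one role missing
const example = {
  user: "streamkap_user",
  db: "admin",
  roles: [{ role: "readAnyDatabase", db: "admin" }],
};
console.log(missingRoles(example)); // [ { role: 'read', db: 'local' } ]
```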

4. Enable Snapshots

To backfill your data, the Connector must be able to perform snapshots. See Snapshots & Backfilling for more information. Create a streamkap_signal collection and grant streamkap_user access to it; the Connector uses this collection to manage snapshots. The collection can live in a different database (on the same DocumentDB instance) from the databases Streamkap captures data from.
The signal collection must be named exactly streamkap_signal; the Connector will not recognise it under any other name. In the script below, replace {database} with the database that will hold it.
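// Switch to the database that will hold the signal collection
use {database}
db.createCollection("streamkap_signal")

// streamkap_user was created in the admin database, so grant its roles from there
use admin
db.grantRolesToUser("streamkap_user", [
  { role: "read", db: "{database}" },
  { role: "readWrite", db: "{database}", collection: "streamkap_signal" }
])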
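Because the Connector only recognises the exact name streamkap_signal, it can help to check the collection names in the signal database. In mongosh, db.getCollectionNames() returns them. The sketch below is hypothetical. It flags whether the exact name exists and lists near-misses, such as a different capitalisation, that would be ignored.

```javascript
// Sketch: check collection names (e.g. from db.getCollectionNames() in the
// signal database) for the exact name the Connector requires.
const SIGNAL_COLLECTION = "streamkap_signal";

function checkSignalCollection(names) {
  const found = names.includes(SIGNAL_COLLECTION); // case-sensitive match
  // Names that look intended as the signal collection but do not match exactly
  const nearMisses = names.filter(
    (n) => n !== SIGNAL_COLLECTION && n.toLowerCase().includes("streamkap")
  );
  return { found, nearMisses };
}

console.log(checkSignalCollection(["orders", "Streamkap_Signal"]));
// { found: false, nearMisses: [ 'Streamkap_Signal' ] }
```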

Streamkap Setup

Follow these steps to configure your new connector:

1. Create the Source

2. Connection Settings

  • Name: Enter a name for your connector.
  • Connection String: The DocumentDB connection string.
  • Connection Mode (optional): Default is replica_set.
  • Array Encoding: Specify how Streamkap should encode DocumentDB arrays. Array is the preferred method but requires all elements in an array to be of the same type. Use Document or String if your DocumentDB arrays contain mixed types.
  • Include Schema? (optional): If you plan on streaming data from this DocumentDB Source to Rockset, set this option to No.
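To choose an Array Encoding, you need to know whether your arrays are homogeneous. The sketch below checks a sample array and suggests an encoding based on the rule above. Its type mapping is a simplified, hypothetical one over JavaScript values, not BSON types. For example, it treats all numbers as one type, so an array mixing int32 and double values in BSON would not be detected as mixed.

```javascript
// Sketch: classify JavaScript values into coarse type names (simplified and
// hypothetical; BSON distinguishes more types, e.g. int32 vs double).
function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (value instanceof Date) return "date";
  return typeof value; // "string", "number", "boolean", "object", ...
}

// Suggest an encoding: "Array" only if every element shares one type
function suggestArrayEncoding(values) {
  const types = new Set(values.map(typeOf));
  return types.size <= 1 ? "Array" : "Document or String";
}

console.log(suggestArrayEncoding([1, 2, 3]));     // Array
console.log(suggestArrayEncoding([1, "two", 3])); // Document or String
```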

3. Snapshot Settings

  • Signal Table Database: Streamkap will use a collection in this database to manage snapshots. See Enable Snapshots for more information.

4. Advanced Parameters

  • Represent binary data as: Specifies how the data in binary fields should be represented. The right option depends on what your destination can store. Default is bytes.
Click Next.
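For the Represent binary data as setting, it helps to see how one binary value looks in different forms. The sketch below uses Node.js Buffer to show raw bytes next to base64 and hex text. The page only names bytes as an option, so base64 and hex are shown as typical textual alternatives, not as a confirmed list of Streamkap's choices. A text form is often easier for destinations that only accept JSON strings.

```javascript
// Sketch: one 4-byte binary value in raw and textual forms (Node.js Buffer).
const raw = Buffer.from([0xde, 0xad, 0xbe, 0xef]);

const representations = {
  bytes: raw,                     // raw byte array
  base64: raw.toString("base64"), // "3q2+7w=="
  hex: raw.toString("hex"),       // "deadbeef"
};

console.log(representations.base64, representations.hex); // 3q2+7w== deadbeef
```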

5. Database and Collection Capture

  • Add Database/Collections: Specify the database(s) and collection(s) for capture.
    • You can bulk upload here. The format is a simple list of databases and collections, with each entry on a new row. Save as a .csv file without a header.
CDC only captures base collections, not Views

Change Data Capture reads DocumentDB’s change streams, which only record changes to physical collections. Database Views are query-time aggregations with no physical storage, so they don’t generate change stream events.

What you cannot capture: Views (aggregation pipeline results) and system collections (system.*, admin.*, config.*).

Solution: Specify only the underlying base collections that feed your views. You can recreate the view aggregation pipeline in your destination or transformation layer.
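When building your capture list, you can run db.getCollectionInfos() in mongosh to list a database's collections with their type. The sketch below filters entries shaped like that output down to capturable base collections, following the rules above. The helper name and sample data are hypothetical. Treating a missing type field as a regular collection is an assumption.

```javascript
// Sketch: keep only capturable base collections from a list shaped like
// mongosh's db.getCollectionInfos() output.
const EXCLUDED_DATABASES = new Set(["admin", "config"]);

function capturableCollections(dbName, infos) {
  if (EXCLUDED_DATABASES.has(dbName)) return [];
  return infos
    // Keep physical collections; treat a missing "type" as a collection
    .filter((info) => info.type === undefined || info.type === "collection")
    // Drop system collections such as system.profile
    .filter((info) => !info.name.startsWith("system."))
    .map((info) => info.name);
}

const infos = [
  { name: "orders", type: "collection" },
  { name: "order_totals", type: "view" },
  { name: "system.profile", type: "collection" },
];
console.log(capturableCollections("shop", infos)); // [ 'orders' ]
```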
Click Save.
Have questions? See the DocumentDB Source FAQ for answers to common questions about DocumentDB sources, troubleshooting, and best practices.