MongoDB (Generic)

Prerequisites

  • MongoDB version ≥ 4.4
  • MongoDB Replica Set or Sharded Cluster
  • Connection details
  • Streamkap user and role

Obtain Connection String

MongoDB Shell

  • Connect to your replica set or primary node using the MongoDB shell as an admin user.
  • Obtain a valid connection string:
    • Run the db.getMongo() method to display your connection string.
      • We recommend that the connection string include the following parameters. Any that are missing will be added automatically:
        • retryWrites=true
        • retryReads=true
        • w=majority
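The rule in the last bullet can be sketched in JavaScript, the language the MongoDB shell uses. withRecommendedOptions is a hypothetical helper written for illustration and is not Streamkap's implementation. Any option you have already set is left untouched; only the missing ones are filled in.

```javascript
// Hypothetical helper (not Streamkap's implementation): adds the recommended
// options to a MongoDB connection string only when they are missing.
const RECOMMENDED_OPTIONS = {
  retryWrites: "true",
  retryReads: "true",
  w: "majority",
};

function withRecommendedOptions(connectionString) {
  // Seed lists such as host1:27017,host2:27017 do not parse with the URL class,
  // so split the query string off by hand and parse only that part.
  const queryStart = connectionString.indexOf("?");
  const base = queryStart === -1 ? connectionString : connectionString.slice(0, queryStart);
  const params = new URLSearchParams(queryStart === -1 ? "" : connectionString.slice(queryStart + 1));

  for (const [key, value] of Object.entries(RECOMMENDED_OPTIONS)) {
    // Keep any value already present; only fill in missing options.
    if (!params.has(key)) {
      params.set(key, value);
    }
  }

  // Options must follow a "/" after the host list, e.g. mongodb://host/?w=majority.
  const hasSlash = /^mongodb(\+srv)?:\/\/[^/]*\//.test(base);
  return `${hasSlash ? base : base + "/"}?${params.toString()}`;
}

console.log(withRecommendedOptions("mongodb://h1:27017,h2:27017/?replicaSet=rs0"));
// prints "mongodb://h1:27017,h2:27017/?replicaSet=rs0&retryWrites=true&retryReads=true&w=majority"
```

MongoDB treats connection-string option names as case-insensitive. For brevity, this sketch compares them case-sensitively.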

Granting Privileges

MongoDB Shell

  • Using the MongoDB shell, connect to your primary node or replica set.
  • Create a user for Streamkap, replacing <password> with a password of your choice:
use admin
db.createUser({
  user: "streamkap_user",
  pwd: "<password>",
  roles: [ "readAnyDatabase", { role: "read", db: "local" } ]
})
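To confirm the grant, run db.getUser("streamkap_user") in the admin database. It returns the user document, whose roles field lists { role, db } pairs. The hypothetical missingRoles helper below is a sketch that compares that list with the two roles granted above. It checks direct grants only, so a privilege inherited through another role (for example root) would still be reported as missing.

```javascript
// Hypothetical check: which of the roles granted above are absent from a
// db.getUser() result? Only direct grants in the "roles" field are inspected.
const REQUIRED_ROLES = [
  // A plain role name in createUser resolves to the database it ran in (admin).
  { role: "readAnyDatabase", db: "admin" },
  { role: "read", db: "local" },
];

function missingRoles(userDoc, required = REQUIRED_ROLES) {
  // db.getUser() returns null when the user does not exist.
  const roles = (userDoc && userDoc.roles) || [];
  const granted = new Set(roles.map((r) => `${r.role}@${r.db}`));
  return required.filter((r) => !granted.has(`${r.role}@${r.db}`));
}

// In mongosh: missingRoles(db.getSiblingDB("admin").getUser("streamkap_user"))
```

An empty array means both roles are present.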

Enable Snapshots through MongoDB Shell

You will need to create a streamkap_signal collection and grant the Streamkap user permissions on it. Streamkap uses this collection to manage snapshots.

This collection can exist in a different database (on the same MongoDB cluster) from the database Streamkap captures data from.

❗️

Please create the signal collection with the name streamkap_signal. It will not be recognised if given another name.

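// Replace {database} with the database that will hold the signal collection
use {database}
db.createCollection("streamkap_signal")

// The Streamkap user was created in admin, so grant its roles from there.
// Built-in roles such as readWrite (which includes read) apply to a whole database;
// limiting access to one collection requires a custom role (see db.createRole).
use admin
db.grantRolesToUser("streamkap_user", [
  { role: "readWrite", db: "{database}" }
])

When you later set up the connector, you must include this collection.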

Consider Access Restrictions

Setup MongoDB Connector in Streamkap

  • Create a new MongoDB Source
  • Enter the following information:
    • Name for your Connector
    • Connection String: copy the connection string from the earlier step, replacing the username and password in it with those of the Streamkap user you created.
    • Connection Mode (Default replica_set): specifies the strategy the connector uses when it connects to a MongoDB cluster.
    • Array Encoding: specifies how Streamkap encodes MongoDB array types. Array is the optimal method but requires all elements in an array to be of the same type. Use Document or String if your MongoDB arrays contain mixed types.
    • Signal Table Database: the database containing the streamkap_signal collection, which Streamkap uses to manage snapshots. See Enable Snapshots for more information.
    • (Optional) Include Schema?: if you plan to stream data from this MongoDB source to Rockset, set this option to No.
    • Connect via SSH Tunnel. See SSH Tunnel.
    • Advanced Parameters
      • Snapshot Mode (Default When Needed): see MongoDB Snapshot Modes for more information.
      • Represent Binary Data As (Default bytes)
      • Snapshot Chunk Size (Default 1024): the number of documents read at a time when snapshotting. This is a low, safe value. As a guide, if you have 100 million or more documents you may want to raise it to 5120; with 1 billion, a higher value still will let you backfill faster.
      • Max Batch Size (Default 2048): the maximum size of each batch of events that the connector processes. Increase it only if the connector is experiencing lag.
    • Add Schemas/Tables. You can also bulk upload them here as a CSV file with one schema or table per row and no header row.
    • Click Save.
      The connector will take approximately 1 minute to start processing data.
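For the Connection String field, you need to swap the credentials in the string from db.getMongo() for those of the Streamkap user. The sketch below assumes a standard mongodb:// or mongodb+srv:// string. withCredentials is a hypothetical helper, not part of Streamkap or MongoDB. It percent-encodes the username and password, because characters such as @ : / ? # [ ] are not allowed unencoded in the credentials part of a MongoDB connection string.

```javascript
// Hypothetical helper: replaces (or adds) the credentials in a MongoDB
// connection string, percent-encoding them as the URI format requires.
function withCredentials(connectionString, username, password) {
  // Capture the scheme, drop any existing "user:pass@" part, keep the rest.
  const match = /^(mongodb(?:\+srv)?:\/\/)(?:[^@/]*@)?(.*)$/.exec(connectionString);
  if (!match) {
    throw new Error("Expected a mongodb:// or mongodb+srv:// connection string");
  }
  const [, scheme, rest] = match;
  // encodeURIComponent escapes @ : / ? # [ ] and %, among others.
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `${scheme}${user}:${pass}@${rest}`;
}

console.log(
  withCredentials("mongodb://admin:old@h1:27017,h2:27017/?replicaSet=rs0", "streamkap_user", "p@ss:word")
);
// prints "mongodb://streamkap_user:p%40ss%3Aword@h1:27017,h2:27017/?replicaSet=rs0"
```

Only the credentials change. The host list, database path, and options are carried over exactly as they were.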