Rockset

Stream Change Data Capture (CDC) data into Rockset

Prerequisites

  • A Rockset Medium virtual instance or larger
  • A Rockset account granted at least the Member built-in role, or a custom role that includes the following privileges:
    • CREATE_COLLECTION_INTEGRATION
    • CREATE_INTEGRATION_GLOBAL
    • UPDATE_VI

Limitations

  • You will need to switch between Rockset and Streamkap during setup
  • Existing Streamkap MongoDB and DocumentDB Sources are incompatible with Rockset Destinations. You will need to create new MongoDB and DocumentDB Sources and set:
    • Include Schema? option to No
  • Rockset doesn't automatically create Collections from its Integrations. You will need to manually create a Collection for each Source table (MySQL, PostgreSQL, SQL Server, etc.) or collection (MongoDB, DocumentDB)
  • Rockset only captures data from the point in time that it was set up. After the Streamkap Pipeline is created and Rockset Integration Setup is completed, you can trigger a snapshot to backfill the historic data

Setup

Create a Rockset Integration

  1. Add a new Rockset Kafka Integration and click Start
  2. Give the Integration a memorable Integration Name and Description (optional)
  3. For where your Kafka cluster is hosted, choose Apache Kafka

🚧

Before you continue

Make sure that for the Source whose data you want to stream to Rockset you have the:

  • Streamkap Source ID
  • Streamkap Topic name(s) for the tables (MySQL, PostgreSQL, SQL Server, etc) or collections (Mongo, DocumentDB)

Together with the source_ prefix, they make up the full name of the topics to be streamed to Rockset, e.g. source_abcdefg123456.mySchema.myTable

If you're not sure how to get these, please see the Troubleshooting section
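To illustrate, the full topic name is simply the source_ prefix, the Source ID and the topic name joined together. The sketch below uses the example values from this guide, not real identifiers:

```python
def full_topic_name(source_id: str, topic: str) -> str:
    """Join the source_ prefix, the Streamkap Source ID and the topic name
    into the full Kafka topic name the Rockset Integration expects."""
    return f"source_{source_id}.{topic}"

# Example values from this guide (placeholders, not real identifiers)
print(full_topic_name("abcdefg123456", "mySchema.myTable"))
# source_abcdefg123456.mySchema.myTable
```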

  1. Choose Data Format based on the Source type:
    1. JSON: if your Source is MongoDB or DocumentDB
    2. AVRO: for all other Sources
  2. For Kafka Topics, enter the full name - including the source_ prefix and the Source's Streamkap ID - of each table or collection. For example, if your Source's ID is abcdefg123456 and the Streamkap Topic's name is mySchema.myTable, the full name is source_abcdefg123456.mySchema.myTable
  3. After you have entered the name(s) of each table or collection, click Save Integration and Continue
  4. For what type of setup your Kafka Connect cluster is, choose Distributed
  5. Under Step 4: Configure the Rockset Sink Connector you will see a JSON configuration. Copy and paste it somewhere safe; you will need the values of these properties when creating the Streamkap Pipeline:
    1. format
    2. rockset.apiserver.url
    3. rockset.integration.key
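For illustration, the JSON configuration will contain (among other settings) the three properties above. The sketch below uses placeholder values - not a real API server URL or integration key - and simply shows pulling the needed values out of the copied JSON:

```python
import json

# Abridged sketch of the copied configuration; every value here is a
# placeholder, not a real API server URL or integration key.
connector_config = json.loads("""
{
  "format": "AVRO",
  "rockset.apiserver.url": "https://api.example.rockset.com",
  "rockset.integration.key": "placeholder-integration-key"
}
""")

# The three values the Streamkap Rockset Destination form asks for
needed = {key: connector_config[key]
          for key in ("format", "rockset.apiserver.url", "rockset.integration.key")}
print(needed["format"])
```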

🚧

Don't close this Rockset Integration Setup page. You will be coming back to it later on.

Create a Streamkap Pipeline

  1. Add a new Streamkap Rockset Destination and enter the following information:
    1. Name - A memorable name for this Connector
    2. Rockset API URL - The rockset.apiserver.url property value from the JSON configuration you copied earlier
    3. Rockset Integration Key - The rockset.integration.key property value from the JSON configuration you copied earlier
    4. Format - The format property value from the JSON configuration you copied earlier
  2. After you have entered the information, click Save
  3. Add a new Streamkap Pipeline, giving it a memorable name and selecting the Source, and the newly created Rockset Destination
  4. Click Next
  5. Choose the schema(s) and table(s) you want to stream to Rockset, then click Save

Complete Rockset Integration Setup

  1. Go back to the Rockset Kafka Integration Setup page you left open earlier
  2. Under Step 5: Check if the data is coming through click Refresh and wait a few moments. Then, verify that for each topic listed, the status is Active
  3. If the statuses are Active, click Complete Integration Setup
  4. In the top right corner, click Create Collection from Integration, then:
    1. Enter the full name of the table or collection from this Rockset Kafka Integration and click Next
    2. (Optional) If you need to transform the data prior to ingestion into a Rockset collection, enter your transformation SQL query, otherwise, click Next
    3. Choose the Workspace to create the Rockset collection in
    4. Enter a Collection Name and Description (optional) for the Rockset collection
    5. (Optional): Change the Ingest Limit, Retention Policy and Data Compression

      🚧

      If a box appears with the option Yes, switch to Medium virtual instance, tick that box, because the Rockset Connector utilises bulk ingest, which is not supported on Shared instances.

  5. After you have entered the information, click Create

You will need to repeat steps 4 and 5 for each table or collection from this Rockset Kafka Integration.

Troubleshooting

Get the Streamkap Source ID

  1. Go to the Sources page and click on the Source name to view its details
  2. In the URL, e.g. app.streamkap.com/sources/abcdefg123456, you will see the Source's ID, e.g. abcdefg123456

Get the Streamkap Topic names

  1. Go to the Sources page and click on the Source name to view its details
  2. At the bottom of the page, the topic names are listed, e.g. mySchema.myTable

The source_ prefix, Source ID and Topic name combined make up the full name required by the Rockset Kafka Integration, e.g. source_abcdefg123456.mySchema.myTable