Rockset
Stream Change Data Capture (CDC) data into Rockset
Prerequisites
- A Rockset Medium virtual instance or larger
- A Rockset account granted at least the member built-in role, or a custom role that includes the following privileges:
  - CREATE_COLLECTION_INTEGRATION
  - CREATE_INTEGRATION_GLOBAL
  - UPDATE_VI
Limitations
- You will need to switch between Rockset and Streamkap during setup
- Existing Streamkap MongoDB and DocumentDB Sources are incompatible with Rockset Destinations. You will need to create new MongoDB and DocumentDB Sources and set:
  - the Include Schema? option to No
- Rockset doesn't automatically create Collections from its Integrations. You will need to manually create a Collection for each Source table (MySQL, PostgreSQL, SQL Server, etc.) or collection (MongoDB, DocumentDB)
- Rockset only captures data from the point in time the Integration is set up. After the Streamkap Pipeline is created and the Rockset Integration setup is completed, you can trigger a snapshot to backfill the historic data
Setup
Create a Rockset Integration
- Add a new Rockset Kafka Integration and click Start
- Give the Integration a memorable Integration Name and Description (optional)
- For where your Kafka cluster is hosted, choose Apache Kafka
Before you continue
Make sure that for the Source whose data you want to stream to Rockset, you have the:
- Streamkap Source ID
- Streamkap Topic name(s) for the tables (MySQL, PostgreSQL, SQL Server, etc) or collections (Mongo, DocumentDB)
Together with the source_ prefix, they make up the full names of the topics to be streamed to Rockset, e.g. source_abcdefg123456.mySchema.myTable
If you're not sure how to get these, please see the Troubleshooting section
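The full topic name is just these pieces joined together, as this minimal Python sketch shows (the Source ID and topic name are the placeholder examples used throughout this guide):

```python
# Build the full Kafka topic names expected by the Rockset Integration.
# The Source ID and topic names below are the placeholder examples from this guide.
source_id = "abcdefg123456"
streamkap_topics = ["mySchema.myTable"]

# Full name = "source_" prefix + Source ID + "." + Streamkap Topic name
full_topic_names = [f"source_{source_id}.{topic}" for topic in streamkap_topics]
print(full_topic_names)  # ['source_abcdefg123456.mySchema.myTable']
```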
- Choose Data Format based on the Source type:
  - JSON: if your Source is MongoDB or DocumentDB
  - AVRO: for all other Sources
- For Kafka Topics, enter the full name - including the Source's Streamkap ID - of each table or collection. For example, if your Source's ID is abcdefg123456 and the Streamkap Topic's name is mySchema.myTable, the full name would be source_abcdefg123456.mySchema.myTable
- After you have entered the name(s) of each table or collection, click Save Integration and Continue
- For what type of setup your Kafka Connect cluster is, choose Distributed
- Under Step 4: Configure the Rockset Sink Connector, you will see a JSON configuration. Copy and paste it somewhere safe; you will need the values of these properties when creating a Streamkap Pipeline:
  - format
  - rockset.apiserver.url
  - rockset.integration.key
Don't close this Rockset Integration Setup page. You will be coming back to it later on.
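As a rough illustration only, the three properties you need will appear in the JSON along these lines (the values shown here are placeholders, not real endpoints or keys - copy the actual values from your own setup page):

```json
{
  "format": "AVRO",
  "rockset.apiserver.url": "https://api.<your-region>.rockset.com",
  "rockset.integration.key": "<your-integration-key>"
}
```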
Create a Streamkap Pipeline
- Add a new Streamkap Rockset Destination and enter the following information:
  - Name - A memorable name for this Connector
  - Rockset API URL - The rockset.apiserver.url property value from step 9
  - Rockset Integration Key - The rockset.integration.key property value from step 9
  - Format - The format property value from step 9
- After you have entered the information, click Save
- Add a new Streamkap Pipeline, giving it a memorable name and selecting the Source, and the newly created Rockset Destination
- Click Next
- Choose the schema(s) and table(s) you want to stream to Rockset, then click Save
Complete Rockset Integration Setup
- Go back to the Rockset Kafka Integration you started creating earlier
- Under Step 5: Check if the data is coming through click Refresh and wait a few moments. Then, verify that for each topic listed, the status is Active
- If the statuses are Active, click Complete Integration Setup
- In the top right corner, click Create Collection from Integration, then:
- Enter the full name of the table or collection from this Rockset Kafka Integration and click Next
- (Optional) If you need to transform the data prior to ingestion into a Rockset collection, enter your transformation SQL query, otherwise, click Next
- Choose the Workspace to create the Rockset collection in
- Enter a Collection Name and Description (optional) for the Rockset collection
- (Optional): Change the Ingest Limit, Retention Policy and Data Compression
If a box appears with the option Yes, switch to Medium virtual instance, tick that box; the Rockset Connector utilises bulk ingest, which is not supported on Shared virtual instances.
- After you have entered the information, click Create
You will need to repeat steps 4 and 5 for each table or collection from this Rockset Kafka Integration.
Troubleshooting
Get the Streamkap Source ID
- Go to the Sources page and click on the Source name to view its details
- In the URL, e.g. app.streamkap.com/sources/abcdefg123456, you will see the Source's ID, e.g. abcdefg123456
Get the Streamkap Topic names
- Go to the Sources page and click on the Source name to view its details
- At the bottom of the page, the topic names are listed, e.g. mySchema.myTable
The source_ prefix, Source ID and Topic name combined make up the full name required by the Rockset Kafka Integration, e.g. source_abcdefg123456.mySchema.myTable