MotherDuck (Native)

Stream data to MotherDuck from Streamkap

Overview

This guide explains how to stream data from a Kafka cluster into a MotherDuck database using the native JDBC (DuckDB) driver. Because this is a direct Kafka sink connector for MotherDuck, it ingests data into your database without using S3 as an intermediary.

Prerequisites

  • MotherDuck Account: A valid MotherDuck account and a database where the data will be loaded.

MotherDuck Setup

To set up the connector, you will need to gather connection details and configure your MotherDuck instance. Log in to your MotherDuck account, then follow the steps below.

Get the database name

You will need the database name to configure the connector. You can find it on the left side panel of the MotherDuck UI, under the Attached databases section.

Generate an access token

To generate the access token that Streamkap will use:

  1. Open the Settings page from the top-left menu
  2. Open the Access Tokens page, listed under the INTEGRATIONS section
  3. Click + Create token
  4. Enter a Name, select the Read/Write Token option under Token type, and leave Automatically expire this token set to false
  5. Click Create token
  6. Copy the access token
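Before entering the token in Streamkap, you can sanity-check it locally with the DuckDB client. Below is a minimal sketch; the `MOTHERDUCK_TOKEN` environment variable, the `my_db` database name, and the `motherduck_dsn` helper are all assumptions for illustration, not part of Streamkap.

```python
import os

# Assumed database name; use the one shown under "Attached databases".
DATABASE = "my_db"

def motherduck_dsn(database: str, token: str) -> str:
    # MotherDuck connection string understood by the DuckDB driver:
    # md:<database>?motherduck_token=<token>
    return f"md:{database}?motherduck_token={token}"

dsn = motherduck_dsn(DATABASE, os.environ.get("MOTHERDUCK_TOKEN", "<your-token>"))

# With the duckdb package installed (pip install duckdb), verify the token works:
#   import duckdb
#   con = duckdb.connect(dsn)
#   con.sql("SELECT current_database()").show()
```

If the connection succeeds and returns your database name, the token and database are ready to use in the Streamkap setup below.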

Streamkap Setup

  1. Go to Destinations and choose MotherDuck
  2. Input the following information:
    1. Name - A unique and memorable name for this Connector
    2. Ingestion Mode (default: upsert) - See Inserts/Upserts for information about ingestion modes
    3. Delete Mode (default: none) - Delete records in destination if deleted at source
    4. Tasks - If Pipelines for this Destination have lag that's continuously growing over time and not reducing, increase the number of Tasks, otherwise, leave as default
    5. Schema Evolution (default: true) - If enabled, the connector will automatically adapt to changes in the source schema by adding new columns in the target table
    7. MotherDuck Token - The access token you generated earlier
    7. Database (Case sensitive) - The name of the database
    8. Schema (Case sensitive - default: streamkap) - The schema within the database to write data to
  3. Click Save
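Once the destination is saved and a pipeline is running, you can confirm that tables are being created in the configured schema. The sketch below assumes the default streamkap schema; the `SCHEMA` constant and `CHECK_SQL` query string are illustrative, not Streamkap-provided.

```python
# Assumed schema name; use the value you set in the destination config.
SCHEMA = "streamkap"

# Lists the tables the connector has created in the target schema.
CHECK_SQL = (
    "SELECT table_name FROM information_schema.tables "
    f"WHERE table_schema = '{SCHEMA}'"
)

# With the duckdb package installed and a MotherDuck connection open:
#   import duckdb
#   con = duckdb.connect("md:my_db?motherduck_token=<your-token>")
#   print(con.sql(CHECK_SQL))
```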

How it works

As data is streamed from the source into topics (think of them as partitioned tables), the MotherDuck sink connector will:

  • Check whether tables for the topics exist in MotherDuck and, if not, create them
  • Detect changes between the source data schema and the target table schema, and if:
    • a new column (by name) is found, add it to the end of the table
    • an existing column's data type has changed, add a new column to the end of the table named <column_name>_<new_data_type_name>
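The schema-evolution rules above can be sketched as a small function. This is an illustrative model of the described behavior, not Streamkap's actual implementation; the column and type names are made up.

```python
def evolve_schema(table_schema: dict, record_schema: dict) -> dict:
    """Return the evolved table schema (column name -> data type).

    - A column name not yet in the table is appended to the end.
    - A known column whose data type changed gets a new column named
      <column_name>_<new_data_type_name>; the original column is kept.
    """
    evolved = dict(table_schema)
    for column, dtype in record_schema.items():
        if column not in evolved:
            evolved[column] = dtype               # new column: append
        elif evolved[column] != dtype:
            evolved[f"{column}_{dtype}"] = dtype  # type change: add suffixed column
    return evolved

table = {"id": "INTEGER", "amount": "INTEGER"}
record = {"id": "INTEGER", "amount": "DOUBLE", "note": "VARCHAR"}
print(evolve_schema(table, record))
# {'id': 'INTEGER', 'amount': 'INTEGER', 'amount_DOUBLE': 'DOUBLE', 'note': 'VARCHAR'}
```

Note that the original column is never dropped or retyped; downstream queries keep working against the old column while new data lands in the suffixed one.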