Databricks Delta Lake
Setup
Get connection details
For the Cluster JDBC URL:
- Open the Compute page from the sidebar and choose your cluster
- Click Advanced Options
- Open the JDBC/ODBC tab
- Copy the JDBC Connection URL
For the SQL Warehouse Endpoint:
- Open the SQL Warehouses page from the sidebar and choose your warehouse
- Open the Connection Details tab
- Copy the SQL Warehouse Endpoint
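The connector configuration ultimately needs the workspace hostname and HTTP path, and both are embedded in the values copied above. As a minimal sketch (the URL below is illustrative, not a real workspace), the pieces can be pulled out of a cluster JDBC URL like this:

```python
# Illustrative cluster JDBC URL; the exact prefix and parameter order can
# vary between Databricks versions.
import re

jdbc_url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;"
    "httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh;"
    "UID=token;PWD=<personal-access-token>"
)

# The hostname sits between the scheme and the port; the HTTP path is one of
# the semicolon-separated parameters.
server_hostname = re.search(r"jdbc:\w+://([^:/;]+)", jdbc_url).group(1)
http_path = re.search(r"httpPath=([^;]+)", jdbc_url).group(1)

print(server_hostname)  # adb-1234567890123456.7.azuredatabricks.net
print(http_path)        # sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh
```

A SQL warehouse's Connection Details tab lists the server hostname and HTTP path directly, so no parsing is needed for that route.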
Generate an access token
- Open the Settings page from the sidebar, then choose User Settings
- Open the Personal Access Tokens tab
- Click + Generate New Token
- (Optional) Enter a comment and change the token lifetime
- Click Generate
- Copy the access token
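Once you have the token plus the hostname and HTTP path from the previous step, you can check that they work together. Below is a minimal sketch, assuming the open-source databricks-sql-connector package (pip install databricks-sql-connector); all values are placeholders:

```python
from databricks import sql

# Placeholder connection details; substitute the values copied from your
# workspace and the token generated above.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
) as connection, connection.cursor() as cursor:
    cursor.execute("SELECT 1")
    print(cursor.fetchone())  # (1,) confirms the endpoint and token are valid
```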
Create a temporary directory
- Create a tmp directory on the Databricks File System (DBFS)
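If you are not working from a notebook, the directory can be created remotely through the DBFS REST API. A minimal sketch, assuming the requests package; the workspace URL and token are placeholders:

```python
import requests

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
token = "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # placeholder

# DBFS mkdirs is idempotent: it succeeds even if /tmp already exists.
resp = requests.post(
    f"{workspace_url}/api/2.0/dbfs/mkdirs",
    headers={"Authorization": f"Bearer {token}"},
    json={"path": "/tmp"},
)
resp.raise_for_status()
```

From a notebook attached to the cluster, dbutils.fs.mkdirs("dbfs:/tmp") achieves the same result.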
How it works
As data is streamed from the source into topics (think of them as partitioned tables), the Databricks Sink connector will:
- Check whether tables for the topics exist in Databricks and create them if they do not
- Detect changes between the source data schema and the target table schema:
  - if a new column (by name) is found, add it to the end of the table
  - if an existing column's data type has changed, add a new column to the end of the table, named <column_name>_<new_data_type_name>
- Stream change data into Parquet files, upload them to the tmp directory on the Databricks File System (DBFS), and then:
  - Load the data into the target table using a SQL bulk import (COPY)
  - Clean up the Parquet files
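To make the last step concrete, here is a minimal sketch of that stage-and-load cycle, not the connector's actual code: it uploads one Parquet file to the DBFS tmp directory, bulk-loads it with COPY INTO, and then deletes the staged files. It assumes the databricks-sql-connector and requests packages; all hostnames, paths, and table names are placeholders.

```python
import base64
import requests
from databricks import sql

WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def stage_parquet(local_path: str, dbfs_path: str) -> None:
    """Upload one local Parquet file into the DBFS tmp directory.

    A single dbfs/put call only handles small payloads, which is enough
    for a sketch of the staging step.
    """
    with open(local_path, "rb") as f:
        contents = base64.b64encode(f.read()).decode()
    requests.post(
        f"{WORKSPACE}/api/2.0/dbfs/put",
        headers=HEADERS,
        json={"path": dbfs_path, "contents": contents, "overwrite": True},
    ).raise_for_status()


def load_and_clean(cursor, dbfs_dir: str, table: str) -> None:
    """Bulk-import the staged Parquet files with COPY INTO, then remove them."""
    cursor.execute(
        f"COPY INTO {table} FROM 'dbfs:{dbfs_dir}' FILEFORMAT = PARQUET"
    )
    requests.post(
        f"{WORKSPACE}/api/2.0/dbfs/delete",
        headers=HEADERS,
        json={"path": dbfs_dir, "recursive": True},
    ).raise_for_status()


stage_parquet("orders-batch-0001.parquet", "/tmp/orders/orders-batch-0001.parquet")

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abcdef1234567890",              # placeholder
    access_token=TOKEN,
) as connection, connection.cursor() as cursor:
    load_and_clean(cursor, "/tmp/orders", "demo.default.orders")
```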