# Databricks Delta Lake

# Streamkap Setup

To set up the connector, gather your connection details and configure your Databricks cluster. Log in to your [Databricks Cloud Account](https://accounts.cloud.databricks.com), then follow the steps below.

## Get connection details

Streamkap connects to Databricks via a JDBC URL. You can use either an [**All-Purpose Compute**](https://docs.databricks.com/aws/en/compute#classic-compute) or a [**SQL Warehouse**](https://docs.databricks.com/en/compute/sql-warehouse/index.html) as the compute resource.

### Option A: All-Purpose Compute

1. Open the **Compute** page from the sidebar and choose your cluster
2. Click on **Advanced Options**
3. Open the **JDBC/ODBC** tab
4. Copy the JDBC Connection URL

### Option B: SQL Warehouse

A [SQL Warehouse](https://docs.databricks.com/en/compute/sql-warehouse/index.html) can automatically scale across multiple Spark clusters to handle concurrent workloads, but it is generally more expensive than All-Purpose Compute.

To get the JDBC URL for a SQL Warehouse:

1. Open the **SQL Warehouses** page from the sidebar
2. Select your warehouse
3. Open the **Connection Details** tab
4. Copy the **JDBC URL**

<Note>
  For both options, you can append `ConnCatalog=<your catalog name>` to the JDBC URL to select a catalog other than the default.
</Note>
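
As a minimal sketch of the note above, the helper below appends `ConnCatalog` to a JDBC URL. It assumes the URL uses `;`-separated `key=value` properties, as Databricks JDBC URLs generally do. The function name `with_catalog`, the example host, and the warehouse ID are hypothetical; use the URL you copied from the Databricks UI.

```python
def with_catalog(jdbc_url: str, catalog: str) -> str:
    """Return jdbc_url with ConnCatalog=<catalog> set as a ';'-separated property."""
    # Drop any trailing ';' so we do not emit an empty property.
    base = jdbc_url.rstrip(";")
    # Replace an existing ConnCatalog property rather than duplicating it.
    parts = [p for p in base.split(";") if not p.lower().startswith("conncatalog=")]
    parts.append(f"ConnCatalog={catalog}")
    return ";".join(parts)


# Hypothetical URL for illustration only.
url = (
    "jdbc:databricks://example.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/abc123"
)
print(with_catalog(url, "analytics"))
```

The helper replaces any existing `ConnCatalog` value, so it is safe to call on a URL that already selects a catalog.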

## Generate an access token

To generate the access token that Streamkap uses to authenticate with Databricks:

1. Open the **Settings** page from the sidebar, then select **User Settings**
2. Open the **Personal Access Tokens** tab
3. Click **+ Generate New Token**
4. (Optional) Enter a comment and change the token lifetime
5. Click **Generate**
6. Copy the access token

## Create a temporary directory

1. Create a `tmp` directory on the Databricks File System (DBFS)
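
One possible way to create the directory is from a Databricks notebook, where the `dbutils` object is predefined. The sketch below wraps `dbutils.fs.mkdirs`. The helper name `ensure_tmp_dir` is hypothetical, and the path `dbfs:/tmp` is an assumption; adjust it if your workspace uses a different location.

```python
def ensure_tmp_dir(fs, path: str = "dbfs:/tmp") -> str:
    """Create `path` using a dbutils.fs-like object and return the path.

    The default path is an assumption, not a value confirmed by Streamkap.
    """
    # dbutils.fs.mkdirs creates the directory if it does not already exist.
    fs.mkdirs(path)
    return path


# In a Databricks notebook, where `dbutils` is available:
# ensure_tmp_dir(dbutils.fs)
```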

# How it works

As data is streamed from the source into topics (think of them as partitioned tables), the Databricks Sink connector will:

* Check whether tables for the topics exist in Databricks and create them if they do not

* Automatically handle [schema evolution](/schema-evolution-support) when the source schema changes (e.g. new columns, data type changes)

* Stream change data into Parquet files, upload them to the `tmp` directory on the Databricks File System (DBFS), and then:

  * Load data to the target table using SQL bulk import `COPY`
  * Clean up the Parquet files
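
To make the load step more concrete, here is a rough sketch that builds a Databricks `COPY INTO` statement for staged Parquet files. It assumes the `COPY` bulk import mentioned above corresponds to `COPY INTO ... FILEFORMAT = PARQUET`; the exact statement Streamkap issues is not documented here. The function name, table name, and staging path are hypothetical.

```python
def copy_into_sql(table: str, staged_path: str) -> str:
    """Build a COPY INTO statement that loads Parquet files from staged_path."""
    # Reject quotes so the path cannot break out of the SQL string literal.
    if "'" in staged_path:
        raise ValueError("staged_path must not contain single quotes")
    return f"COPY INTO {table} FROM '{staged_path}' FILEFORMAT = PARQUET"


# Hypothetical table and staging location.
print(copy_into_sql("main.sales.orders", "dbfs:/tmp/orders/"))
# prints: COPY INTO main.sales.orders FROM 'dbfs:/tmp/orders/' FILEFORMAT = PARQUET
```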

## Ingestion Modes

Streamkap supports two ingestion modes for writing data to Databricks Delta Lake: **Upsert** and **Append**.

### Upsert

Upsert mode uses a `MERGE INTO` statement to insert new records and update existing ones based on the primary key columns from the source table.

* **New records** (no matching primary key in the target) are inserted
* **Existing records** (matching primary key) are updated with the latest values
* **Deleted records** (when hard delete is enabled) are physically removed from the target table
* **Out-of-order protection:** Streamkap tracks record timestamps and offsets to ensure older records never overwrite newer data

Upsert is the recommended mode for most use cases, as it keeps your target table in sync with the source and handles updates and deletes automatically.
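
The behavior described above can be illustrated with a small in-memory model. This is a toy sketch, not Streamkap's implementation: it stands in for `MERGE INTO` with a Python dict keyed by primary key, uses a single integer timestamp for ordering, and models only the case where hard delete is enabled. All names are hypothetical.

```python
from typing import Optional


def upsert(target: dict, key, row: Optional[dict], ts: int) -> None:
    """Apply one change event to `target`, a dict of primary key -> (ts, row).

    row=None represents a delete event (hard delete enabled).
    """
    current = target.get(key)
    # Out-of-order protection: skip events older than the row already applied.
    if current is not None and ts < current[0]:
        return
    if row is None:
        target.pop(key, None)  # physically remove the row
        return
    target[key] = (ts, row)  # insert a new key or update an existing one


table = {}
upsert(table, 1, {"name": "Ada"}, ts=10)     # new key: inserted
upsert(table, 1, {"name": "Ada L."}, ts=20)  # existing key: updated
upsert(table, 1, {"name": "stale"}, ts=15)   # older event: ignored
upsert(table, 2, {"name": "Bob"}, ts=30)     # new key: inserted
upsert(table, 2, None, ts=40)                # delete: row removed
print(table)  # {1: (20, {'name': 'Ada L.'})}
```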

### Append

Append mode uses a simple `INSERT INTO` statement to add all incoming records as new rows.

* Every record is inserted regardless of whether a row with the same key already exists
* No deduplication or update logic is applied
* Deletes from the source are not reflected in the target

Append is useful for event logs, audit trails, or any scenario where you want to preserve every change as a separate row rather than maintaining a current-state replica.
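
For contrast with Upsert, a minimal in-memory sketch of Append mode follows. It is only an illustration, with a Python list standing in for the target table and hypothetical names throughout: every incoming record becomes a new row, and no key lookup takes place.

```python
def append(target: list, row: dict) -> None:
    """Add the record as a new row without checking for an existing key."""
    target.append(dict(row))  # copy so later changes to `row` do not alter the table


rows = []
append(rows, {"id": 1, "name": "Ada"})
append(rows, {"id": 1, "name": "Ada L."})  # same key, still stored as a separate row
print(len(rows))  # 2
```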
