ClickHouse

Prerequisites

  • ClickHouse User
  • ClickHouse database hostname

Delivery Guarantees

Streamkap offers at-least-once delivery guarantee for ClickHouse and defaults to upsert mode.

In append (or insert) mode, duplicate records are inserted into destination tables, but ClickHouse materialized views can filter them out.
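As a sketch of that approach, the materialized view below copies rows from an append-mode landing table into a ReplacingMergeTree table that collapses duplicates by primary key during background merges. All table and column names here are hypothetical, not part of the Connector's schema:

```sql
-- Hypothetical append-mode landing table written to by the Connector
CREATE TABLE orders_raw
(
    id         UInt64,
    updated_at DateTime64(3),
    status     String
)
ENGINE = MergeTree
ORDER BY (id, updated_at);

-- Deduplicated target: ReplacingMergeTree keeps the row with the
-- highest updated_at per sorting key during background merges
CREATE TABLE orders_deduped
(
    id         UInt64,
    updated_at DateTime64(3),
    status     String
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY id;

-- Forward every inserted row into the deduplicated table
CREATE MATERIALIZED VIEW orders_dedup_mv TO orders_deduped AS
SELECT id, updated_at, status
FROM orders_raw;
```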

With upsert mode (the default), we deduplicate records using their primary key.

πŸ“˜

See Inserts/Upserts for more information about ingestion modes.

Data duplication

πŸ“˜

Only applies to upsert-configured ClickHouse Destinations and destination tables.

Although the ClickHouse FINAL query modifier has drawbacks, we have seen situations where users' SELECT queries on destination tables return duplicate records, and the FINAL query modifier helps avoid that.

Behind the scenes, upsert ingestion mode relies on ClickHouse's ReplacingMergeTree table engine, which deduplicates data in the background rather than at insert or merge time (e.g. MERGE INTO in other data warehouses such as Snowflake or Databricks).
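For example, a query like the one below forces that deduplication at read time, so each primary key appears at most once even before background merges have run. The table name is hypothetical:

```sql
-- FINAL merges pending row versions at query time;
-- 'orders' is an illustrative destination table name
SELECT *
FROM orders FINAL
WHERE status = 'shipped';
```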

To avoid specifying the FINAL query modifier in every query, consider altering the querying user's role as shown below:

-- Replace { ... } placeholders as required
ALTER ROLE {ROLENAME} SETTINGS final = 1;

Data Type Support

πŸ“˜

All mappings are done on a 'best effort, nearest equivalent' basis because many data warehouses and data processing systems have unique data types or share data types but interpret them differently.

Our Connector supports the following data types and mappings:

Streamkap                                                                  ClickHouse
INT8                                                                       Int8
INT16                                                                      Int16
INT32                                                                      Int32
FLOAT32                                                                    Float32
FLOAT64                                                                    Float64
BOOLEAN                                                                    Bool
BYTES                                                                      BLOB (String)
org.apache.kafka.connect.data.Decimal                                      DECIMAL(38, 0)
org.apache.kafka.connect.data.Timestamp, io.debezium.time.ZonedTimestamp   DateTime64
org.apache.kafka.connect.data.Date                                         Date
io.debezium.data.Json                                                      String (JSON)
STRUCT (e.g. maps, key/value pairs)                                        Tuple
ARRAY                                                                      Array
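As an illustration of these mappings, a destination table for a source record with a few of the types above might look like the following. The table and column names are hypothetical:

```sql
-- Illustrative destination table only; not generated by the Connector
CREATE TABLE example_destination
(
    id         Int32,           -- from INT32
    amount     Decimal(38, 0),  -- from org.apache.kafka.connect.data.Decimal
    active     Bool,            -- from BOOLEAN
    created_at DateTime64(3),   -- from org.apache.kafka.connect.data.Timestamp
    tags       Array(String),   -- from ARRAY
    payload    String           -- from io.debezium.data.Json
)
ENGINE = ReplacingMergeTree
ORDER BY id;
```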

ClickHouse Setup

It's recommended to create a separate user and role for Streamkap to access your ClickHouse database. Below is an example script that does that.

-- Replace { ... } placeholders as required
CREATE ROLE STREAMKAP_ROLE;
GRANT SELECT, CREATE, SHOW, INSERT, ALTER ON {database}.* TO STREAMKAP_ROLE;

CREATE USER streamkap_user IDENTIFIED WITH sha256_password BY '{password}';
GRANT STREAMKAP_ROLE TO streamkap_user;
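To verify the role was granted as expected, you can inspect the user's privileges:

```sql
-- Optional check: list the privileges granted to the Streamkap user
SHOW GRANTS FOR streamkap_user;
```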

Allowing Streamkap access

IP access lists in ClickHouse determine which services can connect to your ClickHouse database.

From your ClickHouse Cloud services list:

  1. Choose the service and then Settings
  2. Click + Add entry and enter Streamkap IP Addresses
  3. Enter a description e.g. Streamkap app
  4. Click Add entry

πŸ“˜

See Setting IP Filters for more information about ClickHouse IP access lists.

Obtaining connection details

You will need the hostname of your ClickHouse database when setting up the Destination in Streamkap.

From your ClickHouse Cloud services list:

  1. Choose the service and then Connect
  2. At the bottom of the connection window you will see https:// followed by the hostname, e.g. abc123.us-west-2.aws.clickhouse.cloud. The hostname is the part after https://, excluding the :port suffix
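If you have the full connection URL in a script, the hostname can be extracted like this. The URL below is only an example value:

```shell
# Hypothetical connection URL copied from the ClickHouse Cloud console
url="https://abc123.us-west-2.aws.clickhouse.cloud:8443"
host="${url#https://}"   # drop the leading https://
host="${host%%:*}"       # drop the :port suffix
echo "$host"
```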

Handling semi-structured data

To support complex data structures such as nested arrays of structs and nested structs containing arrays, the Connector's user role needs to be configured as shown below:

ALTER ROLE STREAMKAP_ROLE SETTINGS flatten_nested = 0;
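With flatten_nested = 0, a Nested column is stored as a single Array(Tuple(...)) column instead of being split into parallel arrays, which is what allows nested structs and arrays of structs to round-trip cleanly. A minimal sketch, with hypothetical table and column names:

```sql
-- With flatten_nested = 0, 'items' stays one
-- Array(Tuple(name String, qty UInt32)) column
CREATE TABLE events
(
    id    UInt64,
    items Nested(name String, qty UInt32)
)
ENGINE = MergeTree
ORDER BY id;
```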

Streamkap Setup

  1. Go to Destinations and choose ClickHouse
  2. Input the following information:
    1. Name - A unique and memorable name for this Connector
    2. Hostname - The IP or URL of your ClickHouse instance
    3. Ingestion Mode (default: upsert) - See Inserts/Upserts for information about ingestion modes
    4. Port (default: 8443)
    5. Username (case sensitive) - streamkap_user or the username you chose
    6. Password - Password for the database user
    7. Database - The name of the database
    8. SSL - SSL is enabled by default on ClickHouse instances
    9. Tasks - If Pipelines for this Destination have lag that is continuously growing over time and not reducing, increase the number of Tasks; otherwise, leave as default
  3. Click Save

What’s Next

Check out our blog article Streaming with Change Data Capture to ClickHouse for an in-depth look at how our Connector works.