ClickHouse Cloud
Prerequisites
- ClickHouse User
- ClickHouse database hostname
Delivery Guarantees
Streamkap offers at-least-once delivery guarantee for ClickHouse and defaults to upsert
mode.
For append
(or insert) mode, duplicate records will be inserted into destination tables, but, ClickHouse materialized views can filter them out.
With upsert
mode (default) we carry out deduplication using the records primary key.
See Inserts/Upserts for more information about ingestion modes.
Data duplication
Only applies to
upsert
configured ClickHouse Destinations and destination tables.
Although using the ClickHouse FINAL
query modifier has drawbacks , we have seen situations where users SELECT
queries on the destination tables return some duplicate records and the FINAL
query modifier helps avoid that.
Behind-the-scenes, upsert
ingestion mode relies on ClickHouse's ReplacingMergeTree
table engine to deduplicate data in the background instead of when the data's inserted or merged (e.g. MERGE INTO
for other data warehouses like Snowflake or DataBricks) into the destination tables.
To save having to specify the FINAL
query modifier in every query, you can consider altering the querying user's role as shown below:
-- Replace { ... } placeholders as required
ALTER ROLE {ROLENAME} SETTINGS final = 1;
Data Type Support
All mappings are done on a 'best effort, nearest equivalent' basis because many data warehouses and data processing systems have unique data types or share data types but interpret them differently.
Our Connector supports the following data types and mappings:
Streamkap | ClickHouse |
---|---|
INT8 | Int8 |
INT16 | Int16 |
INT32 | Int32 |
FLOAT32 | Float32 |
FLOAT64 | Float64 |
BOOLEAN | Bool |
BYTES | BLOB (String) |
org.apache.kafka.connect.data.Decimal | DECIMAL(38, 0) |
org.apache.kafka.connect.data.Timestamp io.debezium.time.ZonedTimestamp | DateTime64 |
org.apache.kafka.connect.data.Date | Date |
io.debezium.date.Json | String (JSON) |
STRUCT (e.g. maps, key/value pairs) | Tuple |
ARRAY | Array |
ClickHouse Setup
It's recommended to create a separate user and role for Streamkap to access your ClickHouse database. Below is an example script that does that.
-- Replace { ... } placeholders as required
CREATE ROLE STREAMKAP_ROLE;
GRANT SELECT, CREATE, SHOW, INSERT, ALTER ON {database}.* TO STREAMKAP_ROLE;
CREATE USER streamkap_user IDENTIFIED WITH sha256_password BY '{password}'
GRANT STREAMKAP_ROLE TO streamkap_user;
Allowing Streamkap access
IP access lists in ClickHouse determine what services can connect to your ClickHouse database.
From your ClickHouse Cloud services list:
- Choose the service and then Settings
- Click + Add entry and enter Streamkap IP Addresses
- Enter a description e.g. Streamkap app
- Click Add entry
See Setting IP Filters for more information about ClickHouse IP access lists.
Obtaining connection details
You will need the hostname of your ClickHouse database when setting up the Destination in Streamkap.
From your ClickHouse Cloud services list:
- Choose the service and then Connect
- At the bottom of the connection window you will see
https://
followed by the hostname like thisabc123.us-west-2.aws.clickhouse.cloud
. The part afterhttps://
(excluding the port:port
) is the hostname
Handling semi-structured data
To support complex data structures such as nested arrays of structs and nested structs containing arrays, the Connector's user role needs to be configured as shown below:
ALTER ROLE STREAMKAP_ROLE SETTINGS allow_experimental_object_type = 1;
Streamkap Setup
- Go to Destinations and choose ClickHouse
- Input the following information:
- Name - A unique and memorable name for this Connector
- Hostname - The IP or URL of your ClickHouse instance
- Ingestion Mode (default:
upsert
) - See Inserts/Upserts for information about ingestion modes - Port (default:
8443
) - Username (case sensitive) -
STREAMKAP_USER
or the username you chose - Password - Password for the database user
- Database - The name of the database
- SSL - SSL is enabled by default on ClickHouse instances
- Tasks - If Pipelines for this Destination have lag that's continuously growing over time and not reducing, increase the number of Tasks, otherwise, leave as default
- Click Save
Updated about 1 month ago
Check out our blog article Streaming with Change Data Capture to ClickHouse for an in-depth look at how our Connector works