Apache Iceberg (Generic)

Prerequisites

  • An AWS user with sufficient privileges to create and configure IAM users, IAM policies and S3 buckets.

Iceberg Data Lake Setup

1. Create S3 Bucket

For better isolation, security and resource management, it is recommended to create a new S3 bucket dedicated to your Streamkap-to-Iceberg pipelines.

  • Go to the Amazon S3 Console.
  • Click Create bucket.
  • Set:
    • Bucket name: e.g., iceberg-bucket.
    • Region: Match your compute region (e.g., us-west-2).
  • Keep Object Ownership set to ACLs disabled.
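  • Keep “Block all public access” enabled. The connector authenticates as an IAM user, so the bucket never needs to be public, and bucket policies that grant access to specific IAM principals still work with this setting on.
  • Click Create bucket.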

Optionally, create a base folder to further organize Iceberg tables created by the connector.

  • Open the new bucket.
  • Click Create folder.
  • Enter a name: e.g., warehouse/streamkap/.
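For readers who prefer to script this step, the sketch below builds the request parameters for the same bucket and base folder without sending anything to AWS. The helper names are hypothetical; the parameter names (Bucket, CreateBucketConfiguration, LocationConstraint, Key, Body) follow the boto3 S3 client. It encodes two S3 details: outside us-east-1 the region must be passed as a LocationConstraint (and in us-east-1 it must be omitted), and a console "folder" is simply a zero-byte object whose key ends in "/".

```python
# Hypothetical helpers that build boto3 S3 request parameters for the bucket
# and base folder described above. Nothing is sent to AWS by this code.


def create_bucket_params(bucket: str, region: str) -> dict:
    """Keyword arguments for the boto3 S3 client's create_bucket()."""
    params = {"Bucket": bucket}
    # us-east-1 is the default location and must not be given as a
    # LocationConstraint; every other region must be.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


def folder_marker_params(bucket: str, folder: str) -> dict:
    """Keyword arguments for put_object() that create a console-style folder."""
    # Console folders are zero-byte objects whose key ends with "/".
    key = folder.strip("/") + "/"
    return {"Bucket": bucket, "Key": key, "Body": b""}


if __name__ == "__main__":
    print(create_bucket_params("iceberg-bucket", "us-west-2"))
    print(folder_marker_params("iceberg-bucket", "warehouse/streamkap/"))
```

With boto3 installed and credentials configured, these could be passed to `boto3.client("s3", region_name="us-west-2")` as `s3.create_bucket(**create_bucket_params(...))` followed by `s3.put_object(**folder_marker_params(...))`.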

2. Create S3 User

It is recommended to create a dedicated IAM user that has only the minimum access it needs.

  • Go to the Amazon IAM Console.
  • Click Users in the sidebar.
  • Click Add users.
  • Enter a User name: e.g., streamkap_user
  • Set the Access type:
    • ✅ Check “Access key – Programmatic access”
    • ❌ Uncheck “Console access” (not needed)

Click Next.

  • Choose Attach policies directly and create a custom policy:
    • Click Create policy (opens in new tab).
    • Go to the JSON tab and paste this policy, ensuring you replace any <...> placeholders as required:
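{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowStreamkapAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}

    • Name and create the policy, then return to the user tab, refresh the policy list, select the new policy and finish creating the user.
    • Record the user's access key ID and secret access key (if none was generated, create an access key under the user's Security credentials tab). Streamkap needs both to connect to S3.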
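The same least-privilege policy can also be generated programmatically, which avoids placeholder mistakes. This is a sketch using only the Python standard library; the function name is hypothetical. Because the policy is attached to a user (an identity-based policy), it needs no Principal element, since Principal belongs in resource-based policies such as bucket policies, so the generated version omits it. The resource list pairs the bucket ARN, which s3:ListBucket matches, with the `<bucket>/*` object pattern, which the object actions match.

```python
import json


def streamkap_s3_policy(bucket: str) -> dict:
    """Build an identity-based IAM policy granting read/write access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowStreamkapAccess",
                "Effect": "Allow",
                # No "Principal": the policy applies to the user it is attached to.
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                # Bucket ARN (matched by ListBucket) and object ARN pattern
                # (matched by GetObject, PutObject and DeleteObject).
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Paste the printed JSON into the IAM console's JSON tab.
    print(json.dumps(streamkap_s3_policy("iceberg-bucket"), indent=2))
```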

Streamkap Setup

1. Create the Destination

  • Navigate to Add Connectors.
  • Choose Iceberg.
  • Name: Enter a name for your connector.

2. Connection Settings

  • Catalog Type: The type of Iceberg catalog.
    • REST or HIVE:
      • Catalog Name: Iceberg catalog name.
      • Catalog URI: The Iceberg catalog URI.
      • AWS Access Key: The AWS Access Key ID used to connect to S3.
      • AWS Secret Access Key: The AWS Secret Access Key used to connect to S3.
  • Region: The AWS region to be used.
  • S3 Bucket Path: Path to the storage location for the Iceberg tables.
  • Schema: A prefix for the Iceberg table names, equivalent to a database schema (e.g., public, sales, analytics).
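Before saving, it can help to check that the Catalog URI matches the Catalog Type. In Apache Iceberg, a REST catalog is reached over HTTP(S), while a Hive catalog normally points at a Hive Metastore Thrift endpoint (a `thrift://` URI). The sketch below is a hypothetical pre-flight check in Python: the class and field names simply mirror the form labels above and are not a Streamkap API, the AWS keys are left out on purpose, and the `s3://...` form of the bucket path in the example is an assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# URI schemes each catalog type is normally reached over (assumption based on
# common Iceberg deployments, not on Streamkap's own validation).
EXPECTED_SCHEMES = {
    "REST": {"http", "https"},  # Iceberg REST catalog: an HTTP(S) endpoint
    "HIVE": {"thrift"},         # Hive catalog: a Hive Metastore Thrift URI
}


@dataclass
class IcebergConnection:
    """Hypothetical model of the connection form fields."""
    catalog_type: str
    catalog_name: str
    catalog_uri: str
    region: str
    s3_bucket_path: str
    schema: str

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means nothing obvious is wrong."""
        issues = []
        kind = self.catalog_type.upper()
        if kind not in EXPECTED_SCHEMES:
            issues.append(f"unknown catalog type {self.catalog_type!r}")
        else:
            scheme = urlparse(self.catalog_uri).scheme
            if scheme not in EXPECTED_SCHEMES[kind]:
                allowed = sorted(EXPECTED_SCHEMES[kind])
                issues.append(f"{kind} catalog URI should use one of {allowed}, got {scheme!r}")
        # Every remaining field is required.
        for name in ("catalog_name", "region", "s3_bucket_path", "schema"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        return issues


if __name__ == "__main__":
    conn = IcebergConnection(
        catalog_type="HIVE",
        catalog_name="streamkap_catalog",
        catalog_uri="https://catalog.example.com",  # wrong scheme for a Hive catalog
        region="us-west-2",
        s3_bucket_path="s3://iceberg-bucket/warehouse/streamkap",  # assumed format
        schema="analytics",
    )
    for issue in conn.problems():
        print(issue)
```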

3. Ingestion Settings

  • Ingestion Mode: Specifies the strategy used to insert events into the Iceberg tables.

    ❗️

    Changing ingestion mode

    The append and upsert modes use different, incompatible methods for loading data into Iceberg tables. To change the mode for an existing Iceberg connector, create a new Iceberg destination instead, i.e. use separate destinations for append and for upsert.

    • upsert mode:
      • Primary key fields: Optional. A comma-separated list of field names to use as record identifiers when the source records have no primary key.

Click Save.
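To illustrate why the two ingestion modes should not share a destination, here is a deliberately simplified in-memory model of their general semantics in Python; it is not how Streamkap writes to Iceberg. Append keeps every change event as its own row, while upsert keeps one row per key, so a table built one way does not line up with what the other mode expects. The model also shows how a comma-separated Primary key fields value could be split into a composite key. The function names and the dict-based event format are hypothetical.

```python
# Simplified, hypothetical model of the two ingestion modes. Events are dicts;
# real Iceberg writes involve data and delete files, which are not modelled here.


def parse_key_fields(value: str) -> tuple[str, ...]:
    """Turn a 'Primary key fields' string such as 'id, region' into field names."""
    return tuple(f.strip() for f in value.split(",") if f.strip())


def apply_append(events: list[dict]) -> list[dict]:
    """Append mode: every event becomes a new row, even for repeated keys."""
    return [dict(e) for e in events]


def apply_upsert(events: list[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Upsert mode: the latest event per key replaces any earlier row with that key."""
    rows: dict[tuple, dict] = {}
    for e in events:
        key = tuple(e[f] for f in key_fields)
        rows[key] = dict(e)  # later events overwrite earlier ones
    return list(rows.values())


events = [
    {"id": 1, "region": "eu", "status": "new"},
    {"id": 1, "region": "eu", "status": "shipped"},
    {"id": 2, "region": "us", "status": "new"},
]
keys = parse_key_fields("id, region")
print(len(apply_append(events)))   # 3 rows: the full change history
print(apply_upsert(events, keys))  # 2 rows: the latest state per key
```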