AWS Glue

Stream data into AWS Glue

Prerequisites

  • An AWS user with sufficient privileges to create and configure IAM users and roles.

AWS Glue Setup

1. Create IAM User

We recommend creating a dedicated IAM user and role with only the minimum necessary access.

  • Go to the AWS IAM Console.
  • Click Users in the sidebar.
  • Click Add users.
  • Enter a User name (e.g., streamkap_user).
  • Set the Access type:
    • ✅ Check “Access key – Programmatic access”
    • ❌ Uncheck “Console access” (not needed)

Click Next.

  • Choose Attach policies directly and create a custom policy:
    • Click Create policy (opens in new tab).
    • Go to the JSON tab and paste the following policy, replacing the <region>, <account_id>, <glue_database_name>, and <bucket_name> placeholders with your own values:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:CreatePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchCreatePartition",
                "glue:UpdatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<account_id>:catalog",
                "arn:aws:glue:<region>:<account_id>:database/<glue_database_name>",
                "arn:aws:glue:<region>:<account_id>:table/*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>",
                "arn:aws:s3:::<bucket_name>/*"
            ]
        }
    ]
}
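If you prefer to generate the policy rather than edit the placeholders by hand, the substitution can be sketched in Python. This is a minimal sketch, not part of the official setup: the function name build_glue_policy and the example values in the usage section are hypothetical, while the actions and resource patterns are copied from the policy above.

```python
import json


def build_glue_policy(region: str, account_id: str, database: str, bucket: str) -> dict:
    """Return the policy shown above with its <...> placeholders filled in."""
    glue_arn = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Glue Data Catalog access (same actions as the policy above)
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:CreateTable",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:UpdateTable",
                    "glue:DeleteTable",
                    "glue:CreatePartition",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                    "glue:BatchCreatePartition",
                    "glue:UpdatePartition",
                    "glue:DeletePartition",
                    "glue:BatchDeletePartition",
                ],
                "Resource": [
                    f"{glue_arn}:catalog",
                    f"{glue_arn}:database/{database}",
                    f"{glue_arn}:table/*/*",
                ],
            },
            {
                # S3 access to the bucket that stores the Iceberg data
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Example values only; substitute your own region, account, database, and bucket.
    policy = build_glue_policy("us-east-1", "123456789012", "analytics", "my-iceberg-bucket")
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the JSON tab in place of the template.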

For the Connector to access AWS Glue, it must be able to assume the IAM role. Attach the following trust policy to the role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::300973880807:role/kafkaConnectTenantAccessRole"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
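For teams that script their IAM setup, creating the role with this trust policy might look like the sketch below. It assumes boto3 is installed and AWS credentials are configured; the helper names build_trust_policy and create_role are hypothetical, as are the role name and policy ARN you would pass in. The principal ARN is copied from the trust policy above.

```python
import json

# Principal copied from the trust policy above.
STREAMKAP_PRINCIPAL = "arn:aws:iam::300973880807:role/kafkaConnectTenantAccessRole"


def build_trust_policy(principal_arn: str = STREAMKAP_PRINCIPAL) -> dict:
    """Return a trust policy that lets the given principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role(role_name: str, policy_arn: str) -> str:
    """Create the role, attach the permissions policy, and return the role ARN.

    Requires boto3 and valid AWS credentials. policy_arn is the ARN of the
    custom policy created in the previous step (an assumption of this sketch).
    """
    import boto3  # imported lazily so build_trust_policy works without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]


if __name__ == "__main__":
    # Print the document only; call create_role(...) yourself once you are
    # ready to make changes in your AWS account.
    print(json.dumps(build_trust_policy(), indent=4))
```

The ARN returned by create_role is the value to enter as the AWS IAM Role in the Streamkap connection settings.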

Streamkap Setup

1. Create the Destination

  • Navigate to Add Connectors.
  • Choose Iceberg.
  • Name: Enter a name for your connector.

2. Connection Settings

  • Catalog Type: The type of Iceberg catalog.
    • GLUE:
      • AWS IAM Role: The ARN of the AWS IAM role to assume (e.g., arn:aws:iam:::role/)
  • Region: The AWS region to be used.
  • S3 Bucket Path: Path to the storage location for the Iceberg tables.
  • Schema: An Iceberg table name prefix, equivalent to a database schema (e.g., public, sales, analytics).
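Before entering these values, it can help to sanity-check them locally. The following is a rough sketch under stated assumptions: the dictionary keys (iam_role, region, s3_bucket_path, schema) and the function validate_settings are hypothetical and are not Streamkap field names or APIs, and the patterns only approximate the usual shape of an IAM role ARN and an AWS region code.

```python
import re
from typing import Dict, List

# Approximate shapes only: a 12-digit account ID followed by a role name,
# and region codes such as us-east-1 or ap-southeast-2.
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")
REGION = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def validate_settings(settings: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    if not ROLE_ARN.match(settings.get("iam_role", "")):
        problems.append("iam_role: expected arn:aws:iam::<account_id>:role/<role_name>")
    if not REGION.match(settings.get("region", "")):
        problems.append("region: expected an AWS region code such as us-east-1")
    # The path and schema formats are not specified here, so only check
    # that a value was supplied.
    for field in ("s3_bucket_path", "schema"):
        if not settings.get(field, "").strip():
            problems.append(f"{field}: must not be empty")
    return problems


if __name__ == "__main__":
    # Example values only.
    example = {
        "iam_role": "arn:aws:iam::123456789012:role/streamkap_role",
        "region": "us-east-1",
        "s3_bucket_path": "my-iceberg-bucket/warehouse",
        "schema": "analytics",
    }
    print(validate_settings(example) or "settings look plausible")
```

A check like this catches only malformed values; it does not confirm that the role exists or that the trust policy is in place.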

3. Ingestion Settings

  • Ingestion Mode: Specifies the strategy used to insert events into the Iceberg tables.

    ❗️

    Changing ingestion mode

    The append and upsert modes load data into the Iceberg tables using different, incompatible methods. To change modes for an existing Iceberg Destination, create a new Iceberg Destination instead, i.e. one destination for append and a separate one for upsert.

    • upsert mode:
      • Primary key fields: Optional. A comma-separated list of field names to use as record identifiers when no primary key is present.

Click Save.
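To illustrate the difference between the two ingestion modes, and how the Primary key fields list identifies records, here is a purely conceptual sketch using in-memory rows. It is not how the connector writes to Iceberg; the functions parse_key_fields, append, and upsert and the sample events are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_key_fields(value: str) -> List[str]:
    """Split a comma-separated field list, ignoring blanks and extra spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


def append(table: List[dict], events: List[dict]) -> List[dict]:
    """Append mode: every event becomes a new row."""
    return table + events


def upsert(table: List[dict], events: List[dict], key_fields: List[str]) -> List[dict]:
    """Upsert mode: an event replaces any existing row with the same key."""
    rows: Dict[Tuple, dict] = {}
    for row in table + events:
        key = tuple(row[name] for name in key_fields)
        rows[key] = row  # a later row with the same key overwrites the earlier one
    return list(rows.values())


if __name__ == "__main__":
    events = [
        {"id": 1, "status": "new"},
        {"id": 2, "status": "new"},
        {"id": 1, "status": "paid"},  # a change to record 1
    ]
    keys = parse_key_fields("id")
    print(len(append([], events)))   # 3 rows: full history of events
    print(upsert([], events, keys))  # 2 rows: latest state per id
```

In this sketch, append keeps every event as its own row, while upsert keeps one row per key, which is why the same stream of events yields different table contents under each mode.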