S3

Change Data Capture using S3 with Streamkap

Requirements

  • AWS Access Key and Secret Access Key with the following permissions to the destination bucket:
    • s3:GetObject
    • s3:PutObject
    • s3:AbortMultipartUpload
    • s3:ListMultipartUploadParts
    • s3:ListBucketMultipartUploads
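A minimal IAM policy granting these permissions might look like the following sketch. The bucket name my-streamkap-bucket is a placeholder; note that s3:ListBucketMultipartUploads applies to the bucket itself, while the other actions apply to objects within it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::my-streamkap-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucketMultipartUploads",
      "Resource": "arn:aws:s3:::my-streamkap-bucket"
    }
  ]
}
```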

Configure S3 Connector

  • Name: A descriptive name for the connector
  • AWS Access Key: An Access Key with the appropriate permissions for the bucket to which Streamkap will load data
  • AWS Secret Access Key: The Secret Access Key with the appropriate permissions for the bucket to which Streamkap will load data
  • Region: The name of the region of the bucket to which Streamkap will load data
  • Bucket Name: The name of the bucket to which Streamkap will load data
  • Format: The format of the output files. The following options are available: Avro, Byte Array, JSON, or Parquet
  • Filename Template: The format of the filename. See below for more information about formatting options.
  • Directory: The name of the directory to which Streamkap will load data

If the Format Type selected is JSON or Byte Array:

  • Compression Type: How Streamkap should compress the files. The following options are available: None or gzip

If the Format Type selected is Avro:

  • Avro Compression Type: How Streamkap should compress the files. The following options are available: null, deflate, snappy, or bzip2

If the Format Type selected is Parquet:

  • Parquet Compression Type: How Streamkap should compress the files. The following options are available: None, gzip, snappy, lz4, brotli, zstd, or lzo
  • Partition Field: How the files should be partitioned. Automatic partitions the data automatically, not based on any particular field. Field partitions the data by the values of a specified field. Time partitions the data by the timestamp of a specified field

If the Partition Field is set to Time:

  • Time Partition Directory Path Format: The format of the partitioned directories. For example, if you set the format to 'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH, the data directories will have the format /year=2015/month=12/day=07/hour=15/
  • Time Partition Timezone: The timezone of the partitioned directories. Accepts short and long standard names like: UTC, PST, ECT, Europe/Berlin, Europe/Helsinki, or America/New_York
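The format tokens map onto calendar fields in the usual way. The sketch below is a hypothetical helper, not Streamkap's actual code; it only illustrates how the example format above expands for a 2015-12-07 15:00 UTC timestamp.

```python
from datetime import datetime, timezone

# Map the documented format units to their strftime equivalents.
# This is an illustration of the token semantics, not Streamkap's implementation.
UNIT_TO_STRFTIME = {"YYYY": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}

def time_partition_path(fmt_units, ts):
    """Build a partition directory path from (name, unit) pairs and a timestamp."""
    parts = [f"{name}={ts.strftime(UNIT_TO_STRFTIME[unit])}" for name, unit in fmt_units]
    return "/" + "/".join(parts) + "/"

ts = datetime(2015, 12, 7, 15, 0, tzinfo=timezone.utc)
path = time_partition_path(
    [("year", "YYYY"), ("month", "MM"), ("day", "dd"), ("hour", "HH")], ts
)
print(path)  # /year=2015/month=12/day=07/hour=15/
```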

Filename template (default: {{topic}}--{{partition}}--{{start_offset}}): The format of the filename. You can combine any of the elements below with other text or characters, including dashes (-) and underscores (_)

  • {{topic}}: The Streamkap topic name. For example, a PostgreSQL Source table web.salesorders would have the topic name salesorders
  • {{partition:padding=true|false}}: The partition number of the records in the file, typically 0. Streamkap topics and their data can be partitioned for better performance in certain scenarios. For example, a topic salesorders with 10 partitions has partitions 0 through 9. If padding is set to true, the partition number is written with leading zeroes; the default is false
  • {{start_offset:padding=true|false}}: The offset of the first record in the file. Every record streamed has an incrementing offset number. For example, a topic salesorders with 1000 records has offsets 0 through 999. Note that in a multi-partitioned topic, offset numbers are not unique across partitions. If padding is set to true, the offset is written with leading zeroes; the default is false
  • {{timestamp:unit=yyyy|MM|dd|HH}}: The timestamp of when the file was created by the connector. For example, the template {{topic}}{{timestamp:unit=yyyy}}-{{timestamp:unit=MM}} with a timestamp of 2024-01-01 20:24 would create a file named salesorders2024-01
  • {{key}}: The Kafka key
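As an illustration, the default template expands as shown below. render_filename is a hypothetical helper that mimics the substitution for the simple elements (the padding and timestamp options are omitted); it is not Streamkap's implementation.

```python
# Hypothetical sketch of filename-template expansion; element names come from
# the documentation, but the rendering logic here is for illustration only.
def render_filename(template, **values):
    """Replace each {{element}} placeholder in the template with its value."""
    for key, val in values.items():
        template = template.replace("{{" + key + "}}", str(val))
    return template

name = render_filename(
    "{{topic}}--{{partition}}--{{start_offset}}",
    topic="salesorders", partition=0, start_offset=0,
)
print(name)  # salesorders--0--0
```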