S3
Change Data Capture using S3 with Streamkap
Requirements
- AWS Access Key and Secret Access Key with the following permissions on the destination bucket (an example IAM policy follows this list):
  - s3:GetObject
  - s3:PutObject
  - s3:AbortMultipartUpload
  - s3:ListMultipartUploadParts
  - s3:ListBucketMultipartUploads
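For reference, the sketch below shows one way these permissions could be expressed as an IAM policy and created with boto3. The bucket name my-streamkap-bucket and the policy name streamkap-s3-destination are placeholders, not values Streamkap requires; adapt the resources to your own bucket.

```python
# Sketch of an IAM policy covering the permissions listed above.
# "my-streamkap-bucket" and the policy name are placeholders.
import json
import boto3

BUCKET = "my-streamkap-bucket"  # placeholder destination bucket

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Object-level permissions on keys in the destination bucket
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts",
            ],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            # Bucket-level permission for listing in-progress multipart uploads
            "Effect": "Allow",
            "Action": ["s3:ListBucketMultipartUploads"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="streamkap-s3-destination",  # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)
```

Attach the resulting policy to the IAM user whose Access Key and Secret Access Key you enter in the connector configuration.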
Configure S3 Connector
- Name: A descriptive name for the connector
- AWS Access Key: An Access Key with the appropriate permissions for the bucket to which Streamkap will load data
- AWS Secret Access Key: The Secret Access Key with the appropriate permissions for the bucket to which Streamkap will load data
- Region: The name of the region of the bucket to which Streamkap will load data
- Bucket Name: The name of the bucket to which Streamkap will load data
- Format: The format of the file. The following options are available: Avro, Byte Array, JSON, and Parquet
- Filename Template: The format of the filename. See below for more information about formatting options.
- Directory: The name of the directory to which Streamkap will load data
If the Format Type selected is JSON or Byte Array:

- Compression Type: How Streamkap should compress the files. The following options are available: None or gzip
If the Format Type selected is Avro:

- Avro Compression Type: How Streamkap should compress the files. The following options are available: null, deflate, snappy, or bzip2
If the Format Type selected is Parquet:

- Parquet Compression Type: How Streamkap should compress the files. The following options are available: None, gzip, snappy, lz4, brotli, zstd, or lzo
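Purely as an illustration (not part of the connector setup), the sketch below reads back a gzip-compressed JSON file from the destination bucket with boto3. The bucket name, the object key, and the assumption of one JSON record per line are placeholders for this example.

```python
# Sketch: read back a gzip-compressed JSON file written to the bucket.
# Bucket and key are placeholders; the one-record-per-line layout is assumed.
import gzip
import json
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="my-streamkap-bucket",
    Key="streamkap/salesorders--0--0.gz",  # hypothetical object key
)

# Decompress the gzip payload and parse each non-empty line as a JSON record.
for line in gzip.decompress(obj["Body"].read()).splitlines():
    if line:
        print(json.loads(line))
```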
- Partition Field: How the files should be partitioned. The following options are available:
  - Automatic: partitions the data automatically, not based on any particular field
  - Field: partitions the data by the values of a specified field
  - Time: partitions the data by the time of a particular field
If the Partition Field is set to Time:

- Time Partition Directory Path Format: The format of the partitioned directories. For example, if you set the format to 'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH, the data directories will have the format /year=2015/month=12/day=07/hour=15/ (see the sketch after this list)
- Time Partition Timezone: The timezone of the partitioned directories. Accepts short and long standard names such as UTC, PST, ECT, Europe/Berlin, Europe/Helsinki, or America/New_York
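To make the path format concrete, here is a minimal sketch of how a record timestamp could map to the 'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH layout in a chosen timezone. It mirrors the example above; it is not the connector's own implementation.

```python
# Sketch: format a timestamp into the partition directory layout
# 'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH described above.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def time_partition_path(ts: datetime, tz: str = "UTC") -> str:
    """Build a /year=.../month=.../day=.../hour=.../ directory path."""
    local = ts.astimezone(ZoneInfo(tz))
    return local.strftime("/year=%Y/month=%m/day=%d/hour=%H/")

# Example: reproduces the documented sample output.
ts = datetime(2015, 12, 7, 15, 30, tzinfo=timezone.utc)
print(time_partition_path(ts, "UTC"))  # /year=2015/month=12/day=07/hour=15/
```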
Filename template (default: {{topic}}--{{partition}}--{{start_offset}}): The format of the filename. You can combine any of the elements below with other text or characters, including dashes (-) and underscores (_). A small rendering example follows the table.
| Element | Description |
|---|---|
| {{topic}} | The Streamkap topic name. For example, a PostgreSQL Source table web.salesorders produces a topic named salesorders |
| {{partition:padding=true\|false}} | The partition number of the records in the file, typically 0. Streamkap topics and their data can be partitioned for better performance in certain scenarios. For example, a topic salesorders with 10 partitions has partitions 0 through 9. If padding is set to true, the partition number is padded with leading zeros; the default is false |
| {{start_offset:padding=true\|false}} | The offset number of the first record in the file. Every record streamed has an incrementing offset number. For example, a topic salesorders with 1000 records has offsets 0 through 999. Note that in a multi-partitioned topic, offset numbers are not unique across partitions. If padding is set to true, the offset is padded with leading zeros; the default is false |
| {{timestamp:unit=yyyy\|MM\|dd\|HH}} | The timestamp for when the file was created by the connector. For example, the template {{topic}}{{timestamp:unit=yyyy}}-{{timestamp:unit=MM}} and a timestamp of 2024-01-01 20:24 would create a file named salesorders2024-01 |
| {{key}} | The Kafka key |
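As a rough illustration of how these elements combine, the sketch below renders the default {{topic}}--{{partition}}--{{start_offset}} template with optional zero padding. The padding width used here is an assumption for illustration, not the connector's exact behavior.

```python
# Sketch: render the default filename template
# {{topic}}--{{partition}}--{{start_offset}} for a batch of records.
def render_filename(topic: str, partition: int, start_offset: int,
                    padding: bool = False, width: int = 10) -> str:
    # The zero-padding width is an assumption for illustration only.
    fmt = (lambda n: str(n).zfill(width)) if padding else str
    return f"{topic}--{fmt(partition)}--{fmt(start_offset)}"

print(render_filename("salesorders", 0, 999))                # salesorders--0--999
print(render_filename("salesorders", 0, 999, padding=True))  # salesorders--0000000000--0000000999
```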