Streamkap applies two categories of transforms to your data, each with different ordering behavior. Understanding how these transforms are sequenced helps you build predictable pipelines and avoid subtle data issues.

Two Categories of Transforms

| Category | Where It Runs | Who Controls Order | User Action |
| --- | --- | --- | --- |
| Destination-side transforms | On the destination connector (Kafka Connect SMTs) | System (fixed order) | Configure which transforms are active |
| Pipeline transforms (Flink) | In the streaming layer (Apache Flink) | You | Define transforms and arrange their sequence |

These two categories operate at different stages of the pipeline. Destination-side transforms run after pipeline transforms. Data flows from the source, through any Flink pipeline transforms, and then through destination-side transforms before landing in the destination.

Destination-Side Transforms

Destination-side transforms are applied automatically by the system in a fixed order. You choose which transforms to enable on your destination connector, but the system determines the sequence in which they execute.

Available Destination-Side Transforms

| Transform | What It Does |
| --- | --- |
| DropFields | Removes specified fields from the record before it reaches the destination. |
| RenameFields | Renames fields to match destination naming conventions or resolve conflicts. |
| RenameKeyFields | Renames fields in the message key (as opposed to the value). |
| CopyField | Copies the value of one field into a new field, preserving the original. |
| ToJsonJ | Converts structured fields (maps, arrays, nested objects) into JSON strings. |
| ToJsonbJ | Converts structured fields into PostgreSQL-compatible JSONB strings. |
| ToIntJ | Casts compatible field values to integer type. |
| ToFloatJ | Casts compatible field values to float type. |
| ToStringJ | Casts field values to string type. |
| AddStringSuffix | Appends a suffix string to the value of a specified field. |
| StringReplace | Replaces occurrences of a pattern within string field values. |
| HeaderToFieldCustom | Copies a Kafka message header value into a record field. |
| MarkColumnsAsOptional | Marks specified columns as optional in the schema, allowing null values. |

Fixed Execution Order

The system applies active destination-side transforms in a predetermined sequence. You do not need to set or manage this order — it is handled automatically. Because the order is fixed, certain interactions are predictable:
  • If you enable both ToJsonJ and DropFields, JSON conversion runs before fields are dropped. If you need a field excluded from JSON output, use DropFields to remove the entire JSON field after conversion.
  • When both CopyField and RenameFields are enabled, the copy runs first. Rename operations apply to the already-copied fields.
  • Enabling both ToJsonJ and ToStringJ means JSON conversion runs before string casting. Nested objects are converted to JSON strings first, then any remaining type casting to string is applied.
You do not need to worry about sequencing destination-side transforms yourself. Focus on selecting the right combination of transforms for your use case, and the system handles the rest.
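
The fixed-order behavior can be pictured with a small Python sketch. This is not how the connector is implemented; it is plain Python standing in for the SMT chain. The `FIXED_ORDER` list is a hypothetical ranking: only the three relative orderings described above (ToJsonJ before DropFields, CopyField before RenameFields, ToJsonJ before ToStringJ) come from this page, and the field names (`id`, `id_copy`, `legacy_id`, `internal_notes`) are invented for illustration.

```python
import json

# Hypothetical ranking. Only the relative orderings stated in the text are
# known; the overall sequence here is an assumption for illustration.
FIXED_ORDER = ["CopyField", "RenameFields", "ToJsonJ", "ToStringJ", "DropFields"]


def copy_field(record):
    # Copy "id" into a new field, keeping the original.
    return {**record, "id_copy": record["id"]}


def rename_fields(record):
    # Rename the copied field; this only works if the copy already ran.
    renamed = dict(record)
    renamed["legacy_id"] = renamed.pop("id_copy")
    return renamed


def to_json(record):
    # Convert nested structures (dicts, lists) into JSON strings.
    return {
        key: json.dumps(value, sort_keys=True) if isinstance(value, (dict, list)) else value
        for key, value in record.items()
    }


def to_string(record):
    # Cast every value to a string.
    return {key: str(value) for key, value in record.items()}


def drop_fields(record):
    # Remove one (hypothetical) field from the record.
    return {key: value for key, value in record.items() if key != "internal_notes"}


TRANSFORMS = {
    "CopyField": copy_field,
    "RenameFields": rename_fields,
    "ToJsonJ": to_json,
    "ToStringJ": to_string,
    "DropFields": drop_fields,
}


def apply_enabled(record, enabled):
    """Apply the enabled transforms in FIXED_ORDER, ignoring the order given."""
    for name in FIXED_ORDER:
        if name in enabled:
            record = TRANSFORMS[name](record)
    return record
```

Calling `apply_enabled(record, ["DropFields", "ToStringJ", "ToJsonJ"])` gives the same result as listing the names in any other order, because only the set of enabled transforms matters in this sketch. Since JSON conversion runs before string casting, a nested field ends up as a JSON string rather than a Python-style string representation.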

Pipeline Transforms (Flink)

Pipeline transforms run in the Apache Flink streaming layer. Unlike destination-side transforms, you control the order in which pipeline transforms execute. Each transform receives the output of the previous one, forming a chain. For the full list of available pipeline transform types, see Transforms.
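
As a rough mental model, the chain can be sketched with Python generators, where each step consumes the stream produced by the step before it. This is an analogy only, not Flink code; `keep_large_orders`, `add_surcharge`, and `run_chain` are hypothetical names invented for this sketch.

```python
from functools import reduce


def keep_large_orders(records):
    # Filter step: pass through only records with amount > 100.
    for record in records:
        if record["amount"] > 100:
            yield record


def add_surcharge(records):
    # Map step: add a flat surcharge of 50 to every record.
    for record in records:
        yield {**record, "amount": record["amount"] + 50}


def run_chain(records, transforms):
    """Feed the output of each transform into the next one, in list order."""
    stream = reduce(lambda current, transform: transform(current), transforms, records)
    return list(stream)
```

With input amounts of 60 and 150, `run_chain(orders, [keep_large_orders, add_surcharge])` yields a single record (amount 200), while `run_chain(orders, [add_surcharge, keep_large_orders])` yields two (amounts 110 and 200). The same two steps in a different order produce a different result.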

Why Order Matters

Pipeline transforms execute sequentially. Each transform receives the output of the one before it. This means:
  1. A filter that removes records will reduce the volume of data that subsequent transforms process.
  2. An enrich transform adds fields that downstream transforms can reference.
  3. A fan-out transform routes records to separate output topics — any subsequent transforms must be configured per output branch, so place fan-out at the end of your chain.
Changing the order of transforms can change both the result and the performance of your pipeline.

The following order works well for most pipelines that combine multiple transform types:
  1. Filter first: Apply Transform / Filter Records early to remove irrelevant records. This reduces the volume of data that all subsequent transforms must process.
  2. Join or Enrich next: Use Join, Enrich, or Enrich (Async) to combine or augment records. Running these after filtering means fewer records to look up or join, which improves performance and reduces API call volume.
  3. Transform / Map after enrichment: Apply Transform / Filter Records again (if needed) to reshape, rename, or compute fields using the enriched data.
  4. Fan Out last: Use Fan Out at the end of your chain so that all preceding logic (filtering, enrichment, mapping) is applied before records are routed to their final output topics.
Not every pipeline needs all of these stages. Use only the transforms your use case requires. The recommended order applies to the transforms you do use.

Practical Examples

Example 1: Filter Before Enrich (Async)

Scenario: You stream order events and want to call an external API to enrich orders with customer details, but only for orders above $100.

| Order | Transform | Effect |
| --- | --- | --- |
| 1 | Filter Records | Keep only orders where amount > 100. Removes 80% of records. |
| 2 | Enrich (Async) | Call external API to fetch customer details for remaining orders. |

Why this order matters: Filtering first means you make API calls for only 20% of records instead of 100%. Reversing the order would waste API calls on records you discard.
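
The difference in API calls can be made concrete with a small simulation. The sketch below is synchronous Python standing in for Enrich (Async); `FakeCustomerApi`, the field names, and the helper functions are all hypothetical. It only counts how many lookups each ordering triggers.

```python
class FakeCustomerApi:
    """Stand-in for an external customer API that counts its calls."""

    def __init__(self):
        self.calls = 0

    def lookup(self, customer_id):
        self.calls += 1
        return {"customer_name": f"customer-{customer_id}"}


def filter_orders(orders):
    # Keep only orders where amount > 100.
    return [order for order in orders if order["amount"] > 100]


def enrich_orders(orders, api):
    # One API call per order that reaches this step.
    return [{**order, **api.lookup(order["customer_id"])} for order in orders]


def filter_then_enrich(orders, api):
    return enrich_orders(filter_orders(orders), api)


def enrich_then_filter(orders, api):
    return filter_orders(enrich_orders(orders, api))


def make_orders():
    # Ten sample orders; two of them (250 and 400) are above 100.
    amounts = [20, 35, 50, 250, 10, 80, 99, 100, 400, 5]
    return [{"customer_id": i, "amount": amount} for i, amount in enumerate(amounts)]
```

For these ten sample orders, `filter_then_enrich` makes 2 lookups and `enrich_then_filter` makes 10, yet both return the same two enriched orders.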

Example 2: Enrich Then Map

Scenario: You want to enrich product records with category data from a lookup table, then compute a new field based on the enriched data.

| Order | Transform | Effect |
| --- | --- | --- |
| 1 | Enrich | Add category_name from a cached lookup topic. |
| 2 | Transform / Filter Records | Compute display_label by combining product_name and category_name. |

Why this order matters: The map transform references the category_name field, which only exists after enrichment. Reversing the order would mean the field is not yet available.
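
A minimal Python sketch of this dependency follows. The in-memory `CATEGORY_LOOKUP` dict stands in for the cached lookup topic, and the `category_id` key and the label format are assumptions made for illustration. In this sketch a missing field raises `KeyError`; how an actual pipeline transform reacts to a missing field may differ.

```python
# Hypothetical lookup data standing in for the cached lookup topic.
CATEGORY_LOOKUP = {10: "Kitchen", 20: "Garden"}


def enrich(product):
    # Add category_name based on the product's category_id.
    return {**product, "category_name": CATEGORY_LOOKUP[product["category_id"]]}


def compute_display_label(product):
    # Combine product_name and category_name; requires enrichment to have run.
    label = f'{product["product_name"]} ({product["category_name"]})'
    return {**product, "display_label": label}
```

Calling `compute_display_label(enrich(product))` works because `category_name` is present by the time the label is computed. Calling `compute_display_label(product)` on the raw record fails in this sketch, since the field it depends on has not been added yet.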

Example 3: Full Chain with Fan Out

Scenario: You stream user activity events and want to filter, enrich, and then route to different output topics based on activity type.

| Order | Transform | Effect |
| --- | --- | --- |
| 1 | Filter Records | Remove bot traffic and test accounts. |
| 2 | Enrich | Add user profile data from a cached topic. |
| 3 | Transform / Filter Records | Normalize field names and compute session duration. |
| 4 | Fan Out | Route purchase events to one topic and browse events to another. |

Why this order matters: Fan Out at the end ensures all records are fully processed before being split. If Fan Out ran earlier, you would need to duplicate the enrichment and mapping logic across each output branch.
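
The four stages can be sketched end to end in Python. Everything here is illustrative: the event fields (`is_bot`, `is_test_account`, `activityType`, `session_start`, `session_end`), the `profiles` dict standing in for the cached topic, and the `<type>_events` topic naming are assumptions, not part of the product.

```python
from collections import defaultdict


def is_real_user(event):
    # Stage 1: drop bot traffic and test accounts.
    return not event["is_bot"] and not event["is_test_account"]


def enrich(event, profiles):
    # Stage 2: add profile data (here, just the user's plan).
    return {**event, "plan": profiles[event["user_id"]]["plan"]}


def normalize(event):
    # Stage 3: normalize field names and compute session duration.
    return {
        "user_id": event["user_id"],
        "activity_type": event["activityType"],
        "plan": event["plan"],
        "session_seconds": event["session_end"] - event["session_start"],
    }


def fan_out(events):
    # Stage 4: route each fully processed event to a topic by activity type.
    topics = defaultdict(list)
    for event in events:
        topics[f'{event["activity_type"]}_events'].append(event)
    return dict(topics)


def run_pipeline(events, profiles):
    kept = (event for event in events if is_real_user(event))
    enriched = (enrich(event, profiles) for event in kept)
    shaped = (normalize(event) for event in enriched)
    return fan_out(shaped)
```

Because `fan_out` is the last step, the filtering, enrichment, and normalization logic is written once. Had the split come first, each output branch would need its own copy of those three stages.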

Common Ordering Mistakes

Avoid these patterns that can cause unexpected behavior or performance issues.

| Mistake | Problem | Fix |
| --- | --- | --- |
| Enrich (Async) before Filter | Wastes API calls on records that are later discarded | Move Filter before Enrich (Async) |
| Map/Transform before Enrich | References enriched fields that do not exist yet | Move Enrich before the Map/Transform step |
| Fan Out before Map | Requires duplicating transform logic in each output branch | Move Fan Out to the end of the chain |
| Filter after Join | Joins all records first, then discards some, wasting join processing | Filter input topics before joining |

End-to-End Data Flow

To understand where each category of transforms fits, here is the full data path:
  1. Source — Change events are captured from your database.
  2. Pipeline transforms (Flink) — Your user-defined transforms (Filter, Enrich, Join, Fan Out) execute in the order you specify.
  3. Destination-side transforms (SMTs) — System-managed transforms (DropFields, RenameFields, ToJsonJ, etc.) apply in a fixed order.
  4. Destination — Transformed data lands in your data warehouse or lake.
If you need a field dropped before it reaches the destination, you can use either a Flink pipeline transform (to remove it from the stream early) or a destination-side DropFields transform (to remove it at the last step). Choose based on whether downstream Flink transforms still need the field.
