Transform Ordering - Streamkap Docs

Streamkap applies two categories of transforms to your data, each with different ordering behavior. Understanding how these transforms are sequenced helps you build predictable pipelines and avoid subtle data issues.

Two Categories of Transforms

Category	Where It Runs	Who Controls Order	User Action
Destination-side transforms	On the destination connector (Kafka Connect SMTs)	System (fixed order)	Configure which transforms are active
Pipeline transforms (Flink)	In the streaming layer (Apache Flink)	You	Define transforms and arrange their sequence

These two categories operate at different stages of the pipeline. Destination-side transforms run after pipeline transforms. Data flows from the source, through any Flink pipeline transforms, and then through destination-side transforms before landing in the destination.

Destination-Side Transforms

Destination-side transforms are applied automatically by the system in a fixed order. You choose which transforms to enable on your destination connector, but the system determines the sequence in which they execute.

Available Destination-Side Transforms

Transform	What It Does
DropFields	Removes specified fields from the record before it reaches the destination.
RenameFields	Renames fields to match destination naming conventions or resolve conflicts.
RenameKeyFields	Renames fields in the message key (as opposed to the value).
CopyField	Copies the value of one field into a new field, preserving the original.
ToJsonJ	Converts structured fields (maps, arrays, nested objects) into JSON strings.
ToJsonbJ	Converts structured fields into PostgreSQL-compatible JSONB strings.
ToIntJ	Casts compatible field values to integer type.
ToFloatJ	Casts compatible field values to float type.
ToStringJ	Casts field values to string type.
AddStringSuffix	Appends a suffix string to the value of a specified field.
StringReplace	Replaces occurrences of a pattern within string field values.
HeaderToFieldCustom	Copies a Kafka message header value into a record field.
MarkColumnsAsOptional	Marks specified columns as optional in the schema, allowing null values.

Fixed Execution Order

The system applies active destination-side transforms in a predetermined sequence. You do not need to set or manage this order — it is handled automatically. Because the order is fixed, certain interactions are predictable:

If you enable both ToJsonJ and DropFields, JSON conversion runs before fields are dropped. If you need a field excluded from JSON output, use DropFields to remove the entire JSON field after conversion.
When both CopyField and RenameFields are enabled, the copy runs first. Rename operations apply to the already-copied fields.
Enabling both ToJsonJ and ToStringJ means JSON conversion runs before string casting. Nested objects are converted to JSON strings first, then any remaining type casting to string is applied.

You do not need to worry about sequencing destination-side transforms yourself. Focus on selecting the right combination of transforms for your use case, and the system handles the rest.

Pipeline Transforms (Flink)

Pipeline transforms run in the Apache Flink streaming layer. Unlike destination-side transforms, you control the order in which pipeline transforms execute. Each transform receives the output of the previous one, forming a chain. For the full list of available pipeline transform types, see Transforms.

Why Order Matters

Pipeline transforms execute sequentially. Each transform receives the output of the one before it. This means:

A filter that removes records will reduce the volume of data that subsequent transforms process.
An enrich transform adds fields that downstream transforms can reference.
A fan-out transform routes records to separate output topics — any subsequent transforms must be configured per output branch, so place fan-out at the end of your chain.

Changing the order of transforms can change both the result and the performance of your pipeline.

Recommended Ordering

The following order works well for most pipelines that combine multiple transform types:

Filter first

Apply Transform / Filter Records early to remove irrelevant records. This reduces the volume of data that all subsequent transforms must process.

Join or Enrich next

Use Join, Enrich, or Enrich (Async) to combine or augment records. Running these after filtering means fewer records to look up or join, which improves performance and reduces API call volume.

Transform / Map after enrichment

Apply Transform / Filter Records again (if needed) to reshape, rename, or compute fields using the enriched data.

Fan Out last

Use Fan Out at the end of your chain so that all preceding logic (filtering, enrichment, mapping) is applied before records are routed to their final output topics.

Not every pipeline needs all of these stages. Use only the transforms your use case requires. The recommended order applies to the transforms you do use.

Practical Examples

Example 1: Filter Before Enrich (Async)

Scenario: You stream order events and want to call an external API to enrich orders with customer details, but only for orders above $100.

Order	Transform	Effect
1	Filter Records	Keep only orders where `amount > 100`. Removes 80% of records.
2	Enrich (Async)	Call external API to fetch customer details for remaining orders.

Why this order matters: Filtering first means you make API calls for only 20% of records instead of 100%. Reversing the order would waste API calls on records you discard.

Example 2: Enrich Then Map

Scenario: You want to enrich product records with category data from a lookup table, then compute a new field based on the enriched data.

Order	Transform	Effect
1	Enrich	Add `category_name` from a cached lookup topic.
2	Transform / Filter Records	Compute `display_label` by combining `product_name` and `category_name`.

Why this order matters: The map transform references the category_name field, which only exists after enrichment. Reversing the order would mean the field is not yet available.

Example 3: Full Chain with Fan Out

Scenario: You stream user activity events and want to filter, enrich, and then route to different output topics based on activity type.

Order	Transform	Effect
1	Filter Records	Remove bot traffic and test accounts.
2	Enrich	Add user profile data from a cached topic.
3	Transform / Filter Records	Normalize field names and compute session duration.
4	Fan Out	Route `purchase` events to one topic and `browse` events to another.

Why this order matters: Fan Out at the end ensures all records are fully processed before being split. If Fan Out ran earlier, you would need to duplicate the enrichment and mapping logic across each output branch.

Common Ordering Mistakes

Avoid these patterns that can cause unexpected behavior or performance issues.

Mistake	Problem	Fix
Enrich (Async) before Filter	Wastes API calls on records that are later discarded	Move Filter before Enrich (Async)
Map/Transform before Enrich	References enriched fields that do not exist yet	Move Enrich before the Map/Transform step
Fan Out before Map	Requires duplicating transform logic in each output branch	Move Fan Out to the end of the chain
Filter after Join	Joins all records first, then discards some — wastes join processing	Filter input topics before joining

End-to-End Data Flow

To understand where each category of transforms fits, here is the full data path:

Source — Change events are captured from your database.
Pipeline transforms (Flink) — Your user-defined transforms (Filter, Enrich, Join, Fan Out) execute in the order you specify.
Destination-side transforms (SMTs) — System-managed transforms (DropFields, RenameFields, ToJsonJ, etc.) apply in a fixed order.
Destination — Transformed data lands in your data warehouse or lake.

If you need a field dropped before it reaches the destination, you can use either a Flink pipeline transform (to remove it from the stream early) or a destination-side DropFields transform (to remove it at the last step). Choose based on whether downstream Flink transforms still need the field.

​Two Categories of Transforms

​Destination-Side Transforms

​Available Destination-Side Transforms

​Fixed Execution Order

​Pipeline Transforms (Flink)

​Why Order Matters

​Recommended Ordering

​Practical Examples

​Example 1: Filter Before Enrich (Async)

​Example 2: Enrich Then Map

​Example 3: Full Chain with Fan Out

​Common Ordering Mistakes

​End-to-End Data Flow

​See Also

Two Categories of Transforms

Destination-Side Transforms

Available Destination-Side Transforms

Fixed Execution Order

Pipeline Transforms (Flink)

Why Order Matters

Recommended Ordering

Practical Examples

Example 1: Filter Before Enrich (Async)

Example 2: Enrich Then Map

Example 3: Full Chain with Fan Out

Common Ordering Mistakes

End-to-End Data Flow

See Also