> ## Documentation Index
> Fetch the complete documentation index at: https://docs.streamkap.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Transform Ordering

> Understand how transforms are ordered and how execution sequence affects your data pipeline.

Streamkap applies two categories of transforms to your data, each with different ordering behavior. Understanding how these transforms are sequenced helps you build predictable pipelines and avoid subtle data issues.

## Two Categories of Transforms

| Category                        | Where It Runs                                     | Who Controls Order   | User Action                                  |
| ------------------------------- | ------------------------------------------------- | -------------------- | -------------------------------------------- |
| **Destination-side transforms** | On the destination connector (Kafka Connect SMTs) | System (fixed order) | Configure which transforms are active        |
| **Pipeline transforms (Flink)** | In the streaming layer (Apache Flink)             | You                  | Define transforms and arrange their sequence |

<Note>
  These two categories operate at different stages of the pipeline. Destination-side transforms run **after** pipeline transforms. Data flows from the source, through any Flink pipeline transforms, and then through destination-side transforms before landing in the destination.
</Note>

***

## Destination-Side Transforms

Destination-side transforms are applied automatically by the system in a fixed order. You choose which transforms to enable on your destination connector, but the system determines the sequence in which they execute.

### Available Destination-Side Transforms

| Transform                 | What It Does                                                                 |
| ------------------------- | ---------------------------------------------------------------------------- |
| **DropFields**            | Removes specified fields from the record before it reaches the destination.  |
| **RenameFields**          | Renames fields to match destination naming conventions or resolve conflicts. |
| **RenameKeyFields**       | Renames fields in the message key (as opposed to the value).                 |
| **CopyField**             | Copies the value of one field into a new field, preserving the original.     |
| **ToJsonJ**               | Converts structured fields (maps, arrays, nested objects) into JSON strings. |
| **ToJsonbJ**              | Converts structured fields into PostgreSQL-compatible JSONB strings.         |
| **ToIntJ**                | Casts compatible field values to integer type.                               |
| **ToFloatJ**              | Casts compatible field values to float type.                                 |
| **ToStringJ**             | Casts field values to string type.                                           |
| **AddStringSuffix**       | Appends a suffix string to the value of a specified field.                   |
| **StringReplace**         | Replaces occurrences of a pattern within string field values.                |
| **HeaderToFieldCustom**   | Copies a Kafka message header value into a record field.                     |
| **MarkColumnsAsOptional** | Marks specified columns as optional in the schema, allowing null values.     |

### Fixed Execution Order

The system applies active destination-side transforms in a predetermined sequence. You do not need to set or manage this order -- it is handled automatically.

Because the order is fixed, certain interactions are predictable:

* If you enable both **ToJsonJ** and **DropFields**, JSON conversion runs **before** fields are dropped. If you need a field excluded from JSON output, use DropFields to remove the entire JSON field after conversion.
* When both **CopyField** and **RenameFields** are enabled, the copy runs first. Rename operations apply to the already-copied fields.
* Enabling both **ToJsonJ** and **ToStringJ** means JSON conversion runs before string casting. Nested objects are converted to JSON strings first, then any remaining type casting to string is applied.

<Tip>
  You do not need to worry about sequencing destination-side transforms yourself. Focus on selecting the right combination of transforms for your use case, and the system handles the rest.
</Tip>

***

## Pipeline Transforms (Flink)

Pipeline transforms run in the Apache Flink streaming layer. Unlike destination-side transforms, **you control the order** in which pipeline transforms execute. Each transform receives the output of the previous one, forming a chain.

For the full list of available pipeline transform types, see [Transforms](/transforms).

### Why Order Matters

Pipeline transforms execute sequentially. Each transform receives the output of the one before it. This means:

1. A **filter** that removes records will reduce the volume of data that subsequent transforms process.
2. An **enrich** transform adds fields that downstream transforms can reference.
3. A **fan-out** transform routes records to separate output topics — any subsequent transforms must be configured per output branch, so place fan-out at the end of your chain.

Changing the order of transforms can change both the **result** and the **performance** of your pipeline.

### Recommended Ordering

The following order works well for most pipelines that combine multiple transform types:

<Steps>
  <Step title="Filter first">
    Apply [Transform / Filter Records](/transform-filter-records) early to remove irrelevant records. This reduces the volume of data that all subsequent transforms must process.
  </Step>

  <Step title="Join or Enrich next">
    Use [Join](/transform-join), [Enrich](/transform-enrich), or [Enrich (Async)](/transform-enrich-async) to combine or augment records. Running these after filtering means fewer records to look up or join, which improves performance and reduces API call volume.
  </Step>

  <Step title="Transform / Map after enrichment">
    Apply [Transform / Filter Records](/transform-filter-records) again (if needed) to reshape, rename, or compute fields using the enriched data.
  </Step>

  <Step title="Fan Out last">
    Use [Fan Out](/transform-fanout) at the end of your chain so that all preceding logic (filtering, enrichment, mapping) is applied before records are routed to their final output topics.
  </Step>
</Steps>

<Info>
  Not every pipeline needs all of these stages. Use only the transforms your use case requires. The recommended order applies to the transforms you do use.
</Info>

### Practical Examples

#### Example 1: Filter Before Enrich (Async)

**Scenario:** You stream order events and want to call an external API to enrich orders with customer details, but only for orders above \$100.

| Order | Transform      | Effect                                                            |
| ----- | -------------- | ----------------------------------------------------------------- |
| 1     | Filter Records | Keep only orders where `amount > 100`. Removes 80% of records.    |
| 2     | Enrich (Async) | Call external API to fetch customer details for remaining orders. |

**Why this order matters:** Filtering first means you make API calls for only 20% of records instead of 100%. Reversing the order would waste API calls on records you discard.

#### Example 2: Enrich Then Map

**Scenario:** You want to enrich product records with category data from a lookup table, then compute a new field based on the enriched data.

| Order | Transform                  | Effect                                                                   |
| ----- | -------------------------- | ------------------------------------------------------------------------ |
| 1     | Enrich                     | Add `category_name` from a cached lookup topic.                          |
| 2     | Transform / Filter Records | Compute `display_label` by combining `product_name` and `category_name`. |

**Why this order matters:** The map transform references the `category_name` field, which only exists after enrichment. Reversing the order would mean the field is not yet available.

#### Example 3: Full Chain with Fan Out

**Scenario:** You stream user activity events and want to filter, enrich, and then route to different output topics based on activity type.

| Order | Transform                  | Effect                                                               |
| ----- | -------------------------- | -------------------------------------------------------------------- |
| 1     | Filter Records             | Remove bot traffic and test accounts.                                |
| 2     | Enrich                     | Add user profile data from a cached topic.                           |
| 3     | Transform / Filter Records | Normalize field names and compute session duration.                  |
| 4     | Fan Out                    | Route `purchase` events to one topic and `browse` events to another. |

**Why this order matters:** Fan Out at the end ensures all records are fully processed before being split. If Fan Out ran earlier, you would need to duplicate the enrichment and mapping logic across each output branch.

### Common Ordering Mistakes

<Warning>
  Avoid these patterns that can cause unexpected behavior or performance issues.
</Warning>

| Mistake                      | Problem                                                               | Fix                                       |
| ---------------------------- | --------------------------------------------------------------------- | ----------------------------------------- |
| Enrich (Async) before Filter | Wastes API calls on records that are later discarded                  | Move Filter before Enrich (Async)         |
| Map/Transform before Enrich  | References enriched fields that do not exist yet                      | Move Enrich before the Map/Transform step |
| Fan Out before Map           | Requires duplicating transform logic in each output branch            | Move Fan Out to the end of the chain      |
| Filter after Join            | Joins all records first, then discards some -- wastes join processing | Filter input topics before joining        |

***

## End-to-End Data Flow

To understand where each category of transforms fits, here is the full data path:

1. **Source** -- Change events are captured from your database.
2. **Pipeline transforms (Flink)** -- Your user-defined transforms (Filter, Enrich, Join, Fan Out) execute in the order you specify.
3. **Destination-side transforms (SMTs)** -- System-managed transforms (DropFields, RenameFields, ToJsonJ, etc.) apply in a fixed order.
4. **Destination** -- Transformed data lands in your data warehouse or lake.

<Tip>
  If you need a field dropped before it reaches the destination, you can use either a Flink pipeline transform (to remove it from the stream early) or a destination-side DropFields transform (to remove it at the last step). Choose based on whether downstream Flink transforms still need the field.
</Tip>

## See Also

* [Transform Types Overview](/transforms) - All available transform types
* [Streaming Transforms](/transforms-1) - Managing transforms in the UI
