diff --git a/manage-data/images/icon-check.svg b/manage-data/images/icon-check.svg
new file mode 100644
index 000000000..69192b92c
--- /dev/null
+++ b/manage-data/images/icon-check.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/manage-data/images/icon-cross.svg b/manage-data/images/icon-cross.svg
new file mode 100644
index 000000000..48ca292d5
--- /dev/null
+++ b/manage-data/images/icon-cross.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/manage-data/ingest/transform-enrich/common-mistakes.md b/manage-data/ingest/transform-enrich/common-mistakes.md
new file mode 100644
index 000000000..dc70a1ac9
--- /dev/null
+++ b/manage-data/ingest/transform-enrich/common-mistakes.md
@@ -0,0 +1,509 @@
+---
+mapped_pages:
+ - https://www.elastic.co/docs/manage-data/ingest/transform-enrich/common-mistakes.html
+applies_to:
+ stack: ga
+ serverless: ga
+---
+
+# Create readable and maintainable ingest pipelines
+
+There are many ways to achieve similar results when creating ingest pipelines, which can make maintenance and readability difficult. This guide outlines patterns you can follow to make the maintenance and readability of ingest pipelines easier without sacrificing functionality.
+
+:::{note}
+This guide does not provide guidance on optimizing for ingest pipeline performance.
+:::
+
+## If Statements
+
+`if` statements are frequently used in ingest pipeline processors to ensure that a processor only runs when specific conditions are met. By adding an `if` condition, you can control when a processor is applied, making your pipelines more flexible and robust.
+
+### Avoiding Excessive OR Conditions
+
+When using the [boolean OR operator](elasticsearch://reference/scripting-languages/painless/painless-operators-boolean.md#boolean-or-operator) (`||`), it's easy for `if` conditions to become overly complex and difficult to maintain, especially when chaining many OR checks. Instead, consider using array-based checks like `.contains()` to simplify your logic and improve readability.
+
+####  **Don't**: Run many ORs
+
+```painless
+"if": "ctx?.kubernetes?.container?.name == 'admin' || ctx?.kubernetes?.container?.name == 'def'
+|| ctx?.kubernetes?.container?.name == 'demo' || ctx?.kubernetes?.container?.name == 'acme'
+|| ctx?.kubernetes?.container?.name == 'wonderful'"
+```
+
+####  **Do**: Use contains to compare
+
+```painless
+["admin","def", ...].contains(ctx.kubernetes?.container?.name)
+```
+
+A key implication is that `["admin", "def", ...].contains(ctx.kubernetes?.container?.name)` checks for exact matches only. A container name of `admin` matches, but `demo-admin-demo` does not, unless you explicitly check for partial matches with something like `ctx.kubernetes.container.name.contains('admin') || ...`.
+
+Also, the null safe operator (`?.`) works as expected here—if the value is `null`, the `contains` method simply returns `false` without causing an error.
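+
+Here is a full reproducible sketch of the array-based check (the container names are made up). Only the first document gets `matched: true`, because `demo-admin-demo` is not an exact match:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "kubernetes": {
+          "container": {
+            "name": "admin"
+          }
+        }
+      }
+    },
+    {
+      "_source": {
+        "kubernetes": {
+          "container": {
+            "name": "demo-admin-demo"
+          }
+        }
+      }
+    }
+  ],
+  "pipeline": {
+    "processors": [
+      {
+        "set": {
+          "field": "matched",
+          "value": true,
+          "if": "['admin', 'def', 'demo'].contains(ctx.kubernetes?.container?.name)"
+        }
+      }
+    ]
+  }
+}
+```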
+
+### Null safe operator
+
+Anticipate potential problems with the data, and use the [null safe operator](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#null-safe-operator) (`?.`) to prevent data from being processed incorrectly.
+
+:::{tip}
+It is not necessary to use a null safe operator for first level objects
+(for example, use `ctx.openshift` instead of `ctx?.openshift`).
+`ctx` will only ever be `null` if the entire `_source` is empty.
+:::
+
+For example, if you only want data that has a valid string in a `ctx.openshift.origin.threadId` field:
+
+####  **Don't**: Leave the condition vulnerable to failures and use redundant checks
+
+```painless
+ctx.openshift.origin != null <1>
+&& ctx.openshift.origin.threadId != null <2>
+```
+
+1. It's unnecessary to check both `openshift.origin` and `openshift.origin.threadId`.
+2. This will fail if `openshift` is not properly set because it assumes that `ctx.openshift` and `ctx.openshift.origin` both exist.
+
+####  **Do**: Use the null safe operator
+
+```painless
+ctx.openshift?.origin?.threadId instanceof String <1>
+```
+
+1. Only if there's a `ctx.openshift` and a `ctx.openshift.origin` will it check for a `ctx.openshift.origin.threadId` and make sure it is a string.
+
+### Use null safe operators when checking type
+
+If you're using a null safe operator, it returns the value if it is not `null`, so there is no reason to check whether a value is not `null` before checking its type.
+
+For example, if you only want data when the value of the `ctx.openshift.eventPayload` field is a string:
+
+####  **Don't**: Use redundant checks
+
+```painless
+ctx?.openshift?.eventPayload != null && ctx.openshift.eventPayload instanceof String
+```
+
+####  **Do**: Use the null safe operator with the type check
+
+```painless
+ctx.openshift?.eventPayload instanceof String
+```
+
+### Use null safe operator with boolean OR operator
+
+When using the [boolean OR operator](elasticsearch://reference/scripting-languages/painless/painless-operators-boolean.md#boolean-or-operator) (`||`), you need to use the null safe operator for both conditions being checked.
+
+For example, if you want to include data when the value of the `ctx.event.type` field is either `null` or `'0'`:
+
+####  **Don't**: Leave the conditions vulnerable to failures
+
+```painless
+ctx.event.type == null || ctx.event.type == '0' <1>
+```
+
+1. This will fail if `ctx.event` is not properly set because it assumes that `ctx.event` exists. If it fails on the first condition it won't even try the second condition.
+
+####  **Do**: Use the null safe operator in both conditions
+
+```painless
+"if": "ctx.event?.type == null || ctx.event?.type == '0'"
+```
+
+1. Both conditions will be checked.
+
+### Avoiding Redundant Null Checks
+
+It is often unnecessary to use the `?` (null safe operator) multiple times when you have already traversed the object path.
+
+####  **Don't**: Use redundant null safe operators
+
+```painless
+"if": "ctx.arbor?.ddos?.subsystem == 'CLI' && ctx.arbor?.ddos?.command_line != null"
+```
+
+####  **Do**: Use the null safe operator only where needed
+
+Since the `if` condition is evaluated left to right, once `ctx.arbor?.ddos?.subsystem == 'CLI'` passes, you know `ctx.arbor.ddos` exists. You can safely omit the second `?`.
+
+```painless
+"if": "ctx.arbor?.ddos?.subsystem == 'CLI' && ctx.arbor.ddos.command_line != null"
+```
+
+This improves readability and avoids redundant checks.
+
+### Checking for emptiness
+
+When checking if a field is not empty, avoid redundant null safe operators and use clear, concise conditions.
+
+####  **Don't**: Use redundant null safe operators
+
+```painless
+"if": "ctx?.user?.geo?.region != null && ctx?.user?.geo?.region != ''"
+```
+
+####  **Do**: Use the null safe operator only where needed
+
+Once you've checked `ctx.user?.geo?.region != null`, you can safely access `ctx.user.geo.region` in the next condition.
+
+```painless
+"if": "ctx.user?.geo?.region != null && ctx.user.geo.region != ''"
+```
+
+####  **Do**: Use `.isEmpty()` for strings
+
+% TO DO: Find link to `isEmpty()` method
+To check if a string field is not empty, use the `isEmpty()` method in your condition. For example:
+
+```painless
+"if": "ctx.user?.geo?.region instanceof String && ctx.user.geo.region.isEmpty() == false"
+```
+
+This ensures the field exists, is a string, and is not empty.
+
+:::{tip}
+For such checks you can also omit the `instanceof String` check and use an [Elvis operator](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#elvis-operator), such as `if: ctx.user?.geo?.region?.isEmpty() ?: false`. This only works when `region` is a string. If it is a double, an object, or any other type that does not have an `isEmpty()` function, it fails with a `Java Function not found` error.
+:::
+
+Here is a full reproducible example:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "user": {
+ "geo": {
+ "region": "123"
+ }
+ }
+ }
+ },
+ {
+ "_source": {
+ "user": {
+ "geo": {
+ "region": ""
+ }
+ }
+ }
+ },
+ {
+ "_source": {
+ "user": {
+ "geo": {
+ "region": null
+ }
+ }
+ }
+ },
+ {
+ "_source": {
+ "user": {
+ "geo": null
+ }
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "set": {
+ "field": "demo",
+ "value": true,
+ "if": "if": "ctx.user?.geo?.region != null && ctx.user.geo.region != ''"
+ }
+ }
+ ]
+ }
+}
+```
+
+## Converting MB/GB values to bytes
+
+When working with data sizes, it's best practice to store all values as bytes (using a `long` type) in Elasticsearch. This ensures consistency and allows you to leverage advanced formatting in Kibana Data Views to display human-readable sizes.
+
+###  **Don't**: Use multiple `gsub` processors for unit conversion
+
+Avoid chaining several `gsub` processors to strip units and manually convert values. This approach is error-prone, hard to maintain, and can easily miss edge cases.
+
+```json
+{
+ "gsub": {
+ "field": "document.size",
+ "pattern": "M",
+ "replacement": "",
+ "ignore_missing": true,
+ "if": "ctx?.document?.size != null && ctx.document.size.endsWith(\"M\")"
+ }
+},
+{
+ "gsub": {
+ "field": "document.size",
+ "pattern": "(\\d+)\\.(\\d+)G",
+ "replacement": "$1$200",
+ "ignore_missing": true,
+ "if": "ctx?.uws?.size != null && ctx.document.size.endsWith(\"G\")"
+ }
+},
+{
+ "gsub": {
+ "field": "document.size",
+ "pattern": "G",
+ "replacement": "000",
+ "ignore_missing": true,
+ "if": "ctx?.uws?.size != null && ctx.document.size.endsWith(\"G\")"
+ }
+}
+```
+
+###  **Do**: Use the `bytes` processor for automatic conversion
+
+The [`bytes` processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/bytes-processor.html) automatically parses and converts strings like `"100M"` or `"2.5GB"` into their byte values. This is more reliable, easier to maintain, and supports a wide range of units.
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "document": {
+ "size": "100M"
+ }
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "bytes": {
+ "field": "document.size"
+ }
+ }
+ ]
+ }
+}
+```
+
+:::{tip}
+After storing values as bytes, you can use Kibana's field formatting to display them in a human-friendly format (KB, MB, GB, etc.) without changing the underlying data.
+:::
+
+## Rename processor
+
+The rename processor renames a field. It has two flags:
+
+- `ignore_missing`
+- `ignore_failure`
+
+`ignore_missing` is useful when you are not sure that the field you want to rename exists. `ignore_failure` catches any other failure the processor encounters. The rename processor can only rename to a field that does not exist yet: if the field `abc` already exists and you rename `def` to `abc`, the operation fails, and `ignore_failure` is what keeps the pipeline running in that case.
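+
+A minimal sketch showing both flags (the field names are made up). In the first document the rename fails because `abc` already exists, but `ignore_failure` keeps the pipeline going; in the second document `def` is missing and `ignore_missing` skips the processor:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "abc": "already here",
+        "def": "rename me"
+      }
+    },
+    {
+      "_source": {
+        "abc": "already here"
+      }
+    }
+  ],
+  "pipeline": {
+    "processors": [
+      {
+        "rename": {
+          "field": "def",
+          "target_field": "abc",
+          "ignore_missing": true,
+          "ignore_failure": true
+        }
+      }
+    ]
+  }
+}
+```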
+
+## Script processor
+
+Sometimes, you may need to use a script processor in your ingest pipeline when no built-in processor can achieve your goal. However, it's important to write scripts that are clear, concise, and maintainable.
+
+### Calculating `event.duration` in a complex manner
+
+####  **Don't**: Use verbose and error-prone scripting patterns
+
+The following example demonstrates several common mistakes:
+
+- Accessing fields using square brackets (e.g., `ctx['temp']['duration']`) instead of dot notation.
+- Using multiple `!= null` checks instead of the null safe operator (`?.`).
+- Parsing substrings manually instead of leveraging date/time parsing utilities.
+- Accessing `event.duration` directly without ensuring `event` exists.
+- Calculating `event.duration` in milliseconds, when it should be in nanoseconds.
+
+```json
+{
+ "script": {
+ "source": """
+ String timeString = ctx['temp']['duration'];
+ ctx['event']['duration'] = Integer.parseInt(timeString.substring(0,2))*360000 + Integer.parseInt(timeString.substring(3,5))*60000 + Integer.parseInt(timeString.substring(6,8))*1000 + Integer.parseInt(timeString.substring(9,12));
+ """,
+ "if": "ctx.temp != null && ctx.temp.duration != null"
+ }
+}
+```
+
+This approach is hard to read, error-prone, and doesn't take advantage of the powerful date/time features available in Painless.
+
+####  **Do**: Use null safe operators and built-in date/time utilities
+
+A better approach is to:
+
+- Use the null safe operator to check for field existence.
+- Ensure the `event` object exists before assigning to it.
+- Use `DateTimeFormatter` and `LocalTime` to parse the duration string.
+- Store the duration in nanoseconds, as expected by ECS.
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "temp": {
+ "duration": "00:00:06.448"
+ }
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "script": {
+ "source": """
+ if (ctx.event == null) {
+ ctx.event = [:];
+ }
+ DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");
+ LocalTime time = LocalTime.parse(ctx.temp.duration, formatter);
+ ctx.event.duration = time.toNanoOfDay();
+ """,
+ "if": "ctx.temp?.duration != null"
+ }
+ }
+ ]
+ }
+}
+```
+
+This version is more robust, easier to maintain, and leverages the full capabilities of Painless and Java's date/time APIs. Always prefer built-in utilities and concise, readable code when writing script processors.
+
+## Stitching together IP addresses in a script processor
+
+When reconstructing or normalizing IP addresses in ingest pipelines, avoid unnecessary complexity and redundant operations.
+
+###  **Don't**: Use verbose and error-prone scripting patterns
+
+- No check if `destination` is available as an object.
+- Uses square bracket notation for field access.
+- Unnecessary casting to `Integer` when parsing string segments.
+- Allocates an extra variable for the IP string instead of setting the field directly.
+
+```json
+{
+ "script": {
+ "source": """
+ String[] ipSplit = ctx['destination']['ip'].splitOnToken('.');
+ String ip = Integer.parseInt(ipSplit[0]) + '.' + Integer.parseInt(ipSplit[1]) + '.' + Integer.parseInt(ipSplit[2]) + '.' + Integer.parseInt(ipSplit[3]);
+ ctx['destination']['ip'] = ip;
+ """,
+ "if": "(ctx['destination'] != null) && (ctx['destination']['ip'] != null)"
+ }
+}
+```
+
+###  **Do**: Use concise, readable, and safe scripting
+
+- Use dot notation for field access.
+- Use the null safe operator (`?.`) to check for field existence.
+- Avoid unnecessary casting and extra variables.
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "destination": {
+ "ip": "192.168.0.1.3.4.5.6.4"
+ }
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "script": {
+ "source": """
+ def temp = ctx.destination.ip.splitOnToken('.');
+ ctx.destination.ip = temp[0] + "." + temp[1] + "." + temp[2] + "." + temp[3];
+ """,
+ "if": "ctx.destination?.ip != null"
+ }
+ }
+ ]
+ }
+}
+```
+
+This approach is more maintainable, avoids unnecessary operations, and ensures your pipeline scripts are robust and easy to understand.
+
+##  **Don't**: Remove `@timestamp` before using the date processor
+
+It's a common mistake to explicitly remove the `@timestamp` field before running a date processor, as shown below:
+
+```json
+{
+ "set": {
+ "field": "openshift.timestamp",
+ "value": "{{openshift.date}} {{openshift.time}}",
+ "if": "ctx?.openshift?.date != null && ctx?.openshift?.time != null && ctx?.openshift?.timestamp == null"
+ }
+},
+{
+ "remove": {
+ "field": "@timestamp",
+ "ignore_missing": true,
+ "if": "ctx?.openshift?.timestamp != null || ctx?.openshift?.timestamp1 != null"
+ }
+},
+{
+ "date": {
+ "field": "openshift.timestamp",
+ "formats": [
+ "yyyy-MM-dd HH:mm:ss",
+ "ISO8601"
+ ],
+ "timezone": "Europe/Vienna",
+ "if": "ctx?.openshift?.timestamp != null"
+ }
+}
+```
+
+This removal step is unnecessary and can even be counterproductive. The `date` processor will automatically overwrite the value in `@timestamp` with the parsed date from your source field, unless you explicitly set a different `target_field`. There's no need to remove `@timestamp` beforehand—the processor will handle updating it for you.
+
+Removing `@timestamp` can also introduce subtle bugs, especially if the date processor is skipped or fails, leaving your document without a timestamp.
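+
+A simpler version of the same logic, without the `remove` processor, might look like the following sketch (based on the fields above):
+
+```json
+{
+  "set": {
+    "field": "openshift.timestamp",
+    "value": "{{openshift.date}} {{openshift.time}}",
+    "if": "ctx.openshift?.date != null && ctx.openshift?.time != null && ctx.openshift?.timestamp == null"
+  }
+},
+{
+  "date": {
+    "field": "openshift.timestamp",
+    "formats": [
+      "yyyy-MM-dd HH:mm:ss",
+      "ISO8601"
+    ],
+    "timezone": "Europe/Vienna",
+    "if": "ctx.openshift?.timestamp != null"
+  }
+}
+```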
+
+## Mustache tips and tricks
+
+Mustache is a simple templating language used in Elasticsearch ingest pipelines to dynamically insert field values into strings. You can use double curly braces (`{{ }}`) to reference fields from your document, enabling flexible and dynamic value assignment in processors like `set`, `rename`, and others.
+
+For example, `{{host.hostname}}` will be replaced with the value of the `host.hostname` field at runtime. Mustache supports accessing nested fields, arrays, and even provides some basic logic for conditional rendering.
+
+### Accessing values in an array
+
+When you need to reference a specific element in an array using Mustache templates, you can use dot notation with the zero-based index. For example, to access the first value in the `tags` array, use `.0` after the array field name.
+
+####  **Do**: Use array index notation in Mustache templates
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "host": {
+ "hostname": "abc"
+ },
+ "tags": [
+ "cool-host"
+ ]
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "set": {
+ "field": "host.alias",
+ "value": "{{tags.0}}"
+ }
+ }
+ ]
+ }
+}
+```
+
+In this example, `{{tags.0}}` retrieves the first element of the `tags` array (`"cool-host"`) and assigns it to the `host.alias` field. This approach is necessary when you want to extract a specific value from an array for use elsewhere in your document. Using the correct index ensures you get the intended value, and this pattern works for any array field in your source data.
diff --git a/manage-data/ingest/transform-enrich/error-handling.md b/manage-data/ingest/transform-enrich/error-handling.md
new file mode 100644
index 000000000..448325f41
--- /dev/null
+++ b/manage-data/ingest/transform-enrich/error-handling.md
@@ -0,0 +1,153 @@
+---
+mapped_pages:
+ - https://www.elastic.co/docs/manage-data/ingest/transform-enrich/error-handling.html
+applies_to:
+ stack: ga
+ serverless: ga
+---
+
+# Error handling
+
+Ingest pipelines in Elasticsearch are powerful tools for transforming and enriching data before indexing. However, errors can occur during processing. This guide outlines strategies for handling such errors effectively.
+
+**Important**: Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle errors that occur while processing the document (that is, while transforming the JSON object), but not errors triggered during indexing, such as mapping conflicts. That is what the Elasticsearch failure store is for.
+
+Errors in ingest pipelines typically fall into the following categories:
+
+- Parsing Errors: Occur when a processor fails to parse a field, such as a date or number.
+- Missing Fields: Happen when a required field is absent in the document.
+
+:::{tip}
+Create an `error-handling-pipeline` that sets `event.kind` to `pipeline_error` and stores the error message, along with the tag from the failed processor, in the `error.message` field. Including a tag is especially helpful when using multiple `grok`, `dissect`, or `script` processors, as it helps identify which one caused the failure.
+:::
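+
+A minimal sketch of such a pipeline (the pipeline name and field choices are a convention, not a requirement):
+
+```json
+PUT _ingest/pipeline/error-handling-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "event.kind",
+        "value": "pipeline_error"
+      }
+    },
+    {
+      "append": {
+        "field": "error.message",
+        "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
+      }
+    }
+  ]
+}
+```
+
+Other pipelines can then reuse it by calling the `pipeline` processor from their `on_failure` block.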
+
+The `on_failure` parameter can be defined either for individual processors or at the pipeline level to catch exceptions that may occur during document processing. The `ignore_failure` option allows a specific processor to silently skip errors without affecting the rest of the pipeline.
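+
+For example, a processor that should never stop the pipeline can simply be declared with `ignore_failure` (the pattern below is illustrative):
+
+```json
+{
+  "dissect": {
+    "field": "message",
+    "pattern": "%{} %{user.name} %{}",
+    "ignore_failure": true,
+    "tag": "dissect for user.name"
+  }
+}
+```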
+
+## Global vs. Processor-Specific
+
+The following example demonstrates how to use the `on_failure` handler at the pipeline level rather than within individual processors. While this approach ensures the pipeline exits gracefully on failure, it also means that processing stops at the point of error.
+
+In this example, a typo was made in the configuration of the `dissect` processor intended to extract `user.name` from the message. A comma (`,`) was used instead of the correct colon (`:`).
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "@timestamp": "2025-04-03T10:00:00.000Z",
+ "message": "user: philipp has logged in"
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "dissect": {
+ "field": "message",
+ "pattern": "%{}, %{user.name} %{}",
+ "tag": "dissect for user.name"
+ }
+ },
+ {
+ "append": {
+ "field": "event.category",
+ "value": "authentication"
+ }
+ }
+ ],
+ "on_failure": [
+ {
+ "set": {
+ "field": "event.kind",
+ "value": "pipeline_error"
+ }
+ },
+ {
+ "append": {
+ "field": "error.message",
+ "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
+ }
+ }
+ ]
+ }
+}
+```
+
+The second processor, which sets `event.category` to `authentication`, is no longer executed because the first `dissect` processor fails and triggers the global `on_failure` handler. The resulting document shows which processor caused the error, the pattern it attempted to apply, and the input it received.
+
+```json
+"@timestamp": "2025-04-03T10:00:00.000Z",
+"message": "user: philipp has logged in",
+"event": {
+ "kind": "pipeline_error"
+},
+"error": {
+ "message": "Processor dissect with tag dissect for user.name in pipeline _simulate_pipeline failed with message: Unable to find match for dissect pattern: %{}, %{user.name} %{} against source: user: philipp has logged in"
+}
+```
+
+We can restructure the pipeline by moving the `on_failure` handling directly into the processor itself. This allows the pipeline to continue execution. In this case, the `event.category` processor still runs. You can also retain the global `on_failure` to handle errors from other processors, while adding processor-specific error handling where needed.
+
+(While executing two `set` processors within the `dissect` error handler may not always be ideal, it serves as a demonstration.)
+
+For the `dissect` processor, consider setting a temporary field like `_tmp.error: dissect_failure` instead. You can then use `if` conditions in later processors to run them only when parsing failed, allowing for more controlled and flexible error handling; a sketch of this pattern follows the example below.
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "@timestamp": "2025-04-03T10:00:00.000Z",
+ "message": "user: philipp has logged in"
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "dissect": {
+ "field": "message",
+ "pattern": "%{}, %{user.name} %{}",
+ "on_failure": [
+ {
+ "set": {
+ "field": "event.kind",
+ "value": "pipeline_error"
+ }
+ },
+ {
+ "append": {
+ "field": "error.message",
+ "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
+ }
+ }
+ ],
+ "tag": "dissect for user.name"
+ }
+ },
+ {
+ "append": {
+ "field": "event.category",
+ "value": "authentication"
+ }
+ }
+ ],
+ "on_failure": [
+ {
+ "set": {
+ "field": "event.kind",
+ "value": "pipeline_error"
+ }
+ },
+ {
+ "set": {
+ "field": "error.message",
+ "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
+ }
+ }
+ ]
+ }
+}
+```
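+
+As mentioned above, instead of writing directly to `error.message`, you can record the failure in a temporary field and branch on it in later processors. A minimal sketch of that pattern (the `_tmp.error` field name and the follow-up `set` are just illustrative):
+
+```json
+{
+  "dissect": {
+    "field": "message",
+    "pattern": "%{}, %{user.name} %{}",
+    "tag": "dissect for user.name",
+    "on_failure": [
+      {
+        "set": {
+          "field": "_tmp.error",
+          "value": "dissect_failure"
+        }
+      }
+    ]
+  }
+},
+{
+  "set": {
+    "field": "event.outcome",
+    "value": "unknown",
+    "if": "ctx._tmp?.error == 'dissect_failure'"
+  }
+},
+{
+  "remove": {
+    "field": "_tmp",
+    "ignore_missing": true
+  }
+}
+```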
diff --git a/manage-data/ingest/transform-enrich/general-tips.md b/manage-data/ingest/transform-enrich/general-tips.md
new file mode 100644
index 000000000..07f66364d
--- /dev/null
+++ b/manage-data/ingest/transform-enrich/general-tips.md
@@ -0,0 +1,463 @@
+---
+mapped_pages:
+ - https://www.elastic.co/docs/manage-data/ingest/transform-enrich/general-tips-and-tricks.html
+applies_to:
+ stack: ga
+ serverless: ga
+---
+
+# Tips and Tricks
+
+There are various ways to handle data in ingest pipelines, and while they all produce similar results, some methods might be more suitable depending on the specific case. This section provides guidance to ensure that your ingest pipelines are consistent, readable, and maintainable. While we won't focus heavily on performance optimizations, the goal is to create pipelines that are easy to understand and manage.
+
+## Accessing Fields in `if` Statements
+
+In an ingest pipeline, when working with `if` statements inside processors, you can access fields in two ways:
+
+- Dot notation
+- Square bracket notation
+
+For example:
+
+- `ctx.event.action`
+
+is equivalent to:
+
+- `ctx['event']['action']`
+
+Both notations can be used to reference fields, so choose the one that makes your pipeline easier to read and maintain.
+
+### Downsides of brackets
+
+- No support for the null safe operator (`?.`)
+
+### When to use brackets
+
+Use brackets when the field name contains special characters such as `@`, or a literal `.` that is part of the name itself. As an example:
+
+- field_name: `demo.cool.stuff`
+
+Using `ctx.demo.cool.stuff` would try to access the field `stuff` in the object `cool` in the object `demo`.
+
+Using `ctx['demo.cool.stuff']` accesses the field directly.
+
+You can also mix and match both notations when needed:
+
+- field_name: `my.nested.object.has@!%&chars`
+
+Proper way: `ctx.my.nested.object['has@!%&chars']`
+
+You can even partially use the `?` operator:
+
+- `ctx.my?.nested?.object['has@!%&chars']`
+
+But this still errors if `object` is `null`. To be 100% on the safe side, you need to write the following statement:
+
+- `ctx.my?.nested?.object != null && ctx.my.nested.object['has@!%&chars'] == ...`
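+
+A reproducible sketch of the mixed notation (the field names are contrived):
+
+```json
+POST _ingest/pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "demo.cool.stuff": "value with a dotted name",
+        "my": {
+          "nested": {
+            "object": {
+              "has@!%&chars": "value with special characters"
+            }
+          }
+        }
+      }
+    }
+  ],
+  "pipeline": {
+    "processors": [
+      {
+        "set": {
+          "field": "flags.dotted",
+          "value": true,
+          "if": "ctx['demo.cool.stuff'] != null"
+        }
+      },
+      {
+        "set": {
+          "field": "flags.special",
+          "value": true,
+          "if": "ctx.my?.nested?.object != null && ctx.my.nested.object['has@!%&chars'] != null"
+        }
+      }
+    ]
+  }
+}
+```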
+
+## Accessing fields in a script
+
+Within a script you have the same two ways to access fields as above, plus the `$('field', fallback)` shorthand, which only works in Painless scripts in an ingest pipeline. Take the following input:
+
+```json
+{
+ "_source": {
+ "user_name": "philipp"
+ }
+}
+```
+
+When you want to set the `user.name` field with a script:
+
+- `ctx.user.name = ctx.user_name`
+
+This works as long as `user_name` is populated; if it is `null`, `user.name` is set to `null` as well. Additionally, when the `user` object does not exist, the script errors because you need to create the `user` object before you can add the key `name` to it.
+
+One alternative, if you only want to set the field when the source value is not null:
+
+```painless
+if (ctx.user_name != null) {
+ ctx.user.name = ctx.user_name
+}
+```
+
+This works fine, as you now check for null.
+
+However, there is an alternative that is easier to write and maintain:
+
+- `ctx.user.name = $('user_name', null);`
+
+`$('field', fallback)` lets you reference a field without walking the `ctx` object, and you can even supply `$('this.very.nested.field.is.super.far.away', null)` when you need to. The fallback is used when the field is `null`. This comes in very handy for certain data manipulations. Let's say you want to lowercase the value of `user_name`; you can simply write:
+
+- `ctx.user.name = $('user_name','').toLowerCase();`
+
+Notice that the fallback changed from `null` to an empty string, because a string has the `toLowerCase()` function. This works with all types: the fallback can be a map, a list, an array, whatever you need. For example, you can fall back to an empty map:
+
+- `if ($('object', {}).containsKey('abc')){}`
+
+One common use is dealing with numbers and casting. Say a field reports CPU usage as a percentage, but Kibana's percent formatting expects `0`-`1` for `0%`-`100%`, not `0`-`100` (a value of `100` would render as `10,000%`):
+
+- field: `cpu_usage = 100.00`
+- `ctx.cpu.usage = $('cpu_usage',0.0)/100`
+
+This always sets the `cpu.usage` field with a division that cannot fail on a missing value. In a simple script you can instead guard the whole processor with an `if` condition, as shown below, but most scripts are rather complex, so this is not often applicable.
+
+```json
+{
+ "script": {
+ "source": "ctx.abc = ctx.def"
+ "if": "ctx.def != null"
+ }
+}
+```
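+
+Here is a reproducible sketch of the `$('field', fallback)` pattern from above (assuming a release where the shorthand is available):
+
+```json
+POST _ingest/pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "user_name": "PHILIPP",
+        "cpu_usage": 100.0
+      }
+    },
+    {
+      "_source": {
+        "cpu_usage": 42.5
+      }
+    }
+  ],
+  "pipeline": {
+    "processors": [
+      {
+        "script": {
+          "source": """
+            ctx.user = [:];
+            ctx.user.name = $('user_name', '').toLowerCase();
+            ctx.cpu = [:];
+            ctx.cpu.usage = $('cpu_usage', 0.0) / 100;
+          """
+        }
+      }
+    ]
+  }
+}
+```
+
+The first document ends up with `user.name: philipp` and `cpu.usage: 1.0`; the second still gets a valid `cpu.usage` and an empty `user.name`, because the fallbacks kick in instead of a null pointer error.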
+
+## Check if a value exists and is not null
+
+In the simplest case, the `ignore_empty_value` parameter, available in most processors, handles fields without values, and the `ignore_failure` parameter lets a processor fail without impacting the rest of the pipeline. But sometimes you will need the [null safe operator `?.`](elasticsearch://reference/scripting-languages/painless/painless-operators-reference.md#null-safe-operator) to check that a field exists and is not `null`.
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "host": {
+ "hostname": "test"
+ },
+ "ip": "127.0.0.1"
+ }
+ },
+ {
+ "_source": {
+ "ip": "127.0.0.1"
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "set": {
+ "field": "a",
+ "value": "b",
+ "if": "ctx.host?.hostname == 'test'"
+ }
+ }
+ ]
+ }
+}
+```
+
+This pipeline works for both documents because `host?.` checks whether `host` exists and, if not, returns `null`. Removing the `?` from the `if` condition makes the second document fail with the error message: `cannot access method/field [hostname] from a null def reference`.
+
+The null safe operator `?.` is actually doing the following behind the scenes.
+
+Imagine you write this:
+
+- `ctx.windows?.event?.data?.user?.name == "philipp"`
+
+Then the `?.` operator expands this simple `if` statement to:
+
+```painless
+ctx.windows != null &&
+ctx.windows.event != null &&
+ctx.windows.event.data != null &&
+ctx.windows.event.data.user != null &&
+ctx.windows.event.data.user.name == "philipp"
+```
+
+You can use the null safe operator with functions too:
+
+- `ctx.message?.startsWith('shoe')`
+
+An [Elvis operator](https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-operators-reference.html#elvis-operator) might be useful in your script to handle these possibly null values:
+
+- `ctx.message?.startsWith('shoe') ?: false`
+
+The safest and most robust option is to write:
+
+- `ctx.message instanceof String && ctx.message.startsWith('shoe')`
+- `ctx.event?.category instanceof String && ctx.event.category.startsWith('shoe')`
+
+The reason is that if `event.category` is a number, an object, or anything other than a `String`, it does not have a `startsWith` function, and the script errors because `startsWith` is not available on that type.
+
+## Check if a key is in a document
+
+The `containsKey` method can be used to check whether a map contains a specific key.
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "ip": "127.0.0.1"
+ }
+ },
+ {
+ "_source": {
+ "test": "asd"
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "set": {
+ "field": "a",
+ "value": "b",
+ "if": "ctx.containsKey('test')"
+ }
+ }
+ ]
+ }
+}
+```
+
+## Remove empty fields or fields that match a regular expression
+
+Alex and Honza created a [blog post](https://alexmarquardt.com/2020/11/06/using-elasticsearch-painless-scripting-to-iterate-through-fields/) presenting Painless scripts that remove empty fields or fields that match a regular expression. This pattern is used in a lot of places, most of the time in the custom pipeline and in the final pipeline as well.
+
+```json
+POST _ingest/pipeline/remove_unwanted_keys/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "key1": "first value",
+ "key2": "some other value",
+ "key3": "",
+ "sudoc": {
+ "a": "abc",
+ "b": ""
+ }
+ }
+ },
+ {
+ "_source": {
+ "key1": "",
+ "key2": "some other value",
+ "list_of_docs": [
+ {
+ "foo": "abc",
+ "bar": ""
+ },
+ {
+ "baz": "",
+ "subdoc_in_list": {"child1": "xxx", "child2": ""}
+ }
+ ]
+ }
+ }
+ ]
+}
+```
+
+```json
+PUT _ingest/pipeline/remove_unwanted_keys
+{
+ "processors": [
+ {
+ "script": {
+ "lang": "painless",
+ "source": """
+ void iterateAllFields(def x) {
+ if (x instanceof List) {
+ for (def v: x) {
+ iterateAllFields(v);
+ }
+ }
+ if (!(x instanceof Map)) {
+ return;
+ }
+ x.entrySet().removeIf(e -> e.getKey() =~ /unwanted_key_.*/);
+// You can also add more lines here, for example to remove fields whose value is empty or null.
+ x.entrySet().removeIf(e -> e.getValue() == "");
+ x.entrySet().removeIf(e -> e.getValue() == null);
+ for (def v: x.values()) {
+ iterateAllFields(v);
+ }
+ }
+ iterateAllFields(ctx);
+ """
+ }
+ }
+ ]
+}
+```
+
+## Type check of fields in ingest pipelines
+
+If you need to check the type of a field, you can do so with the Painless `instanceof` operator:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "winlog": {
+ "event_data": {
+ "param1": "hello"
+ }
+ }
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "rename": {
+ "field": "winlog.event_data.param1",
+ "target_field": "SysEvent1",
+ "if": "ctx.winlog?.event_data?.param1 instanceof String"
+ }
+ }
+ ]
+ }
+}
+```
+
+The `instanceof` check also works together with the `?.` operator.
+
+## Calculate time in another timezone
+
+When you cannot use the `date` processor and its `timezone` parameter, you can work with `ZonedDateTime` in Painless:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "@timestamp": "2021-08-13T09:06:00.000Z"
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "script": {
+ "source": """
+ ZonedDateTime zdt = ZonedDateTime.parse(ctx['@timestamp']);
+ ZonedDateTime zdt_local = zdt.withZoneSameInstant(ZoneId.of('Europe/Berlin'));
+ DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd.MM.yyyy - HH:mm:ss Z");
+ ctx.localtime = zdt_local.format(formatter);
+ """
+ }
+ }
+ ]
+ }
+}
+```
+
+## Work with JSON as value of fields
+
+It is possible to work with a JSON string as the value of a field, for example to set the `original` field to the JSON representation of `_source`. This leverages a Mustache function:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+ "docs": [
+ {
+ "_source": {
+ "foo": "bar",
+ "key": 123
+ }
+ }
+ ],
+ "pipeline": {
+ "processors": [
+ {
+ "set": {
+ "field": "original",
+ "value": "{{#toJson}}_source{{/toJson}}"
+ }
+ }
+ ]
+ }
+}
+```
+
+## Script Processor
+
+### Setting the value of a field
+
+Sometimes you need to write to a field that does not exist yet. As long as the object above it exists, this works immediately.
+
+`ctx.abc = "cool"` works without any issue because we are adding a root field called `abc`.
+
+Something like `ctx.abc.def = "cool"` does not work unless the `abc` object already exists or you create it beforehand. What we usually want to create is a map, which can be done in a couple of ways:
+
+```painless
+ctx.abc = new HashMap();
+ctx.abc = [:];
+```
+
+Both options are valid and do the same thing. However, there is a big caveat: if `abc` already exists, it is overwritten with an empty map. You can check whether `abc` already exists like this:
+
+```painless
+if(ctx.abc == null) {
+ ctx.abc = [:];
+}
+```
+
+With a simple `ctx.abc == null` check we know that `abc` does not exist and we can create it. Alternatively, you can use the `putIfAbsent` shorthand, which is super helpful when you need to go two, three, or four levels deep. Either version works, with `new HashMap()` or with `[:]`.
+
+```painless
+ctx.putIfAbsent("abc", new HashMap());
+ctx.putIfAbsent("abc", [:]);
+```
+
+Now assuming you want to create this structure:
+
+```json
+{
+ "user": {
+ "geo": {
+ "city": "Amsterdam"
+ }
+ }
+}
+```
+
+The `putIfAbsent` will help a ton here:
+
+```painless
+ctx.putIfAbsent("user", [:]);
+ctx.user.putIfAbsent("geo", [:]);
+ctx.user.geo = "Amsterdam"
+```
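+
+Here is the same pattern as a reproducible sketch. In the second document the existing `user.name` is preserved because `putIfAbsent` only creates the map when it is missing:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "message": "no user object yet"
+      }
+    },
+    {
+      "_source": {
+        "user": {
+          "name": "philipp"
+        }
+      }
+    }
+  ],
+  "pipeline": {
+    "processors": [
+      {
+        "script": {
+          "source": """
+            ctx.putIfAbsent("user", [:]);
+            ctx.user.putIfAbsent("geo", [:]);
+            ctx.user.geo.city = "Amsterdam";
+          """
+        }
+      }
+    ]
+  }
+}
+```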
+
+## Remove fields based on their values
+
+When mapping parameters such as `ignore_malformed` or `null_value` are not sufficient, you can remove fields based on their values with a script like this:
+
+```yaml
+- script:
+ lang: painless
+ params:
+ values:
+ - ""
+ - "-"
+ - "N/A"
+ source: >-
+ ctx?.sophos?.xg.entrySet().removeIf(entry -> params.values.contains(entry.getValue()));
+```
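+
+The same idea as a processor in an ingest pipeline definition (a sketch; adjust the field path and the list of values to your data). The `if` condition also avoids the pitfall in the snippet above, where `.entrySet()` is still called even when `xg` resolves to `null`:
+
+```json
+{
+  "script": {
+    "lang": "painless",
+    "params": {
+      "values": ["", "-", "N/A"]
+    },
+    "source": "ctx.sophos.xg.entrySet().removeIf(entry -> params.values.contains(entry.getValue()));",
+    "if": "ctx.sophos?.xg != null"
+  }
+}
+```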
+
+## Grok vs. dissect
+
+There can be a very long discussion on whether to choose grok or dissect. What to choose depends on many factors and on existing knowledge. Dissect patterns are easier to understand and follow, but are limited in what they can do: the log lines should look pretty much the same every time. Grok, on the other hand, can deal with a lot more, such as optional fields in various positions.
+
+What follows is simply a pattern I like to use, which might not be the best for performance, but is definitely easy to read and maintain. A log source often has many diverse messages, and you might only need to extract certain information that is always in the same position, for example from this message:
+
+```text
+2018-08-14T14:30:02.203151+02:00 linux-sqrz systemd[4179]: Stopped target Basic System.
+```
+
+With a dissect we can simply use `%{_tmp.date} %{host.hostname} %{process.name}[%{process.pid}]: %{message}` and call it a day. Now we have extracted the most important information, like the timestamp. Whether you extract to `_tmp.date` first or write directly to `@timestamp` is a discussion for another chapter.
+
+With that extracted, we are left with the `message` field to gather further information from, such as user logins.
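+
+A reproducible sketch of that dissect pattern:
+
+```json
+POST _ingest/pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "message": "2018-08-14T14:30:02.203151+02:00 linux-sqrz systemd[4179]: Stopped target Basic System."
+      }
+    }
+  ],
+  "pipeline": {
+    "processors": [
+      {
+        "dissect": {
+          "field": "message",
+          "pattern": "%{_tmp.date} %{host.hostname} %{process.name}[%{process.pid}]: %{message}"
+        }
+      }
+    ]
+  }
+}
+```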
diff --git a/manage-data/toc.yml b/manage-data/toc.yml
index 9275294ab..e81835203 100644
--- a/manage-data/toc.yml
+++ b/manage-data/toc.yml
@@ -98,6 +98,9 @@ toc:
- file: ingest/transform-enrich/ingest-pipelines.md
children:
- file: ingest/transform-enrich/example-parse-logs.md
+ - file: ingest/transform-enrich/common-mistakes.md
+ - file: ingest/transform-enrich/error-handling.md
+ - file: ingest/transform-enrich/general-tips.md
- file: ingest/transform-enrich/logstash-pipelines.md
- file: ingest/transform-enrich/data-enrichment.md
children: