Skip to content

[exporter/elasticsearch] Enable dynamic routing (previously {logs,metrics,traces}_dynamic_index) by default #38361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
carsonip opened this issue Mar 4, 2025 · 12 comments · Fixed by #38500
Labels

Comments

@carsonip
Copy link
Contributor

carsonip commented Mar 4, 2025

Component(s)

exporter/elasticsearch

Is your feature request related to a problem? Please describe.

config metrics_dynamic_index::enabled defaults to true while logs_dynamic_index and traces_dynamic_index are not.
Dynamic document routing should be enabled by default for all of them for consistency and better user experience out of the box.

Describe the solution you'd like

Overhaul of dynamic document routing such that the behavior similar to {logs,metrics,traces}_dynamic_index is the default

Describe alternatives you've considered

Change the default to true

Additional context

No response

@carsonip carsonip added enhancement New feature or request needs triage New item requiring triage labels Mar 4, 2025
@carsonip
Copy link
Contributor Author

carsonip commented Mar 4, 2025

/label -needs-triage

@github-actions github-actions bot added exporter/elasticsearch and removed needs triage New item requiring triage labels Mar 4, 2025
Copy link
Contributor

github-actions bot commented Mar 4, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@carsonip
Copy link
Contributor Author

carsonip commented Mar 5, 2025

@andrzej-stencel raised a concern that flipping default dynamic index to true causes logs_index and traces_index to be ignored. logs_index and traces_index are only respected when dynamic index is enabled AND (elasticsearch.index.prefix OR elasticsearch.index.suffix) are present.

TBH the data routing conditions are unnecessarily complicated now and I'm not sure if it even addresses valid use cases. We may need to make a bigger change as to how dynamic index works, and possibly splitting some configs.

In any case, we need to limit the impact of this change to users who rely on logs_index and traces_index.

@carsonip
Copy link
Contributor Author

carsonip commented Mar 7, 2025

Motivation 1

*_dynamic_index::enabled should default to true because

  1. to align with otel as new default mapping mode: [exporter/elasticsearch] Change default mapping mode to otel #37241 , as otel mapping mode works best with sending to *-*.otel-* such that otel-data plugin from 8.16 can be used and passthrough fields will work; and
  2. to make logs_dynamic_index and traces_dynamic_index (both have enabled default to false) to be consistent with metrics_dynamic_index (enabled default to true)

Impact / side effect of switching defaults directly

  1. logs_index and traces_index will be ignored in most cases, as *_dynamic_index follows an necessarily complex decision tree to determine index / data stream name (aka DS routing). i.e. users who rely on logs_index and traces_index will suddenly stop seeing their data at logs_index and traces_index if logs_dynamic_index::enabled and traces_dynamic_index::enabled becomes true.

How to mitigate the impact

  • In config validation, we can enforce that logs_index cannot be set at the same time as *_dynamic_index.
    • The problem with this mitigation is that a part of *_dynamic_index DS routing actually depends on *_index. It is the "legacy" part about elasticsearch.index.prefix and elasticsearch.index.suffix, which brings us to the 2nd motivation

Motivation 2

DS routing is unnecessarily complex at the moment as mentioned. The elasticsearch.index.{prefix,suffix} should not the recommended way to perform routing. Also, annoyingly, it uses a reverse precedence order compared to DS routing based on data_stream.* attributes. We should separate this "legacy" functionality from the preferred way of DS routing.

Be opinionated and defaults that deliver good UX OOTB.

Proposed solution

It comes in a few parts

  1. Separate the old prefix + *_index + suffix DS routing to a separate config e.g. *_dynamic_index_legacy::enabled or a better name. This can be a deprecated config from the time of introduction.
  2. Set *_dynamic_index::enabled to true
  3. Add validation to forbid setting *_index and *_dynamic_index::enabled: true at the same time.
  4. (nice to have) Use cases where people are trying to set a default index via *_index would not be able to do the same if *_dynamic_index::enabled: true, as the default will always be *-generic-default. The workaround is to use a transform processor in the pipeline to add data_stream.* attributes. It may be more ergonomic to add config *_dynamic_index::default_dataset and *_dynamic_index::default_namespace to provide a smooth migration path for these use cases where *_index is a valid ECS compliant DS name.

@axw @felixbarny wdyt?

@carsonip
Copy link
Contributor Author

carsonip commented Mar 7, 2025

Created draft PR #38456 to demonstrate the above proposal

@axw
Copy link
Contributor

axw commented Mar 10, 2025

@carsonip that sounds like an improvement, but it still seems a little complicated to me. What about this alternative?

  1. Deprecate *_dynamic_index
  2. Change the order of the legacy prefix/suffix matching to the same as data_stream.* order
  3. Add templating support to {logs,metrics,traces}_index

Then we would end up configuring the exporter like follows, by default. I'll use text/template syntax for illustration, but maybe we should look at using OTTL.

LogsIndex: '{{ or (attr "data_stream.type") "logs" }}-{{ or (attr "data_stream.dataset") "generic.otel" }}-{{ or (attr "data_stream.namespace") "default" }}'

attr would be a function/converter we inject that looks up an attribute with the given name. We would need to select the default based on the mapping mode, since "generic.otel" would only be correct for OTel-native mode. I suppose we would need to inject mapping_mode as another template function, since the mode is now dynamic.

Anyway, point being that the user could then configure a static index or dynamic index based on the lack or presence of variables. WDYT?

@felixbarny
Copy link
Contributor

Another idea to simplify the configuration:

Logic to determine the _index:

  • Static value if configured {logs,metrics,traces}_index
  • Value of attributes["elasticsearch._index"]
  • Value of attributes["data_stream.*"] with default values (<type>-generic(.otel)-default) and receiver-based routing

Use cases that were using prefix/suffix before would use OTTL before the exporter to set elasticsearch._index.
This doesn't require any config options, maybe aside from being able to turn off the dynamic index evaluation. Also, no templating of the index in the ES exporter needed.

@carsonip
Copy link
Contributor Author

Add templating support to {logs,metrics,traces}_index

Value of attributes["elasticsearch._index"]

I see a lot of value in a new elasticsearch._index attribute handling. It provides a good migration path from prefix/suffix and is even more ergonomic to use if an advanced user wants to dynamically configure index. All of that without maintaining parsing and templating in es exporter. This should be good enough for now (YAGNI), and we can still add the templating later if we need it.

While {logs,metrics,traces}_index covers the basic and legacy use case and "attributes["data_stream.*"] with default values (-generic(.otel)-default) and receiver-based routing" covers the recommended use case.

What I'm going to do

I'll go ahead and create a draft PR from @felixbarny 's proposal.

  1. Deprecate and make {logs,metrics,traces}_dynamic_index config a no-op. Log a warning if used.
  2. Remove any prefix/suffix handling
  3. Update to always dynamically route base on {logs,metrics,traces}_index, attributes["elasticsearch._index"], attributes["data_stream.*"] and receiver. Attributes precedence: record > scope > resource

Remaining questions

  1. How do we set data_stream.* fields in the doc if data_stream.* attrs are not present? e.g. the cases {logs,metrics,traces}_index and attributes["elasticsearch._index"]. Should we attempt to parse the attribute / config and add back data_stream.* without knowledge of whether the target is a data stream or a regular index?

@carsonip
Copy link
Contributor Author

Remaining questions (continued)

Span events are stored in separate documents. They will be routed with data_stream.type set to
logs if traces_dynamic_index::enabled is true.

  • We now have a special handling for traces dynamic index to route span events to logs. It'll be fine with attributes["data_stream.*"] and receiver based routing. But for
    • attributes["elasticsearch._index"]: we can use elasticsearch._index as is, assuming advanced users know where to route the doc best. We will NOT route span events to logs.
    • traces_index: It will be slightly breaking for users who now set traces_index and traces_dynamic_index, but that's not a very good combination anyway. We can just send span events to traces_index without rewriting DS type.

@felixbarny
Copy link
Contributor

How do we set data_stream.* fields in the doc if data_stream.* attrs are not present? e.g. the cases {logs,metrics,traces}_index and attributes["elasticsearch._index"]. Should we attempt to parse the attribute / config and add back data_stream.* without knowledge of whether the target is a data stream or a regular index?

I think we shouldn't automatically set data_stream fields if elasticsearch._index is present. If it's a data stream, users should use data_stream.* attributes preferably to do routing rather than setting an elasticsearch._index attribute.

we can use elasticsearch._index as is, assuming advanced users know where to route the doc best. We will NOT route span events to logs.

I'd say routing of span events should be controlled via span event attributes rather than span attributes. That way, setting attributes["elasticsearch._index"] for spans doesn't influence the routing for related span events.

We can just send span events to traces_index without rewriting DS type.

👍

@carsonip
Copy link
Contributor Author

I think we shouldn't automatically set data_stream fields if elasticsearch._index is present. If it's a data stream, users should use data_stream.* attributes preferably to do routing rather than setting an elasticsearch._index attribute.

Makes sense

I'd say routing of span events should be controlled via span event attributes rather than span attributes. That way, setting attributes["elasticsearch._index"] for spans doesn't influence the routing for related span events.

++, I'll double check if that's the case.

@carsonip
Copy link
Contributor Author

carsonip commented Mar 10, 2025

I'll go ahead and create a draft PR from @felixbarny 's proposal.

Created PR #38500

++, I'll double check if that's the case.

The behavior is correct and expected. Added a test in #38500 to confirm that

@carsonip carsonip changed the title [exporter/elasticsearch] Change default all *_dynamic_index::enabled config to true [exporter/elasticsearch] Enable dynamic routing (previously {logs,metrics,traces}_dynamic_index) by default Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment