[exporter/kafkaexporter] Enable Partitioning by Specific Attribute for Logs & Metrics #38484

Open
danielkatz opened this issue Mar 9, 2025 · 7 comments
Labels: enhancement (New feature or request), exporter/kafka

Comments

@danielkatz

danielkatz commented Mar 9, 2025

Component(s)

exporter/kafka

Is your feature request related to a problem? Please describe.

Currently, the Kafka exporter supports partitioning traces by trace id, which is very useful for context-aware data processing. However, for logs and metrics, we only have options like partition_metrics_by_resource_attributes and partition_logs_by_resource_attributes. These work well for load-balancing Kafka partitions but fall short when we need to partition data based on a single, specific attribute of a log/metric (such as trace_id or tenant_id). This limitation complicates tasks like merging related logs or metrics for enhanced processing.

Describe the solution you'd like

I propose introducing configuration options that allow specifying one attribute for partitioning logs and metrics. For instance, options like:

partition_logs_by_attribute: <name_of_the_attribute>
partition_metrics_by_attribute: <name_of_the_attribute>

This would extend the functionality provided for traces to logs and metrics, enabling context-aware data routing (e.g., merging by a specific attribute like trace_id, tenant_id, etc.) and improving the overall flexibility of the Kafka exporter.
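
To illustrate why a single-attribute key matters for co-locating related data, here is a minimal Python sketch. It is not the exporter's code: it assumes a key-hash partitioner, uses `zlib.crc32` as a stand-in for Kafka's actual hash, and `partition_for`, `NUM_PARTITIONS`, and the sample records are hypothetical.

```python
import zlib

NUM_PARTITIONS = 6  # assumed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (stand-in for Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical log records from two different services that share a tenant_id.
records = [
    {"resource": {"service.name": "checkout"}, "attributes": {"tenant_id": "acme"}},
    {"resource": {"service.name": "billing"}, "attributes": {"tenant_id": "acme"}},
]

# Keying on the single attribute: both records get the same key, hence the same partition.
by_attribute = [partition_for(r["attributes"]["tenant_id"]) for r in records]

# Keying on the whole resource attribute set: the keys differ, so landing on the
# same partition is not guaranteed.
by_resource = [partition_for(repr(sorted(r["resource"].items()))) for r in records]
```

With the attribute key, a downstream consumer of one partition sees every record for a given tenant; with the resource-level key, records for the same tenant can be spread across partitions.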

Describe alternatives you've considered

None that I could see, other than exporting to Kafka as-is and then running a custom service that consumes the messages and re-emits them into Kafka with the proper partitioning.

Additional context

No response

@danielkatz danielkatz added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Mar 9, 2025
Contributor

github-actions bot commented Mar 9, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@danielkatz danielkatz changed the title from "[exporter/kafkaexporter]" to "[exporter/kafkaexporter] Enable Partitioning by Specific Resource Attribute for Logs & Metrics" Mar 9, 2025
@danielkatz danielkatz changed the title from "[exporter/kafkaexporter] Enable Partitioning by Specific Resource Attribute for Logs & Metrics" to "[exporter/kafkaexporter] Enable Partitioning by Specific Attribute for Logs & Metrics" Mar 9, 2025
@Frapschen
Contributor

More details need to be discussed regarding the scenario where a user sets both partition_logs_by_resource_attributes: true and partition_metrics_by_attribute: <name_of_the_attribute>. What behavior will the collector exhibit in this case?

For me, I prefer the partition_metrics_by_attribute configuration to take precedence.
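
The precedence rule suggested here could be sketched as follows. This is only an illustration in Python, assuming the two options in question are the metrics pair (`partition_metrics_by_resource_attributes` and the proposed `partition_metrics_by_attribute`); the function name and the returned mode strings are hypothetical.

```python
from typing import Optional


def resolve_metrics_key_mode(
    by_resource_attributes: bool,
    by_attribute: Optional[str],
) -> str:
    """Return which partitioning mode applies under the suggested precedence rule."""
    if by_attribute:  # the specific attribute wins when both options are set
        return f"attribute:{by_attribute}"
    if by_resource_attributes:
        return "resource_attributes"
    return "none"  # neither option set, so no message key is derived
```

An alternative design would be to reject the configuration at validation time when both options are set, which avoids silently ignoring one of them.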

@danielkatz
Author

More details need to be discussed regarding the scenario where a user sets both partition_logs_by_resource_attributes: true and partition_metrics_by_attribute: <name_of_the_attribute>. What behavior will the collector exhibit in this case?

For me, I prefer the partition_metrics_by_attribute configuration to take precedence.

I agree.

@namco1992
Contributor

namco1992 commented Mar 27, 2025

I'd like to work on this issue if it's accepted by the owners/maintainers as this is a blocker for us too.

Also cc the new owner @axw

@axw
Contributor

axw commented Mar 27, 2025

I'm definitely on board with partitioning by attributes. I'd ideally like to see a more complete design for the configuration, e.g.

  • Should this be only about span/data point/log record-level attributes? What about resource or scope attributes?
  • Should we include the ability to partition on client metadata?
  • Are we just talking about setting the message key, or also grouping data that share the same key?
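
The last question can be made concrete with a small Python sketch. It assumes that "grouping" means splitting one incoming batch into one message per distinct key value; `group_by_key` and the sample batch are hypothetical and do not reflect how the exporter batches data today.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Record = Dict[str, str]


def group_by_key(batch: List[Record], key_attr: str) -> Dict[Optional[str], List[Record]]:
    """Split one batch into per-key groups; each group would become its own Kafka message."""
    groups: Dict[Optional[str], List[Record]] = defaultdict(list)
    for record in batch:
        # Records that lack the attribute all fall under the None key.
        groups[record.get(key_attr)].append(record)
    return dict(groups)


batch = [
    {"tenant_id": "acme", "body": "login ok"},
    {"tenant_id": "globex", "body": "login failed"},
    {"tenant_id": "acme", "body": "logout"},
]
messages = group_by_key(batch, "tenant_id")
```

Without grouping, a batch that mixes several key values can only carry one message key, so some records would end up under a key that does not match their own attribute value.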

In #38985 I'm planning to make topic & encoding configuration signal-specific (and I have a WIP branch for the exporter). I think it makes sense to do the same for partitioning, which would enable us to partition on properties that are signal-specific.

These work well for load-balancing Kafka partitions but fall short when we need to partition data based on a single, specific attribute of a log/metric (such as trace_id or tenant_id).

Generally speaking I would expect trace_id and tenant_id to sit at different scopes:

  • trace_id is always at the span and log record-level
  • tenant_id is application specific so it could come from anywhere

I haven't thought about all of this in great depth, but here's a rough proposal for discussion:

  1. Introduce logs::message_key, metrics::message_key, traces::message_key
  2. Deprecate partition_traces_by_id, partition_metrics_by_resource_attributes, and partition_logs_by_resource_attributes

I don't know what the new message_key config would look like exactly, but it should support at least:

  • hashing on client metadata (e.g. otlpreceiver request headers)
  • hashing the resource attributes, equivalent to partition_<signal>_by_resource_attributes (any signal)
  • arbitrary resource attributes (any signal)
  • trace_id (traces and logs only)
  • arbitrary span/datapoint/log record attributes (depending on signal)
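
As a discussion aid, the key sources listed above could be modelled roughly like this in Python. Everything here is an assumption for illustration: the `KeyContext` type, the `source` names, and the use of SHA-256 for the resource-attribute hash are hypothetical and say nothing about what the eventual message_key config would look like.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KeyContext:
    """Hypothetical view of the data available when a message key is computed."""
    client_metadata: Dict[str, str] = field(default_factory=dict)
    resource_attrs: Dict[str, str] = field(default_factory=dict)
    record_attrs: Dict[str, str] = field(default_factory=dict)
    trace_id: Optional[str] = None  # only meaningful for traces and logs


def message_key(ctx: KeyContext, source: str, name: str = "") -> Optional[bytes]:
    """Derive a message key from one of the sources in the list above."""
    if source == "resource_attributes_hash":
        # Hash the full resource attribute set in a canonical (sorted) order.
        canonical = repr(sorted(ctx.resource_attrs.items()))
        return hashlib.sha256(canonical.encode("utf-8")).digest()
    if source == "client_metadata":
        value = ctx.client_metadata.get(name)
    elif source == "resource_attribute":
        value = ctx.resource_attrs.get(name)
    elif source == "trace_id":
        value = ctx.trace_id
    elif source == "record_attribute":
        value = ctx.record_attrs.get(name)
    else:
        raise ValueError(f"unknown key source: {source}")
    # A missing value yields no key rather than an empty one.
    return value.encode("utf-8") if value is not None else None
```

Returning `None` for a missing value leaves the choice of fallback (for example, no key at all) to the caller; whether that is the right behaviour is one of the open design questions.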

We could potentially use OTTL for defining the message key, which would give the greatest flexibility. That way we may also get to use OTTL's context inference for deciding how to group data before encoding to a message. That may imply a single message for each span if you're using trace_id though, which would be a change from how things are today.

@pjanotti
Contributor

@axw is this something that you plan to work on? Or should we add the help wanted label?

@axw
Contributor

axw commented May 13, 2025

@pjanotti this can be solved by #39199 and #39208. Once I have a sponsor for #39199 I'll work on it.
