Skip to content

[reciever/prometheusremotewritereceiver] Handle multiple timeseries with same metric name+type+unit #38453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

perebaj
Copy link
Contributor

@perebaj perebaj commented Mar 7, 2025

Description

This PR belongs to part of the Linux foundation mentee program. Here we are dealing with data points slices that must be added to the same metric when the incoming time series have some attributes in common. Like same resource, metricName, scopeName, scopeVersion, unitRef and timeseries type. The reference to this implementation can be found https://opentelemetry.io/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model

Besides that. We are creating 2 new test cases. The first one, to validate the described behavior above.

The second is to validate an error case. When the unitRef passed doesn't match with the symbols slice, causing a panic. This test case is important to guarantee that the error is being handled correctly.

Link to tracking issue

Fixes part of #37277. Bullet point Handle multiple timeseries with same metric name+type+unit

@github-actions github-actions bot requested review from ArthurSens and dashpole March 7, 2025 13:00
@perebaj perebaj changed the title [reciever/prometheusremotewritereciever] separate timeseries, same la… [reciever/prometheusremotewritereciever] separate timeseries Mar 7, 2025
Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since David and I decided we needed a cache for what you're doing now. So, I'm making an effort to remember why we wanted this cache in the first place. I'm separating the content here into three parts!

Why caching?

Starting with the fundamentals, why people build caches: A cache's primary purpose is to increase data retrieval performance by reducing the need to access the underlying slower storage layer.

In the terms of this code you're working on:

  • The data we want to retrieve: A possible already-existent Metric with the same metricName+unit+description as the one we're currently processing.
  • The slower storage layer: A loop over all ResourceMetrics, ScopeMetrics and Metrics until we find the one that fits.

You are correct that a cache for us here will be a hashmap (I'm looking at the intraRequestCache argument you created). Accessing a hashmap element is O(1) while looping over an array is O(n). We sacrifice a bit of memory for a faster translation! But the cache doesn't make sense if, after looking it up, we still need to loop over something else to see if it matches.


OTel Data Structure

Now, ignoring the cache problem, let's look at the OTel Metric Data Model.

Several parts identify a particular Metric. The first ones on the list are the Resource Attributes and Instrumentation Scope! In code, we can see that because we cannot access a Metric without accessing the mentioned information first.

rm := otelMetrics.ResourceMetrics().AppendEmpty()
// Add resource attributes
sm := rm.ScopeMetrics().AppendEmpty() 
sm.Scope.SetName(...)
sm.Scope.SetVersion(...)
metric := scope.Metrics().AppendEmpty() 
// Notice how we can never create metrics without first creating resource/scope!!

// If we wanted our metric to have a different Instrumentation Scope information, what we would need to do is:
sm2 := rm.ScopeMetrics.AppendEmpty()
sm2.Scope.SetName(...)
sm2.Scope.SetVersion(...)
// We can't re-use the previous metric here because there isn't anything like
// sm.Metrics().At(x).UpdateMetric(sm2.Metrics().AppendEmpty())
// We're forced to create a new metric for the new Scope; therefore, we have a brand new metric just because we changed the Scope
metric2 := sm2.Metrics().AppendEmpty()

// The same pattern repeats if we want a new Attribute in our Resource.

The second part is more obvious: metric name, unit, and type are identifying. Looking at the code:

// code that creates Resource and ScopeMetrics are hidden
metric1 := sm.Metrics().AppendEmpty()
metric1.SetName("new-metric")
metric1.SetUnit("s")

// Type is defined while creating data points. E.g., if it's a gauge metric, we set the metric type by writing:
gauge := metric1.SetEmptyGauge()

// What we want to avoid is calling SetEmptyGauge() again if we notice that a new timeseries came in with the same
// identifying information above: metric name, unit, and type.

Getting back to our problem: Caching OTel Metrics

We want to avoid looping over ResourceMetrics, ScopeMetrics, and Metrics to find something that matches name + type + unit. We want direct access to a Metric object with the information we seek.

I wanted you to exercise and propose something based on the information I just gave you. I'm going to give you two hints, though! The comment position with the TODO probably sent you in the wrong direction; I'm not sure if that's the best place to build and handle the caching logic. Passing the existing ResourceMetrics cache we have today doesn't seem correct, we want a new cache!

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: timeseries that has the same labels and name/unitRef metric Metadata should belongs to the same datapoints slice.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe too much details that doesn't add much value?

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, we should be able to merge it soon!

@perebaj perebaj changed the title [reciever/prometheusremotewritereciever] separate timeseries [reciever/prometheusremotewritereciever] Handle multiple timeseries with same metric name+type+unit Mar 18, 2025
@perebaj perebaj marked this pull request as ready for review March 18, 2025 16:51
@perebaj perebaj requested a review from a team as a code owner March 18, 2025 16:51
Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@ArthurSens ArthurSens added the ready to merge Code review completed; ready to merge by maintainers label Mar 18, 2025
@atoulme atoulme changed the title [reciever/prometheusremotewritereciever] Handle multiple timeseries with same metric name+type+unit [reciever/prometheusremotewritereceiver] Handle multiple timeseries with same metric name+type+unit Mar 18, 2025
@atoulme atoulme merged commit 7893679 into open-telemetry:main Mar 18, 2025
180 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 18, 2025
Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this pull request Apr 24, 2025
…ith same metric name+type+unit (open-telemetry#38453)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This PR belongs to part of the Linux foundation mentee program. Here we
are dealing with data points slices that must be added to the same
metric when the incoming time series have some attributes in common.
Like same resource, metricName, scopeName, scopeVersion, unitRef and
timeseries type. The reference to this implementation can be found
https://opentelemetry.io/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model

Besides that. We are creating 2 new test cases. The first one, to
validate the described behavior above.

The second is to validate an error case. When the unitRef passed doesn't
match with the symbols slice, causing a panic. This test case is
important to guarantee that the error is being handled correctly.

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes part of open-telemetry#37277. Bullet point `Handle multiple timeseries with same
metric name+type+unit`

---------

Co-authored-by: Arthur Silva Sens <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge Code review completed; ready to merge by maintainers receiver/prometheusremotewrite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants