-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[reciever/prometheusremotewritereceiver] Handle multiple timeseries with same metric name+type+unit #38453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[reciever/prometheusremotewritereceiver] Handle multiple timeseries with same metric name+type+unit #38453
Conversation
…bels, same datadatpoint slice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since David and I decided we needed a cache for what you're doing now. So, I'm making an effort to remember why we wanted this cache in the first place. I'm separating the content here into three parts!
Why caching?
Starting with the fundamentals, why people build caches: A cache's primary purpose is to increase data retrieval performance by reducing the need to access the underlying slower storage layer.
In the terms of this code you're working on:
- The data we want to retrieve: A possible already-existent Metric with the same metricName+unit+description as the one we're currently processing.
- The slower storage layer: A loop over all ResourceMetrics, ScopeMetrics and Metrics until we find the one that fits.
You are correct that a cache for us here will be a hashmap (I'm looking at the intraRequestCache
argument you created). Accessing a hashmap element is O(1) while looping over an array is O(n). We sacrifice a bit of memory for a faster translation! But the cache doesn't make sense if, after looking it up, we still need to loop over something else to see if it matches.
OTel Data Structure
Now, ignoring the cache problem, let's look at the OTel Metric Data Model.
Several parts identify a particular Metric. The first ones on the list are the Resource Attributes and Instrumentation Scope! In code, we can see that because we cannot access a Metric without accessing the mentioned information first.
rm := otelMetrics.ResourceMetrics().AppendEmpty()
// Add resource attributes
sm := rm.ScopeMetrics().AppendEmpty()
sm.Scope.SetName(...)
sm.Scope.SetVersion(...)
metric := scope.Metrics().AppendEmpty()
// Notice how we can never create metrics without first creating resource/scope!!
// If we wanted our metric to have a different Instrumentation Scope information, what we would need to do is:
sm2 := rm.ScopeMetrics.AppendEmpty()
sm2.Scope.SetName(...)
sm2.Scope.SetVersion(...)
// We can't re-use the previous metric here because there isn't anything like
// sm.Metrics().At(x).UpdateMetric(sm2.Metrics().AppendEmpty())
// We're forced to create a new metric for the new Scope; therefore, we have a brand new metric just because we changed the Scope
metric2 := sm2.Metrics().AppendEmpty()
// The same pattern repeats if we want a new Attribute in our Resource.
The second part is more obvious: metric name, unit, and type are identifying. Looking at the code:
// code that creates Resource and ScopeMetrics are hidden
metric1 := sm.Metrics().AppendEmpty()
metric1.SetName("new-metric")
metric1.SetUnit("s")
// Type is defined while creating data points. E.g., if it's a gauge metric, we set the metric type by writing:
gauge := metric1.SetEmptyGauge()
// What we want to avoid is calling SetEmptyGauge() again if we notice that a new timeseries came in with the same
// identifying information above: metric name, unit, and type.
Getting back to our problem: Caching OTel Metrics
We want to avoid looping over ResourceMetrics, ScopeMetrics, and Metrics to find something that matches name + type + unit. We want direct access to a Metric object with the information we seek.
I wanted you to exercise and propose something based on the information I just gave you. I'm going to give you two hints, though! The comment position with the TODO probably sent you in the wrong direction; I'm not sure if that's the best place to build and handle the caching logic. Passing the existing ResourceMetrics cache we have today doesn't seem correct, we want a new cache!
# (Optional) One or more lines of additional information to render under the primary note. | ||
# These lines will be padded with 2 spaces and then inserted directly into the document. | ||
# Use pipe (|) for multiline entries. | ||
subtext: timeseries that has the same labels and name/unitRef metric Metadata should belongs to the same datapoints slice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe too much details that doesn't add much value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, we should be able to merge it soon!
Co-authored-by: Arthur Silva Sens <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
…ith same metric name+type+unit (open-telemetry#38453) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR belongs to part of the Linux foundation mentee program. Here we are dealing with data points slices that must be added to the same metric when the incoming time series have some attributes in common. Like same resource, metricName, scopeName, scopeVersion, unitRef and timeseries type. The reference to this implementation can be found https://opentelemetry.io/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model Besides that. We are creating 2 new test cases. The first one, to validate the described behavior above. The second is to validate an error case. When the unitRef passed doesn't match with the symbols slice, causing a panic. This test case is important to guarantee that the error is being handled correctly. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes part of open-telemetry#37277. Bullet point `Handle multiple timeseries with same metric name+type+unit` --------- Co-authored-by: Arthur Silva Sens <[email protected]>
Description
This PR belongs to part of the Linux foundation mentee program. Here we are dealing with data points slices that must be added to the same metric when the incoming time series have some attributes in common. Like same resource, metricName, scopeName, scopeVersion, unitRef and timeseries type. The reference to this implementation can be found https://opentelemetry.io/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model
Besides that. We are creating 2 new test cases. The first one, to validate the described behavior above.
The second is to validate an error case. When the unitRef passed doesn't match with the symbols slice, causing a panic. This test case is important to guarantee that the error is being handled correctly.
Link to tracking issue
Fixes part of #37277. Bullet point
Handle multiple timeseries with same metric name+type+unit