[chore] Revert dc8e2dd #12917
…ate (open-telemetry#12856)" This reverts commit dc8e2dd.
Codecov Report
Attention: Patch coverage is
❌ Your patch status has failed because the patch coverage (77.27%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #12917 +/- ##
==========================================
- Coverage 91.65% 91.60% -0.05%
==========================================
Files 499 499
Lines 27426 27437 +11
==========================================
- Hits 25138 25135 -3
- Misses 1809 1818 +9
- Partials 479 484 +5
|
Can we be more specific about what is broken? The intended effect of open-telemetry/opentelemetry-collector-contrib#12856 was to add scope attributes to all telemetry describing components. We discussed lack of support for scope attributes as a potential concern but didn't notice any immediate issues. However, it seems that we may have overlooked one. Is it possible this is just a single consumer of the telemetry which needs to correctly handle scope attributes? |
The issue
I haven't investigated how the Prometheus exporter works in detail, but here is my guess on what the issue is: It looks like until now, there was no way to identify which batch processor instance a metric like
The metrics from both
open-telemetry/opentelemetry-collector-contrib#12856 automatically injects additional attributes which allow differentiating the metric streams from both instances. However, these are instrumentation scope attributes, which are not currently supported by the Prometheus exporter. I had assumed this simply meant that the distinction would be aggregated away at the exporter level, but it sounds like the Prometheus exporter does not implement any aggregation logic, and is very strict on its notion of "different metrics" matching that of the input. Because we no longer aggregate these 2 metric streams at the OTel level because of the differing scope attributes, and the Prometheus exporter completely ignores scope attributes and considers them to alias, it errors out.
In my opinion, this is clearly a bug in the Prometheus exporter. Having two metric streams on the OTel side that the exporter cannot tell apart because of lack of support of some identifying properties should not result in an unrecoverable error. And this could happen again if internal telemetry starts using other identifying properties the exporter doesn't support.
What to do
Because I expect many users to be relying on the Prometheus exporter, this needs to be fixed before the next release. The obvious long-term course of action is to modify the Prometheus exporter to support instrumentation scope attributes, and perhaps to make it more lenient towards unexpectedly finely aggregated input. If we estimate that we can't get that done before the next release:
|
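To make the aliasing described above concrete, here is a minimal, self-contained Go sketch. It is not the Prometheus exporter's actual code. The `Stream` type, the metric name `example_metric`, the scope name `example/scope`, and the scope attribute key `instance` are all hypothetical names chosen for illustration. It models two metric streams that differ only in their scope attributes, an exporter-side identity that ignores those attributes, a strict export that fails on the resulting collision, and a lenient export that sums the colliding streams instead.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Stream is a simplified, hypothetical stand-in for one OTel metric stream.
type Stream struct {
	Name       string
	ScopeName  string
	ScopeAttrs map[string]string // instrumentation scope attributes
	Value      int64
}

// exportKey is the identity seen by an exporter that understands the scope
// name but not the scope's attribute set (an assumption of this sketch).
func exportKey(s Stream) string {
	return s.ScopeName + "/" + s.Name
}

// fullKey is the identity on the OTel side, including scope attributes.
func fullKey(s Stream) string {
	keys := make([]string, 0, len(s.ScopeAttrs))
	for k := range s.ScopeAttrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for map iteration
	var b strings.Builder
	b.WriteString(exportKey(s))
	for _, k := range keys {
		fmt.Fprintf(&b, "{%s=%s}", k, s.ScopeAttrs[k])
	}
	return b.String()
}

// StrictExport returns an error as soon as two streams share an export key.
func StrictExport(streams []Stream) (map[string]int64, error) {
	out := make(map[string]int64)
	for _, s := range streams {
		k := exportKey(s)
		if _, dup := out[k]; dup {
			return nil, fmt.Errorf("duplicate metric %q", k)
		}
		out[k] = s.Value
	}
	return out, nil
}

// LenientExport sums streams that share an export key. Summing is only
// meaningful for counter-like metrics; that restriction is assumed here.
func LenientExport(streams []Stream) map[string]int64 {
	out := make(map[string]int64)
	for _, s := range streams {
		out[exportKey(s)] += s.Value
	}
	return out
}

func main() {
	streams := []Stream{
		{Name: "example_metric", ScopeName: "example/scope",
			ScopeAttrs: map[string]string{"instance": "batch/a"}, Value: 3},
		{Name: "example_metric", ScopeName: "example/scope",
			ScopeAttrs: map[string]string{"instance": "batch/b"}, Value: 4},
	}
	fmt.Println("distinct on the OTel side:", fullKey(streams[0]) != fullKey(streams[1]))
	_, err := StrictExport(streams)
	fmt.Println("strict export error:", err)
	fmt.Println("lenient export:", LenientExport(streams))
}
```

The strict path mirrors the failure mode discussed in this thread, while the lenient path is one possible reading of "more lenient towards unexpectedly finely aggregated input". Supporting scope attributes directly would instead amount to exporting with something like `fullKey`.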
Related issue in contrib: #12923 |
If this is something that can be solved in the prometheus exporter, I'd much prefer we resolve it there. I suggest we give the code owners a day to weigh in before attempting other solutions. |
Disclaimer: I don't have much context on the related changes. I have this PR just because I was pinged in #12812 and followed the conversation there :) Personally I would prefer to do a clean revert first (that should bring back the behavior in v0.124.0, IIUC) and then make any compensating changes in a follow-up PR. That is easier to review and easier to isolate (e.g. what if the follow-up PR introduces new issues). But ultimately I will defer to @djaglowski & @jade-guiton-dd |
Thanks for opening the PR and marking it as a release blocker to prevent this from going out next week, @songy23. I agree with @djaglowski: let's see if we can address the underlying issue first. Marking this as requesting changes to prevent it from accidentally getting merged 👍🏻
FYI a helm chart user ran into this: open-telemetry/opentelemetry-helm-charts#1642 (comment). I set |
Normally I would agree but we already have a stack of problems that reverting won't cleanly resolve so I think pressing ahead may be safer. |
I'm not sure the exporter ignores the scope attributes. Based on the output I saw when I ran the collector, the scope name and version are set as labels, but they are not different for the different pipelines, which causes the issue.
Would adding the attribute for individual pipelines address the underlying problem that the prometheus client library doesn't know what to do w/ metrics it considers duplicates (https://github.com/prometheus/client_golang/blob/main/prometheus/registry.go#L943)? |
@codeboten It handles the name and version of an instrumentation scope, but not its attribute set; those are different fields. |
@jade-guiton-dd ah thanks, i misunderstood your comment |
Related issue: open-telemetry/opentelemetry-go#5846 |
I opened an issue to track ensuring that this problem is caught by core tests in #12918 |
If we want feedback from Prometheus exporter codeowners, should we file an issue about this on opentelemetry-go? |
I see three solutions, short/medium/long term.
|
Just wanted to clarify: this issue is using the OTLP endpoint in Prometheus 3; it's not using the Prometheus Exporter. So I don't think any changes to the Prometheus Exporter will fix it. |
Hi folks, I accidentally landed here while triaging issues in contrib. Collector internal metrics are exposed through the GO-SDK Prometheus exporter, and not the collector's component Prometheus exporter. So the issue in contrib is not related to the problem here. I'm not sure how I can be helpful since I'm still getting used to OTel's spec, but please don't hesitate to ping me if you feel like I could be useful |
superseded by #12933 |
Description
Reverts #12856, which breaks internal metrics.
Testing
tested locally at localhost:8888/metrics, before the revert:

after:
