Dependency: Update to Prometheus 0.300.* #36873

Merged: 20 commits merged into open-telemetry:main on Feb 13, 2025

Conversation

@ArthurSens (Member) commented Dec 17, 2024

Description

Supersedes #36642

This PR updates the prometheus/prometheus library in our go.mod files to 0.300.* (which represents Prometheus 3.0).

It touches many go.mod files, but the Prometheus Receiver is the only component heavily affected by breaking changes.

// This includes the ScrapeManager lib that is used by the Prometheus receiver.
// We need to set the validation scheme to _something_ to avoid panics, and
// UTF8 is the default in Prometheus.
model.NameValidationScheme = model.UTF8Validation

Contributor

Note for after this merges (non-blocking): Is this actually necessary? Seems like it should always be set to this by default? https://github.com/prometheus/common/blob/cc17dab08e7c33f70e3b5ab865d6789f0ff95761/model/metric.go#L37

Member Author

Hmmm, I think that change in prometheus/common was made after this PR was opened.

I haven't tested this with the newest common version; definitely worth trying as a follow-up to this PR.
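
For readers following the thread, a minimal, self-contained sketch of the assignment under discussion. The standalone main wrapper and the IsValidMetricName check are purely illustrative additions (the receiver only performs the single assignment shown in the diff), and it assumes the pinned prometheus/common version still exposes these package-level symbols:

```go
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func main() {
	// Make the validation scheme explicit instead of relying on the library
	// default, which has shifted between prometheus/common releases.
	model.NameValidationScheme = model.UTF8Validation

	// Under UTF-8 validation, dotted names such as "my.metric" are accepted;
	// under model.LegacyValidation they would be rejected.
	fmt.Println(model.IsValidMetricName(model.LabelValue("my.metric")))
}
```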

Contributor

What would happen if someone is still running Prometheus 2.x but upgrades their OTEL Collector pipelines first? I'd imagine that could be a common scenario since people are probably not trying to do these upgrades in lock-step.

Member Author

Great question!

Even though the default validation scheme changes, we still use our own library to translate OTLP into Prometheus. That package still translates everything the same way as before, so we don't expect any changes here.

@@ -56,6 +63,7 @@ func createMetricsReceiver(
nextConsumer consumer.Metrics,
) (receiver.Metrics, error) {
configWarnings(set.Logger, cfg.(*Config))
addDefaultFallbackScrapeProtocol(cfg.(*Config))

Contributor

Note for after this merges (non-blocking): If this is setting the default scrape protocol, why do we still need to set it in our configuration everywhere?

Member Author

By everywhere, do you mean all those test changes I've made?

Contributor

That, and also in the compliance repo
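
For context, a rough sketch of the shape of a helper like addDefaultFallbackScrapeProtocol: fill in a fallback protocol only for scrape configs where the user left it unset. The struct and field names below are simplified stand-ins rather than the receiver's real types, and the default value is a placeholder, not necessarily the one the receiver picks:

```go
package main

import "fmt"

// Simplified stand-ins for the receiver's wrapped Prometheus configuration;
// the real receiver operates on Prometheus's own config structs.
type scrapeConfig struct {
	JobName          string
	FallbackProtocol string // corresponds to fallback_scrape_protocol in YAML
}

type promConfig struct {
	ScrapeConfigs []*scrapeConfig
}

// addDefaultFallbackScrapeProtocol sets a fallback protocol only where the
// user has not set one, so explicit settings are never overridden.
func addDefaultFallbackScrapeProtocol(cfg *promConfig) {
	const defaultFallback = "PrometheusText0.0.4" // placeholder, not necessarily the receiver's choice
	for _, sc := range cfg.ScrapeConfigs {
		if sc.FallbackProtocol == "" {
			sc.FallbackProtocol = defaultFallback
		}
	}
}

func main() {
	cfg := &promConfig{ScrapeConfigs: []*scrapeConfig{
		{JobName: "explicit", FallbackProtocol: "PrometheusText1.0.0"},
		{JobName: "defaulted"},
	}}
	addDefaultFallbackScrapeProtocol(cfg)
	for _, sc := range cfg.ScrapeConfigs {
		fmt.Printf("%s -> %s\n", sc.JobName, sc.FallbackProtocol)
	}
}
```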

@dashpole (Contributor)

Some questions, but let's get this merged and fix them afterwards.

@dashpole (Contributor)

Can you re-open the compliance PR?

@ArthurSens (Member Author)

Here you go: prometheus/compliance#143

@dashpole added the ready to merge label (Code review completed; ready to merge by maintainers) on Feb 13, 2025
@songy23 merged commit af6b17e into open-telemetry:main on Feb 13, 2025
173 checks passed
@github-actions bot added this to the next release milestone on Feb 13, 2025

@ArthurSens (Member Author)

That was some work, thank you everyone who helped here ❤️

@basti1302

@ArthurSens I think this breaks existing configs.

Note: I'm building a custom collector via https://github.com/open-telemetry/opentelemetry-collector/tree/main/cmd/builder, so YMMV. But as far as I can tell, this will also affect users that do not build their own image.

After updating to github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.120.0 and starting the collector (on K8s, as a daemonset) with a previously working config, the collector pods go into a CrashLoopBackOff, with this error in the logs:

Error: invalid configuration: receivers::prometheus::config::scrapeconfigs::0::scrapefallbackprotocol: unknown scrape protocol , supported: [OpenMetricsText0.0.1 OpenMetricsText1.0.0 PrometheusProto PrometheusText0.0.4 PrometheusText1.0.0]
receivers::prometheus::config::scrapeconfigs::1::scrapefallbackprotocol: unknown scrape protocol , supported: [OpenMetricsText0.0.1 OpenMetricsText1.0.0 PrometheusProto PrometheusText0.0.4 PrometheusText1.0.0]
2025/02/18 16:02:18 collector server run finished with error: invalid configuration: receivers::prometheus::config::scrapeconfigs::0::scrapefallbackprotocol: unknown scrape protocol , supported: [OpenMetricsText0.0.1 OpenMetricsText1.0.0 PrometheusProto PrometheusText0.0.4 PrometheusText1.0.0]
receivers::prometheus::config::scrapeconfigs::1::scrapefallbackprotocol: unknown scrape protocol , supported: [OpenMetricsText0.0.1 OpenMetricsText1.0.0 PrometheusProto PrometheusText0.0.4 PrometheusText1.0.0]
stream closed EOF for dash0-system/dash0-operator-opentelemetry-collector-agent-daemonset-jwpwm (opentelemetry-collector)

Setting fallback_scrape_protocol: "PrometheusText1.0.0" on all scrape jobs solves the issue. However, it seems the intention of this PR was to update Prometheus as a non-breaking change? It appears addDefaultFallbackScrapeProtocol is not effective? My guess is that the validation kicks in before the factory has a chance to set the default value.

@ArthurSens (Member Author)

> @ArthurSens I think this breaks existing configs. […]

You're correct! We're trying to fix the problem here: #38018

basti1302 added two commits to dash0hq/dash0-operator that referenced this pull request on Feb 18, 2025
@basti1302

> You're correct! We're trying to fix the problem here: #38018

Ah, there is already an issue for it. Sorry about the ping, then. I searched the repo for existing issues, but I searched for scrapefallbackprotocol and unknown scrape protocol (parts of the error message) and didn't find that existing issue. Thanks for the link to the issue 👍

jpkrohling pushed a commit that referenced this pull request Feb 19, 2025
…dation (#38018)

#### Description
During the release process, we noticed a breaking change in the
Prometheus receiver caused by
#36873.

In that PR, I tried adding a fallback scrape protocol by default every time the PrometheusReceiver was built, but it turned out that config validation happens even before the component is created, so the collector fails startup with an invalid configuration.

This PR moves the addition of the fallback scrape protocol to the validation step.

#### Link to tracking issue
Fixes #37902

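To illustrate why the factory-time default came too late and what moving it into validation changes, here is a minimal sketch of the startup ordering: the collector validates configuration before any factory runs, so a default applied inside Validate is visible to validation, while one applied in the factory is not. The type and function names are simplified stand-ins, not the receiver's actual code:

```go
package main

import "fmt"

// Simplified stand-in for the receiver configuration; real names differ.
type config struct {
	FallbackScrapeProtocol string
}

// Validate runs during collector startup, before any component is created.
// Applying the default here means the value is already set when the wrapped
// Prometheus validation inspects it.
func (c *config) Validate() error {
	if c.FallbackScrapeProtocol == "" {
		c.FallbackScrapeProtocol = "PrometheusText0.0.4" // placeholder default
	}
	return nil
}

// createMetricsReceiver runs only after validation has succeeded, which is why
// defaulting here (the original approach in this PR) came too late.
func createMetricsReceiver(cfg *config) {
	fmt.Println("building receiver with fallback protocol:", cfg.FallbackScrapeProtocol)
}

func main() {
	cfg := &config{}
	if err := cfg.Validate(); err != nil { // startup validates first
		fmt.Println("invalid configuration:", err)
		return
	}
	createMetricsReceiver(cfg) // factories run only afterwards
}
```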
ArthurSens added a commit to ArthurSens/opentelemetry-collector-contrib that referenced this pull request Feb 20, 2025
…dation (open-telemetry#38018)
songy23 added a commit that referenced this pull request Feb 21, 2025
#### Description

With #36873, the Prometheus receiver can now keep dots in metric names rather than
converting them to underscores. E.g., for a metric `my.metric` scraped by the
Prometheus receiver, its name was `my_metric` before 0.120.0 vs. `my.metric` now.
This should have broken some Datadog integration tests, but those are skipped
under the race detector (which is always on in CI), so the failures did not show
up in CI.
songy23 added a commit that referenced this pull request Feb 21, 2025
#### Description
Add a changelog amending #36873. There are indeed several breaking
changes associated with the 3.0 version update:
https://prometheus.io/docs/prometheus/latest/migration/

#### Link to tracking issue
related to
http://github.com/open-telemetry/opentelemetry-collector-contrib/issues/38097
cmacknz added a commit to cmacknz/opentelemetry-collector-contrib that referenced this pull request Mar 13, 2025
cmacknz added a commit to elastic/opentelemetry-collector-contrib that referenced this pull request Mar 17, 2025