
Changed behaviour of resource generation in Prometheus receiver since upgrade to Prometheus 3.X #38097


Open
bacherfl opened this issue Feb 21, 2025 · 12 comments
Labels: question (Further information is requested), receiver/prometheus (Prometheus receiver), Stale

Comments

@bacherfl (Contributor)

Component(s)

receiver/prometheus

Describe the issue you're reporting

This affects version 0.120.1

Since the recent upgrade to Prometheus 3.x in our dependencies, I have noticed a change in the OTel resources created by the Prometheus receiver: the default validation scheme used by the receiver is now set to UTF-8, as described in the Prometheus 3.x migration guide, so labels such as service_name are now received as service.name. This can interfere with the service.name attribute of the resulting OTel resource, which was previously derived from the job label of either the metric itself or the scrape config.

One particular example where this is a breaking change is when we use the following config to export the collector's self-monitoring metrics:

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: opentelemetry-collector
          scrape_interval: 60s
          static_configs:
            - targets:
                - 127.0.0.1:8888

exporters:
  debug:
    verbosity: detailed

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [debug]

The scrape config used here scrapes the collector's self-monitoring Prometheus endpoint, which, before the upgrade to Prometheus 3.x, exposed e.g. the following metric:

# HELP otelcol_process_memory_rss Total physical memory (resident set size) [alpha]
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{"service_instance_id"="746750de-eddd-41a1-96d1-bd728ee3be73","service_name"="my-collector","service_version"="0.119.0"} 8.0920576e+07

As a result, the resource created from this metric then had the following attributes:

Resource attributes:
     -> service.name: Str(opentelemetry-collector) # taken from the name of the scrape job
     -> service.instance.id: Str(127.0.0.1:8888) # taken from the target URL of the scrape job
     -> service_name: Str(my-collector) # taken from the metric labels
     -> service_instance_id: Str(746750de-eddd-41a1-96d1-bd728ee3be73)  # taken from the metric labels
     -> net.host.port: Str(8888)
     -> http.scheme: Str(http)
     -> server.port: Str(8888)
     -> url.scheme: Str(http)

Now, since release 0.120.0, the Prometheus receiver sets the Accept header to text/plain;version=1.0.0;escaping=allow-utf-8 when accessing the metrics endpoint, which causes the response of the scrape request to be delivered as follows (see the sketch after the resource attributes below for one way to reproduce the two responses):

# HELP otelcol_process_memory_rss Total physical memory (resident set size) [alpha]
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{"service.instance.id"="746750de-eddd-41a1-96d1-bd728ee3be73","service.name"="my-collector","service_version"="0.119.0"} 8.0920576e+07

This will result in the following resource:

Resource attributes:
     -> service.name: Str(my-collector) # taken from the metric labels
     -> service.instance.id: Str(746750de-eddd-41a1-96d1-bd728ee3be73)  # taken from the metric labels
     -> net.host.port: Str(8888)
     -> http.scheme: Str(http)
     -> server.port: Str(8888)
     -> url.scheme: Str(http)
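
As referenced above, the following is a minimal Go sketch for reproducing the two responses by toggling the Accept header. It assumes a collector serving its self-monitoring metrics at http://127.0.0.1:8888/metrics; the classic header value used here is only an approximation of the pre-0.120.0 negotiation, since the escaping parameter is what matters.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// fetch performs a GET against url with the given Accept header and returns the body.
func fetch(url, accept string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Accept", accept)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Assumed location of the collector's self-monitoring metrics endpoint.
	const endpoint = "http://127.0.0.1:8888/metrics"

	// Classic text-format negotiation (no UTF-8 escaping requested).
	classic, err := fetch(endpoint, "text/plain;version=0.0.4")
	if err != nil {
		log.Fatal(err)
	}

	// The header the receiver sends since 0.120.0, allowing UTF-8 (dotted) names.
	utf8, err := fetch(endpoint, "text/plain;version=1.0.0;escaping=allow-utf-8")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("=== classic Accept header ===")
	fmt.Println(classic)
	fmt.Println("=== UTF-8 Accept header ===")
	fmt.Println(utf8)
}

Whether the two bodies actually differ depends on the collector version serving the endpoint.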

I'm not sure whether this should be considered a bug, as it seems logical to use the service.name label rather than the job label to create the resource, but since this is a notable change in behaviour, I would like to raise awareness and hopefully prevent confusion about why generated resources might now be named differently.

@bacherfl bacherfl added needs triage New item requiring triage question Further information is requested labels Feb 21, 2025
@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Feb 21, 2025

@ArthurSens (Member)

cc @dashpole @ywwg

We probably want to discuss this. I'm inclined to agree with @bacherfl that the service.name label is a better fit for the service.name resource attribute, since this is literally where it came from 😬

That is a big change though; I apologize for not noticing it. Changing the behavior here would also require a change in the spec, wouldn't it?

@dashpole (Contributor)

Hmmm... Should we only use the service.name attribute from the scraped target if honor_labels is true?

@dashpole (Contributor)

I've also heard some users complain about not having a job attribute. We could also decide to leave service.name as-is, and not translate job to it. But that would be a larger breaking change, and might break other assumptions we've made.

@ArthurSens (Member)

Yes, we need to evaluate the impact that would have on the info() PromQL function. I believe we're also breaking it if it requires the labels service_name and service_instance_id.

@ywwg commented Feb 21, 2025

Part of this behavior is this bug: #37937

And as part of fixing that bug, the otel collector will not send dotted label names in the near term. However, we will still need to account for the future situation where UTF-8 is functioning and these dotted names are sent. Indeed, I believe the info function will need to be updated to support the original label names.

@ywwg commented Feb 21, 2025

This PR would fix the problem I believe: #37938

@ArthurSens (Member)

I think the proposed solution won't work, because then the Prometheus receiver won't be able to scrape endpoints that expose UTF-8 names. The global variable makes this difficult for the collector architecture 😬

@ywwg commented Feb 24, 2025

Ah, this is a bit of a misconception that I have not done a good job of explaining. The global variable is best thought of as a way to gate whether the code is UTF-8 aware, and by default, yes, validity checks will look for UTF-8 validity. However, there are new APIs, .IsValidLegacy() and IsValidLegacyMetricName(), that new code can call if it wants to check whether something is valid under the old rules.

So the idea is that a client can extend its code to check for either UTF-8 or legacy validity, and then flip the SDK global flag when it's ready. This is how Prometheus did it: even when the SDK global is set to UTF-8, individual data sources can be set to legacy mode.
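
To make the gating concrete, here is a minimal Go sketch against prometheus/common/model, assuming a release in which the NameValidationScheme global and the IsValidLegacy helper mentioned above are still exported:

package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func main() {
	name := model.LabelName("service.name")

	// With the package-level scheme set to UTF-8 (the Prometheus 3.x default),
	// dotted label names pass the generic validity check ...
	model.NameValidationScheme = model.UTF8Validation
	fmt.Println(name.IsValid()) // true

	// ... but code that needs the old rules can still ask explicitly.
	fmt.Println(name.IsValidLegacy()) // false: dots are not allowed in legacy names

	// Flipping the global back gates the generic check to legacy behaviour again.
	model.NameValidationScheme = model.LegacyValidation
	fmt.Println(name.IsValid()) // false
}

Flipping the global changes what IsValid() accepts process-wide, while IsValidLegacy() always applies the pre-3.x character rules, which is what lets individual call sites opt into legacy behaviour.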

@ywwg commented Feb 24, 2025

I really need to write that blog post :)

damemi added a commit to odigos-io/odigos that referenced this issue Feb 28, 2025

mx-psi pushed a commit that referenced this issue Mar 3, 2025 (…#38290)

Description: Add more specifics on how collector internal metrics got affected
Link to tracking issue: #38097
@ArthurSens (Member)

There's work going on in Prometheus that is relevant here: prometheus/prometheus#16066

This PR will update the scrape manager to allow escaping options during scrapes. We could roll back to the old behavior with this, but now that the damage is done... should we do it?

@ArthurSens ArthurSens removed the needs triage New item requiring triage label Mar 26, 2025
@github-actions github-actions bot added the Stale label May 26, 2025