Prometheus receiver not handling third-party Prometheus exporters sending untyped metrics #1103

Closed
Nicolas-MacBeth opened this issue Jun 10, 2020 · 9 comments

@Nicolas-MacBeth
Contributor

Nicolas-MacBeth commented Jun 10, 2020

Describe the bug
I am trying to send metrics from databases to the Collector using Prometheus exporters for third-party services, and the Collector does not seem to be able to handle them. My research has led me to understand that the problem lies in the Prometheus receiver, more specifically its handling of Prometheus' untyped metric TYPE.

The biggest problem here is that, because of that one "faulty" metric, all the others (several hundred) are discarded and not exported. Both of the metrics exporters that I have tried (2 out of 2) send this type of metric, so I think it's safe to assume others will too. How would the community like to remedy this on the Collector side? At the very least, I think the behavior should be a warning plus the rest of the metrics being handled/exported. Otherwise, how can we handle these untyped metrics? Here's the Prometheus reference.

In my case specifically, that metric could be substituted with a gauge, which fixed everything, but untyped is used as a default according to the Prometheus reference. I'm unsure whether it would be wise to "try to convert untyped to a gauge type by default", since an exporter passing incorrect values with a gauge type could break other things, but it could also be a great solution, since it seems untyped is used when the actual value passed with the labels is unnecessary (for example, the "faulty" metric was the following:

# TYPE pg_static untyped
pg_static{server="localhost:5432",short_version="12.2.0",version="PostgreSQL 12.2 (Debian 12.2-1+build2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 9.2.1-16) 9.2.1 20191030, 64-bit"} 1

here, the value 1 is not needed). Also, even if something passed as untyped "breaks" the receiver, that behavior is no worse than what we have now (nothing being exported).
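
For illustration, the workaround I mentioned amounted to substituting the type declaration so that the exporter exposes the same series as a gauge:

# TYPE pg_static gauge
pg_static{server="localhost:5432",short_version="12.2.0",version="PostgreSQL 12.2 (Debian 12.2-1+build2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 9.2.1-16) 9.2.1 20191030, 64-bit"} 1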

I'd appreciate some input on how this should be fixed, and depending on the complexity/work involved, would love to try my hand at fixing it. Thanks!

Extra info:

I tried the MySQL exporter and the PostgreSQL exporter. They both produce this log in the Collector every 10 seconds (the scrape period):

2020-06-10T14:53:39.615Z        INFO    internal/metrics_adjuster.go:357        Adjust - skipping unexpected point      
{"component_kind": "receiver", "component_type": "prometheus", "component_name": "prometheus", "type": "UNSPECIFIED"}

The same metrics with the same exporters appear just fine in the actual Prometheus service, but are dropped in the Collector, and since they're dropped, they're not exported to my backend of choice. Once again, the problem seems to be that the metricType (untyped) is not recognized by the Collector, so this line of code runs:

ma.logger.Info("Adjust - skipping unexpected point", zap.String("type", metricType.String()))
(notice the comment saying // this shouldn't run right above it).
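
To make the failure mode concrete, here is a minimal, hypothetical sketch of that code path as I understand it (simplified, not the actual Collector source; keepPoint is a made-up name, but the OpenCensus descriptor types and the log line match what I observe):

package sketch

import (
	metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
	"go.uber.org/zap"
)

// keepPoint reports whether the adjuster keeps points of the given
// descriptor type. A Prometheus "untyped" metric reaches it as
// UNSPECIFIED, so only the default branch matches and the point is
// logged and dropped.
func keepPoint(logger *zap.Logger, metricType metricspb.MetricDescriptor_Type) bool {
	switch metricType {
	case metricspb.MetricDescriptor_GAUGE_DOUBLE,
		metricspb.MetricDescriptor_GAUGE_DISTRIBUTION:
		return true // gauges pass through without start-time adjustment
	case metricspb.MetricDescriptor_CUMULATIVE_DOUBLE,
		metricspb.MetricDescriptor_CUMULATIVE_DISTRIBUTION,
		metricspb.MetricDescriptor_SUMMARY:
		return true // cumulative points have their start time adjusted
	default:
		// "Untyped" Prometheus metrics end up here as UNSPECIFIED.
		logger.Info("Adjust - skipping unexpected point", zap.String("type", metricType.String()))
		return false
	}
}

Under the current behavior, an untyped metric maps to UNSPECIFIED, so every one of its points takes the default branch on every scrape, which matches the log above.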

More info: there are other untyped metrics exported by those two exporters, but they all have NaN as their values, so the Collector treats them as dummy metrics (with 0 values, if I understand correctly) and drops them, which is why they don't cause the same problem. However, the moment they have a real value they would pose the same problem. Those metrics also appear properly when collected by the standalone Prometheus service, with the NaN values showing up in the dashboard; that's another instance of the OTel Prometheus receiver behaving differently from the standalone Prometheus service. It's less of a problem for now, since at least the rest of the metrics are handled/exported.

Steps to reproduce
Download and run either of the two aforementioned databases, then download and build its exporter and the Collector. Run all three at once.

What did you expect to see?
I expected to see my metrics exported and appear in my backend of choice (Stackdriver).

What did you see instead?
I saw this warning message logged by the Collector built from the latest commit (9af3381):

2020-06-10T14:53:39.615Z        INFO    internal/metrics_adjuster.go:357        Adjust - skipping unexpected point      
{"component_kind": "receiver", "component_type": "prometheus", "component_name": "prometheus", "type": "UNSPECIFIED"}

Log when running the latest release binary (0.3.0):

{"level":"info","ts":1591373779.4030387,"caller":"internal/metrics_adjuster.go:357",
"msg":"Adjust - skipping unexpected point","type":"UNSPECIFIED"}

Also, those metrics were not exported, only dropped.

What version did you use?
OpenTelemetry Contrib:

  • Binary built from the latest commit: 9af3381
  • Version 0.3.0 release binary
    Neither works; the behavior is the same in both.

What config did you use?

extensions:
  health_check:
  zpages:
    endpoint: 0.0.0.0:55679
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'postgresql_exporter_collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['localhost:9187']
processors:
  batch:
  queued_retry:
exporters:
  stackdriver:
    project: "placeholder"
    skip_create_metric_descriptor: true
  file:
    path: ./metric.json
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [stackdriver]
  extensions: [health_check, zpages]

Environment
OS: Debian 9
Go version: 1.14.3 linux/amd64

@Nicolas-MacBeth added the bug (Something isn't working) label Jun 10, 2020
@Nicolas-MacBeth
Contributor Author

Joshua MacDonald on Gitter seems to support defaulting to gauge when there is ambiguity; see https://gitter.im/open-telemetry/community.

I believe the conversion is done in this function: convToOCAMetricType.
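
If that's the right place, here's a rough sketch of what the change might look like, assuming the function is a switch over textparse.MetricType that currently falls through to UNSPECIFIED for unknown types (an illustration of the idea, not the exact existing code):

package sketch

import (
	metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
	"github.com/prometheus/prometheus/pkg/textparse"
)

// convToOCAMetricType maps a Prometheus metric type to an OpenCensus
// descriptor type. The proposed change is the MetricTypeUnknown case:
// treat "untyped" metrics as gauges instead of returning UNSPECIFIED,
// so their points are no longer skipped by the metrics adjuster.
func convToOCAMetricType(metricType textparse.MetricType) metricspb.MetricDescriptor_Type {
	switch metricType {
	case textparse.MetricTypeCounter:
		return metricspb.MetricDescriptor_CUMULATIVE_DOUBLE
	case textparse.MetricTypeGauge:
		return metricspb.MetricDescriptor_GAUGE_DOUBLE
	case textparse.MetricTypeHistogram:
		return metricspb.MetricDescriptor_CUMULATIVE_DISTRIBUTION
	case textparse.MetricTypeSummary:
		return metricspb.MetricDescriptor_SUMMARY
	case textparse.MetricTypeUnknown:
		// Proposed: default "untyped" to a gauge.
		return metricspb.MetricDescriptor_GAUGE_DOUBLE
	default:
		return metricspb.MetricDescriptor_UNSPECIFIED
	}
}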

@draffensperger
Contributor

@Nicolas-MacBeth do you think this would be a good first issue for you to tackle, to do that conversion? @tigrannajaryan would you be open to that (I saw you assigned it to @asuresh4)?

@asuresh4
Member

@Nicolas-MacBeth - I'd be happy to help with the review if you would like to work on this.

@Nicolas-MacBeth
Contributor Author

Nicolas-MacBeth commented Jun 15, 2020

@asuresh4 I appreciate that. So, to confirm: should I take the approach where all cases of ambiguity are handled as gauges (i.e. a default for all metrics whose type is unknown/not supported), or only the untyped ones?

@asuresh4
Member

Looking at the commit that introduced the new metric types, it seems to me that metrics of untyped type map to the unknown type. If that's indeed true, I believe we could convert metrics with unknown type to gauge. However, I'm not sure whether the same holds for the other types introduced by that commit, especially the info and stateset types.

@bogdandrutu @tigrannajaryan - what do you guys think?

@tigrannajaryan self-assigned this Jun 17, 2020
@tigrannajaryan
Member

This is outside of my area of expertise.
@bogdandrutu perhaps you can help.

@tigrannajaryan removed their assignment Jun 17, 2020
@flands added this to the GA 1.0 milestone Jun 18, 2020
@asuresh4
Member

@Nicolas-MacBeth - I synced with @bogdandrutu on this offline and he's in agreement that we can convert metrics of type textparse.MetricTypeUnknown to gauge and get to the other unsupported types at a later point. Let me know if you would still like to work on the issue, or I can pick it up.

@Nicolas-MacBeth
Contributor Author

Okay, perfect! I'll make a PR soon; unless I'm missing anything it should be a simple fix. Thanks!

@Nicolas-MacBeth
Contributor Author

Hi @asuresh4, I made a PR if you want to check it out: #1194
