
New fluentbit metric exposure is not following prometheus specification leading to failed scrapes for other vendors #17976

@a-thaler

Description

With Kyma 2.17, the telemetry Fluent Bit setup switched to the new, recommended way of exposing Fluent Bit's own metrics; see kyma-project/telemetry-manager#262. One of the advantages is that the storage metrics can be scraped as well.
However, it turned out that the exposed format does not conform to the Prometheus specification.

Before:

# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="prometheus_exporter.0"} 0
fluentbit_output_retries_total{name="xxx-telemetry-http"} 7

After:

# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="prometheus_exporter.0"} 0
<Some other metrics>
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="xxx-telemetry-http"} 7

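The spec states: "Only one HELP line may exist for any given metric name." In the new output, the HELP and TYPE lines for fluentbit_output_retries_total are emitted twice, once for each sample.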
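To make the violation easy to spot in a scraped payload, here is a minimal Python sketch that counts HELP lines per metric name. The function name `duplicate_help_names` and the metric `example_other_metric` (standing in for "<Some other metrics>") are hypothetical and not part of Fluent Bit or Kyma; the sketch only checks the single rule quoted above and is not a full parser for the exposition format.

```python
import re
from collections import Counter

# Matches "# HELP <metric_name> ..." lines of the Prometheus text format.
HELP_RE = re.compile(r"^#\s*HELP\s+(\S+)")


def duplicate_help_names(exposition):
    """Return metric names with more than one HELP line, in first-seen order."""
    counts = Counter()
    for line in exposition.splitlines():
        match = HELP_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return [name for name, seen in counts.items() if seen > 1]


# Payload shaped like the "Before" example: one HELP/TYPE pair per metric.
BEFORE = """\
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="prometheus_exporter.0"} 0
fluentbit_output_retries_total{name="xxx-telemetry-http"} 7
"""

# Payload shaped like the "After" example. example_other_metric is a
# hypothetical placeholder for the metrics elided in the issue.
AFTER = """\
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="prometheus_exporter.0"} 0
# HELP example_other_metric Hypothetical placeholder metric.
# TYPE example_other_metric gauge
example_other_metric 1
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="xxx-telemetry-http"} 7
"""

if __name__ == "__main__":
    print(duplicate_help_names(BEFORE))  # []
    print(duplicate_help_names(AFTER))   # ['fluentbit_output_retries_total']
```

A check like this could run against the metrics endpoint output in a test, so that a regression in the exposed format is noticed before a strict consumer fails on it.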

As a consequence, some providers such as Dynatrace are not able to scrape the data. With the Dynatrace ActiveGate and the Dynatrace-specific labelling, only the first occurrence of a metric gets collected.

However, Prometheus and VictoriaMetrics seem to handle the violation gracefully. An upstream ticket already exists; we will see if we can contribute there.

For now, we should revert the change while keeping the very good test improvements from the PR.


Labels

area/logging, kind/bug
