[prometheusremotewritereceiver] fix body decode to snappy format #38899

perebaj · 2025-03-24T11:53:56Z

Description

This bug was found in the development of this feature. We are not decoding the body in the http handler, that is a MUST-HAVE requirement of this component.

perebaj · 2025-03-24T12:42:10Z

@ArthurSens I think that this PR doesn't need to have a chlog. Right?

ArthurSens · 2025-03-24T12:59:18Z

@ArthurSens I think that this PR doesn't need to have a chlog. Right?

It depends, I think we need to test this by building the collector from main, send a PRWv2 request and see if it works. If it's broken in main, then we need a changelog of type bug_fix

perebaj · 2025-03-24T14:41:57Z

@ArthurSens I think that this PR doesn't need to have a chlog. Right?

It depends, I think we need to test this by building the collector from main, send a PRWv2 request and see if it works. If it's broken in main, then we need a changelog of type bug_fix

I think we need to test this by building the collector from main

You mean. just to create a test, send a snappy/compressed data, and validate if this new behavior is working properly? Can I do that creating a new test case on receiver_test.go?

ArthurSens · 2025-03-24T14:45:40Z

No, I mean actually run the collector and a Prometheus, send prwv2 requests from the Prometheus to otelcollector and see if things are working correctly. (if we get errors decoding the requests)

We'll need to build and configure both tools properly, do you know how to do that? I know we can build the collector with make otelcontribcol, but I'm not sure if the prwreceiver will be available since it's not even alpha yet. We'll need to do some research on how to do that 😬

perebaj · 2025-03-24T14:51:20Z

No, I mean actually run the collector and a Prometheus, send prwv2 requests from the Prometheus to otelcollector and see if things are working correctly. (if we get errors decoding the requests)

We'll need to build and configure both tools properly, do you know how to do that? I know we can build the collector with make otelcontribcol, but I'm not sure if the prwreceiver will be available since it's not even alpha yet. We'll need to do some research on how to do that 😬

Why the only way to validate if the data is being properly decompressed is creating the whole end2end test? I don't get it.

perebaj · 2025-03-24T14:53:48Z

I agree that this end2end is pretty important, my push is to simplify the implementation/test. Finalize the target_info and the other things, and in the end, implement this end2end that you are mentioning.

ArthurSens · 2025-03-24T21:25:13Z

I agree that this end2end is pretty important, my push is to simplify the implementation/test. Finalize the target_info and the other things, and in the end, implement this end2end that you are mentioning.

Sorry if I was not clear; I didn't mean to implement a whole end-to-end test in code. I meant to test manually! Sometimes, it's easy and enough for the job at hand :)

I just did it. I've built the collector while including the remote-write receiver (inspired by this PR #34747), and I've run it with the following configuration:

receivers:
  prometheusremotewrite:
    endpoint: localhost:9091

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [prometheusremotewrite]
      exporters: [debug]

Then I've built Prometheus from source and started it with the following configuration:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"] # Self-scrape to collect metrics

remote_write: 
  - url: "http://localhost:9091/api/v1/write"
    protobuf_message: io.prometheus.write.v2.Request # To enable PRWv2

Then, in Prometheus logs, I see the following:

time=2025-03-24T21:15:56.004Z level=ERROR source=queue_manager.go:1670 msg="non-recoverable error" component=remote remote_name=2757a8 url=http://localhost:9091/api/v1/write failedSampleCount=563 failedHistogramCount=0 failedExemplarCount=0 err="server returned HTTP status 400 Bad Request: snappy: corrupt input\n"

This proves that the code in main really is broken! Nice catch :)

If we're fixing a behavior, then we definitely need a changelog entry of type bug_fix

Now, I'm doing the same thing as before but building the collector binary from your PR (hopefully with the fix 🤞 )

perebaj · 2025-03-24T21:33:18Z

I agree that this end2end is pretty important, my push is to simplify the implementation/test. Finalize the target_info and the other things, and in the end, implement this end2end that you are mentioning.

Sorry if I was not clear; I didn't mean to implement a whole end-to-end test in code. I meant to test manually! Sometimes, it's easy and enough for the job at hand :)

Now, I'm doing the same thing as before, but build the collector binary from your PR (hopefully with the fix 🤞 )

Thanks to share it. I will try to test it locally too, nice!

ArthurSens

After doing the same test from above, but in your branch, I can still see the same error in Prometheus logs:


time=2025-03-24T21:32:44.539Z level=ERROR source=queue_manager.go:1670 msg="non-recoverable error" component=remote remote_name=2757a8 url=http://localhost:9091/api/v1/write failedSampleCount=563 failedHistogramCount=0 failedExemplarCount=0 err="server returned HTTP status 400 Bad Request: snappy: corrupt input\n"

I can also see this error in the collector logs:

2025-03-24T18:32:44.537-0300    warn    [email protected]/receiver.go:155  Error decoding remote write request     {"otelcol.component.id": "prometheusremotewrite", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics", "error": "snappy: corrupt input"}

This log line comes from here, which means we're failing even before we reach the new code you wrote 😬

ArthurSens · 2025-03-24T21:28:28Z

receiver/prometheusremotewritereceiver/receiver.go

@@ -156,8 +157,15 @@ func (prw *prometheusRemoteWriteReceiver) handlePRW(w http.ResponseWriter, req *
 		return
 	}

+	decompressedBody, err := snappy.Decode(nil, body)
+	if err != nil {
+		prw.settings.Logger.Warn("Error decoding remote write request to snappy", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})


Suggested change

prw.settings.Logger.Warn("Error decoding remote write request to snappy", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})

prw.settings.Logger.Warn("Error decoding snappy-encoded remote write request", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})

Maybe we don't even need this code actually, we just need to understand what is happening in io.ReadAll(req.Body) 🤔

Why is it corrupted over there?

Maybe should we decode the snappy before the readall func? Don't know, just a guess

sujalshah-bit · 2025-03-26T14:16:37Z

I believe this PR also resolves this issue. It seems like both are addressing the same underlying problem.

github-actions · 2025-04-17T05:21:05Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

…trib into decode-body-snappy

perebaj · 2025-04-29T12:08:45Z

We already solved it here

ArthurSens · 2025-05-07T18:22:47Z

The snappy decoding has been fixed in open-telemetry/opentelemetry-collector#12911, is there anything left to do in this PR?

perebaj · 2025-05-07T19:35:32Z

The snappy decoding has been fixed in open-telemetry/opentelemetry-collector#12911, is there anything left to do in this PR?

Nope, I leave a message, but forgot to close it, thanks

[prometheuswritereceiver] fix body decode to snappy format

fe24554

github-actions bot added the receiver/prometheusremotewrite label Mar 24, 2025

github-actions bot requested review from ArthurSens and dashpole March 24, 2025 11:54

perebaj mentioned this pull request Mar 24, 2025

[prometheuswritereceiver] target info #38812

Merged

ArthurSens reviewed Mar 24, 2025

View reviewed changes

perebaj added 3 commits March 31, 2025 00:00

add confighttp withdecoder snappy option

26ef8f9

add new test to validate snappy decompression

d6a13da

format

2a2c38d

perebaj changed the title ~~[prometheuswritereceiver] fix body decode to snappy format~~ [prometheusremotewritereceiver] fix body decode to snappy format Apr 2, 2025

perebaj added 2 commits April 2, 2025 09:58

chlog

f314f87

remove withDecoder snappy

b4e6265

ArthurSens mentioned this pull request Apr 11, 2025

Proposed "snappy" vs "x-snappy-framed" content-type confusion open-telemetry/opentelemetry-collector#12825

Closed

github-actions bot added the Stale label Apr 17, 2025

perebaj added 2 commits April 20, 2025 18:41

Merge branch 'main' of github.com:perebaj/opentelemetry-collector-con…

c28ff93

…trib into decode-body-snappy

validate snappy compression

cd9164b

github-actions bot removed the Stale label Apr 22, 2025

perebaj closed this May 7, 2025

	prw.settings.Logger.Warn("Error decoding remote write request to snappy", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})
	prw.settings.Logger.Warn("Error decoding snappy-encoded remote write request", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})

[prometheusremotewritereceiver] fix body decode to snappy format #38899

[prometheusremotewritereceiver] fix body decode to snappy format #38899

Uh oh!

Conversation

perebaj commented Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

perebaj commented Mar 24, 2025

Uh oh!

ArthurSens commented Mar 24, 2025

Uh oh!

perebaj commented Mar 24, 2025

Uh oh!

ArthurSens commented Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

perebaj commented Mar 24, 2025

Uh oh!

perebaj commented Mar 24, 2025

Uh oh!

ArthurSens commented Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

perebaj commented Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ArthurSens left a comment

Choose a reason for hiding this comment

Uh oh!

ArthurSens Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

ArthurSens Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

perebaj Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

sujalshah-bit commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 17, 2025

Uh oh!

perebaj commented Apr 29, 2025

Uh oh!

ArthurSens commented May 7, 2025

Uh oh!

perebaj commented May 7, 2025

Uh oh!

Uh oh!

perebaj commented Mar 24, 2025 •

edited

Loading

ArthurSens commented Mar 24, 2025 •

edited

Loading

ArthurSens commented Mar 24, 2025 •

edited

Loading

perebaj commented Mar 24, 2025 •

edited

Loading

sujalshah-bit commented Mar 26, 2025 •

edited

Loading