[prometheusremotewritereceiver] fix body decode to snappy format #38899
@ArthurSens I think this PR doesn't need a changelog entry, right?
It depends. I think we need to test this by building the collector from main, sending a PRWv2 request, and seeing if it works. If it's broken in main, then we need a changelog of type bug_fix.
You mean just creating a test that sends snappy-compressed data and validates that the new behavior works properly? Can I do that by adding a new test case to receiver_test.go?
No, I mean actually running the collector and a Prometheus, sending PRWv2 requests from Prometheus to the otelcollector, and seeing if things work correctly (i.e. whether we get errors decoding the requests). We'll need to build and configure both tools properly, do you know how to do that? I know we can build the collector with
Why is a whole end2end test the only way to validate that the data is being properly decompressed? I don't get it.
I agree that this end2end test is pretty important; my push is to simplify the implementation/test: finalize target_info and the other things, and, in the end, implement the end2end test that you are mentioning.
Sorry if I was not clear; I didn't mean to implement a whole end-to-end test in code. I meant to test manually! Sometimes, it's easy and enough for the job at hand :) I just did it. I've built the collector while including the remote-write receiver (inspired by this PR #34747), and I've run it with the following configuration:
receivers:
  prometheusremotewrite:
    endpoint: localhost:9091
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [prometheusremotewrite]
      exporters: [debug]
Then I've built Prometheus from source and started it with the following configuration:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"] # Self-scrape to collect metrics
remote_write:
  - url: "http://localhost:9091/api/v1/write"
    protobuf_message: io.prometheus.write.v2.Request # To enable PRWv2
Then, in Prometheus logs, I see the following:
This proves that the code in
If we're fixing a behavior, then we definitely need a changelog entry of type
Now, I'm doing the same thing as before but building the collector binary from your PR (hopefully with the fix 🤞 )
Thanks for sharing it. I will try to test it locally too, nice!
After doing the same test from above, but in your branch, I can still see the same error in Prometheus logs:
time=2025-03-24T21:32:44.539Z level=ERROR source=queue_manager.go:1670 msg="non-recoverable error" component=remote remote_name=2757a8 url=http://localhost:9091/api/v1/write failedSampleCount=563 failedHistogramCount=0 failedExemplarCount=0 err="server returned HTTP status 400 Bad Request: snappy: corrupt input\n"
I can also see this error in the collector logs:
2025-03-24T18:32:44.537-0300 warn [email protected]/receiver.go:155 Error decoding remote write request {"otelcol.component.id": "prometheusremotewrite", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics", "error": "snappy: corrupt input"}
This log line comes from here, which means we're failing even before we reach the new code you wrote 😬
@@ -156,8 +157,15 @@ func (prw *prometheusRemoteWriteReceiver) handlePRW(w http.ResponseWriter, req *
		return
	}

	decompressedBody, err := snappy.Decode(nil, body)
	if err != nil {
		prw.settings.Logger.Warn("Error decoding remote write request to snappy", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})
- prw.settings.Logger.Warn("Error decoding remote write request to snappy", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})
+ prw.settings.Logger.Warn("Error decoding snappy-encoded remote write request", zapcore.Field{Key: "error", Type: zapcore.ErrorType, Interface: err})
Maybe we don't even need this code actually, we just need to understand what is happening in io.ReadAll(req.Body)
🤔
Why is it corrupted over there?
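One way to look into what io.ReadAll(req.Body) actually returns is to inspect the request right where the handler reads it. Below is a minimal sketch, not the receiver's code: `bodyInfo`, `inspectBody`, and `newSampleRequest` are hypothetical helpers, and the payload bytes are made up rather than a real PRWv2 request. It only records the Content-Encoding header, the body length, and the first few bytes, so we can tell whether the bytes reaching the handler are still compressed.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// bodyInfo is a hypothetical helper type for this sketch;
// it is not part of the receiver.
type bodyInfo struct {
	ContentEncoding string
	Length          int
	Prefix          []byte // first bytes of the body, at most 8
}

// inspectBody reads the whole body the same way the handler does
// (io.ReadAll(req.Body)) and reports what actually arrived.
func inspectBody(req *http.Request) (bodyInfo, error) {
	body, err := io.ReadAll(req.Body)
	if err != nil {
		return bodyInfo{}, err
	}
	n := len(body)
	if n > 8 {
		n = 8
	}
	return bodyInfo{
		ContentEncoding: req.Header.Get("Content-Encoding"),
		Length:          len(body),
		Prefix:          body[:n],
	}, nil
}

// newSampleRequest builds a fake request for the sketch.
// The payload is arbitrary stand-in bytes, NOT a real snappy-encoded PRW payload.
func newSampleRequest() *http.Request {
	payload := []byte{0x05, 0x10, 'h', 'e', 'l', 'l', 'o'}
	req := httptest.NewRequest(http.MethodPost, "/api/v1/write", bytes.NewReader(payload))
	req.Header.Set("Content-Encoding", "snappy")
	return req
}

func main() {
	info, err := inspectBody(newSampleRequest())
	if err != nil {
		panic(err)
	}
	fmt.Printf("encoding=%q len=%d prefix=% x\n", info.ContentEncoding, info.Length, info.Prefix)
}
```

io.ReadAll itself doesn't transform bytes, so if the prefix logged inside the real handler doesn't match what Prometheus sent, something earlier in the request path has likely already replaced req.Body.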
Maybe we should decode the snappy before the io.ReadAll call? Don't know, just a guess.
I believe this PR also resolves this issue. It seems like both are addressing the same underlying problem.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
…trib into decode-body-snappy
We already solved it here.
The snappy decoding has been fixed in open-telemetry/opentelemetry-collector#12911, is there anything left to do in this PR?
Nope, I left a message but forgot to close it, thanks.
Description
This bug was found during the development of this feature. We are not decoding the body in the HTTP handler, which is a MUST-HAVE requirement of this component.