
googlecloudpubsubreceiver: growing unacked messages in subscription #38164


Open
RuslanAkchurinMutinex opened this issue Feb 25, 2025 · 9 comments
Labels
bug (Something isn't working), receiver/googlecloudpubsub

Comments


RuslanAkchurinMutinex commented Feb 25, 2025

Component(s)

receiver/googlecloudpubsub

What happened?

Description

I am using OpenTelemetry Collectors deployed in GKE. This is a dedicated deployment that only works with one subscription, fed by an organization sink that sends all relevant logs to it. Throughput is conservative: after applying a filter in the sink, it is only around 200k messages per day. The collector deployment consists of 5 pods with 500Mi/1Gi limits (not even close to hitting them). Shortly after deployment I purge messages and almost immediately observe a slow but steady growth of unacked messages. There are no errors in the collector output (except channel re-establishing), no failed or refused records in the collector's self-telemetry, and no messages in the DLQ. The subscription is configured with a 600-second ack deadline and exactly-once delivery enabled.

Steps to Reproduce

Setup organization sink, setup subscription, start collecting logs from it.

Expected Result

All sent messages are acknowledged

Actual Result

Some messages are not acknowledged, and their number grows over time.

Collector version

v0.120.1

Environment information

Environment

OS: cos

OpenTelemetry Collector configuration

receivers:
  googlecloudpubsub/logs/minimal:
    project: ${PROJECT_ID}
    subscription: ${SUBSCRIPTION_ID}
    encoding: cloud_logging

processors:
  batch:
    send_batch_size: 4096

exporters:
  otlp/googlecloud:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": ${HONEYCOMB_API_KEY}
      "x-honeycomb-dataset": ${HONEYCOMB_DATASET}

service:
  pipelines:
    logs/minimal:
      receivers: [googlecloudpubsub/logs/minimal]
      processors: [batch]
      exporters: [otlp/googlecloud]

Log output

Additional context

(screenshot attached)
RuslanAkchurinMutinex added the bug (Something isn't working) and needs triage (New item requiring triage) labels Feb 25, 2025
atoulme added the receiver/googlecloudpubsub label and removed the needs triage (New item requiring triage) label Feb 25, 2025

atoulme commented Feb 25, 2025

Thank you for filing this issue. I am pinging the codeowners of the googlecloudpubsubreceiver.

github-actions bot

Pinging code owners for receiver/googlecloudpubsub: @alexvanboxel. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@alexvanboxel

LogEntry will soon be deprecated, as soon as this encoder lands: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/encoding/googlecloudlogentryencodingextension

I'm going to keep this open with priority:p3, as even with the marshaller support we may need an option for what to do when marshalling fails.


RuslanAkchurinMutinex commented Feb 27, 2025

Hi @alexvanboxel, thank you for your comment, but I am not completely clear on what it means exactly. Are these unacknowledged messages actually lost and not delivered to the backend? If so, will adding this encoder help process them? If not, what would be the best option for me to set up reliable log delivery from GCP?

@alexvanboxel

The problem is the encoding: probably the Protobuf encoding fails, so the message is left un-acked. The new encoder will just do JSON by default, so you can switch to a more stable parsing. (Personally I don't like the Protobuf parsing; it was my mistake to have allowed it in.)

As a workaround you could try raw_text; if you still see un-acked messages, then something else is wrong.
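
A minimal sketch of that workaround, adapted from the configuration posted above and assuming raw_text is among the receiver's built-in encodings (it passes each Pub/Sub message body through as the log body instead of parsing it as a LogEntry):

```yaml
receivers:
  googlecloudpubsub/logs/minimal:
    project: ${PROJECT_ID}
    subscription: ${SUBSCRIPTION_ID}
    # raw_text skips LogEntry parsing entirely; each Pub/Sub message body
    # becomes the log record body, so decode failures can no longer leave
    # messages unacked.
    encoding: raw_text
```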

@RuslanAkchurinMutinex

@alexvanboxel it appears that the receiver is struggling to parse cloudaudit.googleapis.com logs (which indeed have a protoPayload type). After filtering out this source in the sink, I no longer see the backlog growing. I am wondering whether the new encoder is planned to be able to parse these (or at least to pass them unparsed to the next processor in the chain to deal with). It would also be great to have some warn-level output when the receiver fails to parse a record.

Greatly appreciate your help and your work on this receiver, thank you, Alex.

@alexvanboxel

The encoder I pointed to in the previous comment will default to JSON, which is less risky; you will have to force Proto if you want that. This will probably land in a few releases: I'm not happy with the tests yet, but the code is ready.

Good points on what to do when parsing fails, I'll think about it. I'd like to keep this ticket open for those issues.
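
Once the extension lands, it would presumably be wired in roughly as sketched below. This assumes the receiver's encoding field accepts an encoding extension's component ID and that the extension needs no mandatory settings; both are assumptions to verify against the extension README:

```yaml
extensions:
  # Hypothetical: the exact component name and options come from the
  # googlecloudlogentryencodingextension README once it is released.
  googlecloudlogentryencoding:

receivers:
  googlecloudpubsub/logs/minimal:
    project: ${PROJECT_ID}
    subscription: ${SUBSCRIPTION_ID}
    # Reference the encoding extension by component ID instead of the
    # built-in cloud_logging encoding.
    encoding: googlecloudlogentryencoding

service:
  extensions: [googlecloudlogentryencoding]
  pipelines:
    logs/minimal:
      receivers: [googlecloudpubsub/logs/minimal]
      processors: [batch]
      exporters: [otlp/googlecloud]
```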

github-actions bot

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@alexvanboxel

The PR will make it possible to ignore encoding errors (with the drawback of losing the message). A metric is added to be able to monitor the issue: #39839

github-actions bot removed the Stale label May 5, 2025
atoulme pushed a commit that referenced this issue May 5, 2025
…39839)

#### Description
Introduce a setting to ignore errors from the configured encoder. It's advised to set this to `true` when using a custom encoder, and to use the `receiver.googlecloudpubsub.encoding_error` metric to monitor the number of errors. Ignoring the error will cause the receiver to drop the message.

#### Link to tracking issue
#38164

#### Testing
Tested with a custom encoder by introducing a bogus message to the topic.

#### Documentation
Added the configuration setting to the README
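
Based on the PR description above, the new option would be used roughly as follows. The key name ignore_encoding_error is an assumption (the PR text does not name the setting), so verify the exact spelling against the receiver README for your collector version:

```yaml
receivers:
  googlecloudpubsub/logs/minimal:
    project: ${PROJECT_ID}
    subscription: ${SUBSCRIPTION_ID}
    encoding: cloud_logging
    # Assumed setting name; when enabled, messages the encoder cannot decode
    # are dropped instead of being left unacked, and the
    # receiver.googlecloudpubsub.encoding_error metric counts the failures.
    ignore_encoding_error: true
```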