Throttle exporting from persistence queue to reduce memory consumption #11018

Open
Nav-Kpbor opened this issue Aug 29, 2024 · 2 comments

Is your feature request related to a problem? Please describe.
My team and I have encountered an issue where our collectors use a large amount of memory when re-ingesting telemetry from a file storage queue after a disruption to our backend connection. In these tests we simulated an hour of connection failures to the backend to let the file storage queue grow. After the hour passed, we restored the connection and saw a spike in both exported telemetry and memory usage.
[Screenshots: memory usage and exported telemetry spiking after the connection is restored]
Here is an example of the behavior we see from the persistent sending queue during the test period. Notice how the sending queue drops to zero immediately after reconnecting to the backend.
[Screenshot: persistent sending queue size during the test period]
It seems that on reconnecting to the backend, everything in the file storage queue gets pulled into memory at once. We are hoping to control this spike so we can ensure memory won't exceed a certain threshold when running on Windows VMs.

Describe the solution you'd like
Is there a feature we could add that throttles how quickly the consumers pull from the file storage queue and send to the backend endpoint? For example, something that lets us configure how many batches are pulled from the queue over a specified time frame?
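Something like the following is the kind of control we have in mind (purely illustrative; the read_rate keys below do not exist in the collector today):

exporters:
  otlp:
    sending_queue:
      storage: file_storage/backup
      num_consumers: 10
      # hypothetical setting: drain at most this many batches from the
      # persistent queue per interval when catching up after an outage
      read_rate:
        batches: 10
        interval: 1s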

Describe alternatives you've considered
We have tried the memory_limiter processor and the GOMEMLIMIT environment variable, but neither has been successful. My guess is that the garbage collector can't reclaim the memory because the telemetry is still being actively sent. We have also tried reducing the number of consumers and the batch size, but we are still seeing the spikes.

Additional context
Collector version: v0.99 (contrib)
Tested on Windows Server 2016

Here is the config we used for testing, in case there are any changes we could make to improve memory usage with the current version of the collector.

extensions:
  health_check:
    endpoint: localhost:4313
  file_storage/backup:
    directory: {Directory of Collector on the machine}
    compaction:
      on_rebound: true
      directory: {Directory of Collector on the machine}
      rebound_needed_threshold_mib: 100
      rebound_trigger_threshold_mib: 10
      check_interval: 5s

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
  batch:
    send_batch_size: 8192
    send_batch_max_size: 8192
    timeout: 10s

exporters:   
  otlp:
    endpoint: "http://{IP of backend server}:4317"
    retry_on_failure:
      max_elapsed_time: 0
    sending_queue:
      queue_size: 1000
      storage: file_storage/backup
      num_consumers: 10
    tls:
      insecure: true
    

service:
  extensions: [health_check, file_storage/backup]
  telemetry:
    metrics:
      address: "localhost:4315"
      
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]

    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]

    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
jmacd (Contributor) commented Sep 23, 2024

The num_consumers setting in the persistent queue is capable of throttling the export path; consider lowering it to 1 and working back up if the recovery is too slow.
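Applied to the config above, only the relevant keys change:

exporters:
  otlp:
    sending_queue:
      storage: file_storage/backup
      num_consumers: 1   # start at 1; raise it if draining the backlog is too slow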

mattsains (Contributor) commented Apr 29, 2025

I think jmacd's suggestion is a good workaround, but this feature request is still valid and relevant to the current improvements happening in exporterhelper. It also fits nicely with the issue about creating rate/resource limiter extensions: #12603. It seems to me that either the persistent queue or the consumers in the exporter could be configured to interface with one of these limiters, in the same way receivers would.
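As a purely hypothetical sketch of how that could surface in configuration (the extension name and the limiter key below are placeholders for the idea in #12603; neither exists today):

extensions:
  ratelimiter:                 # hypothetical limiter extension from #12603
    rate: 10                   # e.g. batches per second, illustrative only

exporters:
  otlp:
    sending_queue:
      storage: file_storage/backup
      limiter: ratelimiter     # hypothetical hook from the persistent queue to the limiter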
