
[exporterhelper] The collector is eventually OOMKilled when a single telemetry item is bigger than the batch max_size and the request sizer is bytes #12893


Open
at-ishikawa opened this issue Apr 18, 2025 · 2 comments · May be fixed by #12982
Labels
bug Something isn't working

Comments

@at-ishikawa

Component(s)

exporter/exporterhelper

What happened?

Describe the bug

When an exporter is configured with sending_queue.sizer: bytes and sending_queue.batch.max_size, and a single telemetry item is larger than sending_queue.batch.max_size, the collector starts consuming memory indefinitely and is eventually OOMKilled.
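The exact mechanism has not been verified against the exporterhelper source, but a rough sketch of the suspected failure mode (illustrative Go only, not the collector's actual code) is a byte-based splitter that can never reduce a single indivisible item below max_size, returns no error, and keeps retrying, retaining more data on every pass:

package main

import "fmt"

// request models one indivisible telemetry item and its size in bytes.
type request struct{ bytes int }

func main() {
	const maxSize = 10
	item := request{bytes: 25} // a single item larger than max_size

	var pending []request
	remaining := &item

	// In the buggy scenario this loop is effectively unbounded: the remainder
	// can never be split below maxSize, so every pass does more work and
	// retains another copy, and memory grows until the OOM killer steps in.
	for pass := 0; pass < 8 && remaining != nil; pass++ {
		if remaining.bytes <= maxSize {
			pending = append(pending, *remaining)
			remaining = nil
			continue
		}
		// A single item cannot be split any further, but no error is
		// returned either, so the same oversized remainder is retried.
		pending = append(pending, *remaining)
		fmt.Printf("pass %d: %d pending copies\n", pass, len(pending))
	}
}

Bounded to eight passes here for demonstration; in the reported scenario the retrying never stops, which would explain the rising CPU and memory usage.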

Steps to reproduce

Configure a collector with an exporter that uses exporterhelper and include the following settings.

    sending_queue:
      sizer: bytes
      batch:
        max_size: 10

Then run the collector and send telemetry data, for example by sending a trace to the OTLP receiver with telemetrygen traces --otlp-insecure --traces 1.

What did you expect to see?

The oversized telemetry data should be dropped and an error should be logged.
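For illustration only (enqueue, queue, and maxSize below are hypothetical names, not exporterhelper APIs), the expected handling would look roughly like this: a request that can never fit into max_size is rejected up front with an error, so the data is dropped and logged instead of being retried forever.

package main

import (
	"errors"
	"fmt"
	"log"
)

// errTooLarge marks a request that can never fit into a batch of max_size bytes.
var errTooLarge = errors.New("request exceeds batch max_size and cannot be split")

// enqueue is a hypothetical helper: it rejects a request whose size already
// exceeds maxSize so the caller can log the error and drop the data.
func enqueue(queue chan<- int, size, maxSize int) error {
	if size > maxSize {
		return fmt.Errorf("dropping %d-byte request: %w", size, errTooLarge)
	}
	queue <- size
	return nil
}

func main() {
	queue := make(chan int, 100)
	if err := enqueue(queue, 25, 10); err != nil {
		log.Printf("exporter: %v", err) // expected outcome: an error log, not an OOM kill
	}
}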

What did you see instead?

There was no error output. Instead, CPU and memory usage kept increasing until the process was eventually OOMKilled.
dmesg showed that the otelcol-dev process was killed by the OOM killer.

> sudo dmesg -T | egrep -i 'killed process|oom.kill'
[Fri Apr 18 12:37:16 2025] qemu-system-x86 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=200
[Fri Apr 18 12:37:16 2025]  oom_kill_process+0x118/0x280
[Fri Apr 18 12:37:16 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-2013.slice/[email protected]/app.slice/tmux-spawn-281eff27-82db-4b5a-8dd7-c264f1daffe7.scope,task=otelcol-dev,pid=1542808,uid=2013
[Fri Apr 18 12:37:16 2025] Out of memory: Killed process 1542808 (otelcol-dev) total-vm:32036360kB, anon-rss:24105632kB, file-rss:516kB, shmem-rss:0kB, UID:2013 pgtables:55032kB oom_score_adj:100
[Fri Apr 18 13:14:50 2025] code invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=100
[Fri Apr 18 13:14:50 2025]  oom_kill_process+0x118/0x280
[Fri Apr 18 13:14:50 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-2013.slice/[email protected]/app.slice/tmux-spawn-41f0b67f-58b9-4eab-b419-24031351ed25.scope,task=builder,pid=1663701,uid=2013
[Fri Apr 18 13:14:51 2025] Out of memory: Killed process 1663701 (builder) total-vm:27390444kB, anon-rss:23558260kB, file-rss:1500kB, shmem-rss:0kB, UID:2013 pgtables:49208kB oom_score_adj:100
[Fri Apr 18 13:25:41 2025] wpa_supplicant invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[Fri Apr 18 13:25:41 2025]  oom_kill_process+0x118/0x280
[Fri Apr 18 13:25:41 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=wpa_supplicant.service,mems_allowed=0,global_oom,task_memcg=/user.slice/user-2013.slice/[email protected]/app.slice/tmux-spawn-1c691201-4657-4fc1-a2fd-746d93188ea9.scope,task=builder,pid=1693988,uid=2013
[Fri Apr 18 13:25:41 2025] Out of memory: Killed process 1693988 (builder) total-vm:39965976kB, anon-rss:24978248kB, file-rss:532kB, shmem-rss:0kB, UID:2013 pgtables:65084kB oom_score_adj:100

Collector version

v0.124.0

Environment information

Environment

OS: Ubuntu 24.04
Compiler: go1.24.1, ocb: v0.124.0

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: localhost:14317
    tls:
      insecure: true
    sending_queue:
      sizer: bytes
      queue_size: 10000
      batch:
        flush_timeout: 1s
        max_size: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
  telemetry:
    logs:
      level: debug

Log output

go run ./otelcol-dev --config otel-collector-config.yaml
2025-04-18T11:52:52.486-0700    info    [email protected]/service.go:199 Setting up own telemetry...
2025-04-18T11:52:52.492-0700    debug   builders/builders.go:24 Stable component.
2025-04-18T11:52:52.496-0700    debug   builders/builders.go:24 Stable component.
2025-04-18T11:52:52.496-0700    debug   [email protected]/otlp.go:58        created signal-agnostic logger
2025-04-18T11:52:52.514-0700    info    [email protected]/service.go:266 Starting otelcol-dev... {"Version": "", "NumCPU": 12}
2025-04-18T11:52:52.514-0700    info    extensions/extensions.go:41     Starting extensions...
2025-04-18T11:52:52.515-0700    info    [email protected]/clientconn.go:176  [core] original dial target is: "localhost:14317"       {"grpc_log": true}
2025-04-18T11:52:52.517-0700    info    [email protected]/clientconn.go:459  [core] [Channel #1]Channel created      {"grpc_log": true}
2025-04-18T11:52:52.517-0700    info    [email protected]/clientconn.go:207  [core] [Channel #1]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/localhost:14317", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}   {"grpc_log": true}
2025-04-18T11:52:52.517-0700    info    [email protected]/clientconn.go:208  [core] [Channel #1]Channel authority set to "localhost:14317"   {"grpc_log": true}
2025-04-18T11:52:52.523-0700    info    [email protected]/resolver_wrapper.go:210    [core] [Channel #1]Resolver state updated: {
  "Addresses": [
    {
      "Addr": "localhost:14317",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Metadata": null
    }
  ],
  "Endpoints": [
    {
      "Addresses": [
        {
          "Addr": "localhost:14317",
          "ServerName": "",
          "Attributes": null,
          "BalancerAttributes": null,
          "Metadata": null
        }
      ],
      "Attributes": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses)     {"grpc_log": true}
2025-04-18T11:52:52.525-0700    info    [email protected]/balancer_wrapper.go:122    [core] [Channel #1]Channel switches to new LB policy "pick_first"       {"grpc_log": true}
2025-04-18T11:52:52.526-0700    info    gracefulswitch/gracefulswitch.go:194    [pick-first-lb] [pick-first-lb 0xc000139f50] Received new config {
  "shuffleAddressList": false
}, resolver state {
  "Addresses": [
    {
      "Addr": "localhost:14317",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Metadata": null
    }
  ],
  "Endpoints": [
    {
      "Addresses": [
        {
          "Addr": "localhost:14317",
          "ServerName": "",
          "Attributes": null,
          "BalancerAttributes": null,
          "Metadata": null
        }
      ],
      "Attributes": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
}       {"grpc_log": true}
2025-04-18T11:52:52.527-0700    info    [email protected]/balancer_wrapper.go:195    [core] [Channel #1 SubChannel #2]Subchannel created     {"grpc_log": true}
2025-04-18T11:52:52.528-0700    info    [email protected]/clientconn.go:563  [core] [Channel #1]Channel Connectivity change to CONNECTING    {"grpc_log": true}
2025-04-18T11:52:52.529-0700    info    [email protected]/clientconn.go:364  [core] [Channel #1]Channel exiting idle mode    {"grpc_log": true}
2025-04-18T11:52:52.530-0700    info    [email protected]/server.go:690      [core] [Server #3]Server created        {"grpc_log": true}
2025-04-18T11:52:52.531-0700    info    [email protected]/otlp.go:116       Starting GRPC server    {"endpoint": "0.0.0.0:4317"}
2025-04-18T11:52:52.531-0700    info    [email protected]/service.go:289 Everything is ready. Begin running and processing data.
2025-04-18T11:52:52.532-0700    info    [email protected]/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING   {"grpc_log": true}
2025-04-18T11:52:52.532-0700    info    [email protected]/clientconn.go:1344 [core] [Channel #1 SubChannel #2]Subchannel picks a new address "localhost:14317" to connect    {"grpc_log": true}
2025-04-18T11:52:52.533-0700    info    [email protected]/server.go:886      [core] [Server #3 ListenSocket #4]ListenSocket created  {"grpc_log": true}
2025-04-18T11:52:52.533-0700    info    pickfirst/pickfirst.go:184      [pick-first-lb] [pick-first-lb 0xc000139f50] Received SubConn state update: 0xc00027e370, {ConnectivityState:CONNECTING ConnectionError:<nil> connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}   {"grpc_log": true}
2025-04-18T11:52:52.548-0700    info    [email protected]/clientconn.go:1224 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to READY        {"grpc_log": true}
2025-04-18T11:52:52.548-0700    info    pickfirst/pickfirst.go:184      [pick-first-lb] [pick-first-lb 0xc000139f50] Received SubConn state update: 0xc00027e370, {ConnectivityState:READY ConnectionError:<nil> connectedAddress:{Addr:localhost:14317 ServerName:localhost:14317 Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}  {"grpc_log": true}
2025-04-18T11:52:52.548-0700    info    [email protected]/clientconn.go:563  [core] [Channel #1]Channel Connectivity change to READY {"grpc_log": true}
2025-04-18T11:52:55.995-0700    info    transport/http2_server.go:662   [transport] [server-transport 0xc0000f6000] Closing: read tcp 127.0.0.1:4317->127.0.0.1:46322: read: connection reset by peer  {"grpc_log": true}
2025-04-18T11:52:55.997-0700    info    transport/controlbuf.go:577     [transport] [server-transport 0xc0000f6000] loopyWriter exiting with error: transport closed by client  {"grpc_log": true}

Additional context

No response

at-ishikawa added the bug (Something isn't working) label on Apr 18, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@iblancasa
Contributor

I want to work on this.
