
Ingestion of metrics or logs very slow (and high CPU usage on collector) #33427

@rpasche

Description

Component(s)

cmd/telemetrygen

Describe the issue you're reporting

Hi,

I am running the OpenTelemetry Collector Contrib, version 0.100.0, with this configuration:

exporters:
  otlp/apm:
    endpoint: server:1234
    headers:
      Authorization: Bearer ${env:SECRET_TOKEN}
    tls:
      ca_file: /certs/ca.pem
      cert_file: /certs/tls.crt
      insecure_skip_verify: true
      key_file: /certs/tls.key
extensions:
  basicauth/server:
    htpasswd:
      file: /basicauth/.htpasswd
  health_check: {}
  zpages: {}
processors:
  attributes/tenantid:
    actions:
    - action: upsert
      key: tenant.id
      value: ${env:TENANT_ID}
  batch: {}
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 19
receivers:
  otlp:
    protocols:
      grpc:
        auth:
          authenticator: basicauth/server
        endpoint: ${MY_POD_IP}:4317
        tls:
          ca_file: /certs/ca.pem
          cert_file: /certs/tls.crt
          key_file: /certs/tls.key
      http:
        auth:
          authenticator: basicauth/server
        endpoint: ${MY_POD_IP}:4318
        tls:
          ca_file: /certs/ca.pem
          cert_file: /certs/tls.crt
          key_file: /certs/tls.key
service:
  extensions:
  - health_check
  - zpages
  - basicauth/server
  pipelines:
    logs:
      exporters:
      - otlp/apm
      processors:
      - memory_limiter
      - attributes/tenantid
      - batch
      receivers:
      - otlp
      - syslog
      - tcplog
    metrics:
      exporters:
      - otlp/apm
      processors:
      - memory_limiter
      - attributes/tenantid
      - batch
      receivers:
      - otlp
    traces:
      exporters:
      - otlp/apm
      processors:
      - memory_limiter
      - attributes/tenantid
      - batch
      receivers:
      - otlp
  telemetry:
    logs:
      encoding: json
      level: debug
    metrics:
      address: 0.0.0.0:8888
      level: detailed

I'm running this collector in a Kubernetes pod (a single container), with 2 CPUs and 4 GB of memory attached to it.

When I run the telemetrygen command to send 1000 traces, it finishes within 1 second.

If I run basically the same command to send either 1000 logs or 1000 metrics, it takes ~2-3 minutes to complete. Looking at Grafana and the collector dashboard, only ~20 metrics/s or ~8 logs/s are received by the collector.
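As a rough sanity check on those numbers: at ~8 logs/s, 1000 logs take about 1000 / 8 ≈ 125 s, i.e. roughly 2 minutes, which matches the observed completion time.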

Additionally, CPU usage on the collector goes very high while sending metrics or logs, to around 60%.

In comparison, when sending traces for 5 minutes with 1 worker, the CPU usage on the collector also reaches ~60%, but it is processing ~4k spans/s.
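Put differently, at similar CPU usage the collector handles ~4000 spans/s but only ~8 logs/s, roughly a 500x difference in throughput (4000 / 8 = 500).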

At no point is any queuing shown within the collector.

Example commands I ran

~/go/bin/telemetrygen traces --otlp-endpoint otlp-endpoint:4317 --otlp-header Authorization=\"Basic\ base64...\" --service=my-custom-service --duration=5m --rate=0

Similar commands were used to send 1000 metrics or 1000 logs (--metrics 1000 or --logs 1000, without setting --duration); see the sketch below.
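For illustration, the logs and metrics runs looked roughly like this (a sketch, not the exact invocations: same endpoint, header, and service name as above, with only the subcommand and count flag changed and --duration dropped):

~/go/bin/telemetrygen logs --otlp-endpoint otlp-endpoint:4317 --otlp-header Authorization=\"Basic\ base64...\" --service=my-custom-service --logs 1000

~/go/bin/telemetrygen metrics --otlp-endpoint otlp-endpoint:4317 --otlp-header Authorization=\"Basic\ base64...\" --service=my-custom-service --metrics 1000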

Here are some screenshots taken while sending 1000 logs to the collector:

Receiver: [screenshot]

Processor: [screenshot]

Exporters: [screenshot]

Collector stats: [screenshot]

Any hint as to what is going on here, or what I am doing wrong?

Thanks
