Make OTLP exporter retry parameters configurable in JS so connect failures (e.g., ECONNREFUSED) can be retried for minutes #6220

@vinayvennela

Summary

When the OTLP endpoint is temporarily unavailable (e.g., during Collector restarts), Node.js apps often hit connect-level failures such as ECONNREFUSED. Today, the JS OTLP exporters perform a short, hard-coded exponential-backoff retry and then drop the batch. There is no way to extend the retry window (e.g., to a couple of minutes) from application code, which leads to preventable data loss during short outages.

I’m proposing we expose the retry parameters (max attempts, backoff bounds, multiplier, jitter) as configuration on the JS OTLP exporters (HTTP/proto, HTTP/json, and ideally gRPC) so users can tune resiliency to their environments.

References:

  • Retry constants live in otlp-exporter-base: experimental/packages/otlp-exporter-base/src/retrying-transport.ts (see constants around lines 20–24: MAX_ATTEMPTS = 5, INITIAL_BACKOFF = 1000, MAX_BACKOFF = 5000, etc.)
  • Exporters are responsible for retry per the spec
  • Current package docs show no way to customize retry for JS OTLP exporters.

Problem details

  • In Node.js, when the OTLP endpoint is briefly down or unreachable, the exporter's requests fail with connect errors (e.g., ECONNREFUSED).
  • The exporter's built-in retry logic uses fixed constants (max 5 attempts, 1s initial backoff, 5s max backoff) and then drops the batch if it still cannot connect. Users cannot raise these limits.

This behavior contradicts operational expectations for short Collector outages, especially when apps cannot guarantee startup ordering or run outside k8s.
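
To put a number on "a short backoff window", here is a rough sketch that sums the delays produced by the constants cited above. The `BackoffParams` shape and both function names are hypothetical. The multiplier of 1.5 and the choice of one delay per attempt are assumptions for illustration (this issue only quotes the attempt count and the two backoff bounds), and jitter is ignored.

```typescript
// Hypothetical shape; field names mirror the constants cited in this issue.
interface BackoffParams {
  maxAttempts: number;      // assumed: one delay is taken before each retry
  initialBackoffMs: number;
  maxBackoffMs: number;
  multiplier: number;       // assumed value, not quoted in this issue
}

// Delay before each retry, ignoring jitter: initial * multiplier^n, capped at maxBackoffMs.
function backoffSchedule(p: BackoffParams): number[] {
  const delays: number[] = [];
  let delay = p.initialBackoffMs;
  for (let i = 0; i < p.maxAttempts; i++) {
    delays.push(Math.min(delay, p.maxBackoffMs));
    delay *= p.multiplier;
  }
  return delays;
}

// Total time spent waiting between attempts.
function totalBackoffMs(p: BackoffParams): number {
  return backoffSchedule(p).reduce((sum, d) => sum + d, 0);
}

const current: BackoffParams = {
  maxAttempts: 5,
  initialBackoffMs: 1000,
  maxBackoffMs: 5000,
  multiplier: 1.5,
};

console.log(backoffSchedule(current)); // delays of 1000, 1500, 2250, 3375, 5000 ms
console.log(totalBackoffMs(current));  // 13125
```

Under these assumptions the exporter gives up after roughly 13 seconds of waiting, which is shorter than many Collector restarts.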


Expected behavior

Provide a way to retry for minutes (operator-tuned), not seconds, on transient network/connect errors, without requiring an external wrapper. For example:

const exporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
  retry: {
    maxAttempts: 30,
    initialBackoffMs: 1000,
    maxBackoffMs: 30000,  // 30s
    multiplier: 1.5,
    jitter: 0.2,
    // optionally: totalElapsedLimitMs or per-batch cap
  },
});
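
For discussion, the retry semantics these options imply could look roughly like the sketch below. This is not the code in otlp-exporter-base. `RetryOptions`, `jitteredDelay`, `sendWithRetry`, and `isConnectError` are hypothetical names, and treating `jitter` as a symmetric fraction of the delay is one possible interpretation of the proposed field.

```typescript
// Hypothetical options shape matching the `retry` object proposed above.
interface RetryOptions {
  maxAttempts: number;      // maximum number of retries after the first send
  initialBackoffMs: number;
  maxBackoffMs: number;
  multiplier: number;
  jitter: number;           // assumed: fraction in [0, 1] applied symmetrically
}

type Sleep = (ms: number) => Promise<void>;

// Spread the delay uniformly over [base * (1 - jitter), base * (1 + jitter)].
function jitteredDelay(baseMs: number, jitter: number, rand: () => number): number {
  const factor = 1 + jitter * (2 * rand() - 1);
  return Math.max(0, baseMs * factor);
}

// Retries `send` on retryable errors with capped exponential backoff.
// `sleep` and `rand` are injectable so the loop can be tested without real timers.
async function sendWithRetry<T>(
  send: () => Promise<T>,
  opts: RetryOptions,
  isRetryable: (err: unknown) => boolean,
  sleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  rand: () => number = Math.random,
): Promise<T> {
  let backoff = opts.initialBackoffMs;
  for (let retries = 0; ; retries++) {
    try {
      return await send();
    } catch (err) {
      // Give up once the retry budget is spent or the error is not transient.
      if (retries >= opts.maxAttempts || !isRetryable(err)) throw err;
      await sleep(jitteredDelay(Math.min(backoff, opts.maxBackoffMs), opts.jitter, rand));
      backoff *= opts.multiplier;
    }
  }
}

// Example predicate: treat connection-level failures as transient.
function isConnectError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return code === 'ECONNREFUSED' || code === 'ECONNRESET';
}
```

With the values in the example above (30 attempts, 1s initial, 30s cap, 1.5 multiplier), this loop would keep trying for just under 12 minutes before jitter, which is why a `totalElapsedLimitMs`-style cap may be worth including.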

Environment variable support would also be helpful (aligned to the env var spec), e.g.:

OTEL_EXPORTER_OTLP_RETRY_MAX_ATTEMPTS=30
OTEL_EXPORTER_OTLP_RETRY_INITIAL_BACKOFF_MS=1000
OTEL_EXPORTER_OTLP_RETRY_MAX_BACKOFF_MS=30000
OTEL_EXPORTER_OTLP_RETRY_MULTIPLIER=1.5
OTEL_EXPORTER_OTLP_RETRY_JITTER=0.2
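
A minimal sketch of how these variables might be read, falling back to defaults when a value is missing or malformed. The variable names are the ones proposed in this issue and do not exist today. `retryOptionsFromEnv` and `readNumber` are hypothetical helpers, and the default multiplier (1.5) and jitter (0.2) are assumed values.

```typescript
// Hypothetical options shape for the proposed retry configuration.
interface RetryOptions {
  maxAttempts: number;
  initialBackoffMs: number;
  maxBackoffMs: number;
  multiplier: number;
  jitter: number;
}

// Defaults: the first three mirror the constants cited in this issue;
// multiplier and jitter are assumptions for illustration.
const DEFAULT_RETRY: RetryOptions = {
  maxAttempts: 5,
  initialBackoffMs: 1000,
  maxBackoffMs: 5000,
  multiplier: 1.5,
  jitter: 0.2,
};

type Env = Record<string, string | undefined>;

// Parse a numeric variable; fall back when it is unset, non-numeric, or below `min`.
function readNumber(env: Env, name: string, fallback: number, min: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  return Number.isFinite(value) && value >= min ? value : fallback;
}

// Takes the environment as a parameter (pass process.env in Node.js) to stay testable.
function retryOptionsFromEnv(env: Env): RetryOptions {
  const prefix = 'OTEL_EXPORTER_OTLP_RETRY_';
  return {
    maxAttempts: Math.floor(readNumber(env, prefix + 'MAX_ATTEMPTS', DEFAULT_RETRY.maxAttempts, 0)),
    initialBackoffMs: readNumber(env, prefix + 'INITIAL_BACKOFF_MS', DEFAULT_RETRY.initialBackoffMs, 0),
    maxBackoffMs: readNumber(env, prefix + 'MAX_BACKOFF_MS', DEFAULT_RETRY.maxBackoffMs, 0),
    multiplier: readNumber(env, prefix + 'MULTIPLIER', DEFAULT_RETRY.multiplier, 1),
    jitter: Math.min(1, readNumber(env, prefix + 'JITTER', DEFAULT_RETRY.jitter, 0)),
  };
}
```

Falling back silently keeps a bad value from breaking export at startup. Logging a diagnostic warning on invalid input would be a reasonable addition.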

Actual behavior

Retry parameters are hard‑coded in retrying-transport.ts and not user‑configurable; batches are dropped after a short backoff window on ECONNREFUSED.


Proposed solution

  1. Expose retry configuration on the OTLP exporters (constructor options + env vars), wired through otlp-exporter-base.
  2. Maintain sane defaults equivalent to current constants, preserving backward compatibility.

Alternatives considered

  • Running a Collector as an agent/sidecar next to the app to maximize endpoint availability

Risks / considerations

  • Longer retries can increase memory pressure in the app; defaults should remain conservative.
