OTLP Exporter does not respect retry configuration when backend is unavailable #6588

@skhalash

Description

When using the OpenTelemetry Go SDK with the OTLP gRPC exporter, we expect the client to retry failed connections using an exponential backoff strategy, as specified in the OTLP specification. However, in practice, this does not seem to be happening.

Environment

  • OS: macOS
  • Architecture: ARM
  • Go Version: 1.24.1
  • opentelemetry-go version: 1.35.0

Steps To Reproduce

  1. Run the following Go program with an OTLP gRPC exporter configured with retries. To simplify the example, we are exporting only spans (exporting metrics and logs has the same behavior):
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func main() {
	ctx := context.Background()

	exp, err := newExporter(ctx)
	if err != nil {
		log.Fatalf("failed to initialize trace exporter: %v", err)
	}

	tp, err := newTracerProvider(exp)
	if err != nil {
		log.Fatalf("failed to initialize trace provider: %v", err)
	}

	defer func() { _ = tp.Shutdown(ctx) }()

	otel.SetTracerProvider(tp)
	generateSpan()

	select {}
}

func generateSpan() {
	log.Println("Generating a dummy span")
	_, span := otel.Tracer("").Start(context.Background(), "dummy")
	defer span.End()
}

func newTracerProvider(exp sdktrace.SpanExporter) (*sdktrace.TracerProvider, error) {
	r, err := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName("ExampleService"),
		),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to create resource: %w", err)
	}

	return sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(r),
	), nil
}

func newExporter(ctx context.Context) (*otlptrace.Exporter, error) {
	traceExporter, err := otlptrace.New(
		ctx,
		otlptracegrpc.NewClient(
			otlptracegrpc.WithEndpoint("127.0.0.1:4317"),
			otlptracegrpc.WithInsecure(),
			otlptracegrpc.WithRetry(otlptracegrpc.RetryConfig{
				Enabled:         true,
				InitialInterval: 1 * time.Second,
				MaxInterval:     30 * time.Second,
				MaxElapsedTime:  time.Minute,
			}),
		),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to create trace exporter: %w", err)
	}

	return traceExporter, nil
}
  2. Ensure the OpenTelemetry Collector is NOT running.

  3. Run the program and observe the logs.

  4. Notice that no retries are logged and the exporter appears to give up after about 15 seconds, even though a retry config is specified:

2025/04/01 13:32:51 Generating a dummy span
2025/04/01 13:33:06 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused"

The same behavior is observed with the otlphttp exporter, as well as with the other telemetry signals.

Expected behavior

If the OTLP backend is unavailable, the exporter should retry the export according to the provided RetryConfig, using exponential backoff with jitter.

Labels

bug (Something isn't working) · pkg:exporter:otlp (Related to the OTLP exporter package)
