Skip to content

[otelcol] Collector.Shutdown may be a no-op during config reload #13740

@axw

Description

@axw

Component(s)

No response

What happened?

Describe the bug

If the Collector.Shutdown method is called during a config reload, the collector's state will briefly be "closing" and the Shutdown method will be a no-op.

This explains https://github.com/open-telemetry/opentelemetry-collector/actions/runs/17367754316/job/49297665087?pr=13739

Steps to reproduce

This diff demonstrates the issue:

```yaml
diff --git a/otelcol/collector.go b/otelcol/collector.go
index 12a27685c..2730b17da 100644
--- a/otelcol/collector.go
+++ b/otelcol/collector.go
@@ -14,6 +14,7 @@ import (
 	"os/signal"
 	"sync/atomic"
 	"syscall"
+	"time"
 
 	"go.uber.org/multierr"
 	"go.uber.org/zap"
@@ -250,6 +251,7 @@ func (col *Collector) setupConfigurationComponents(ctx context.Context) error {
 func (col *Collector) reloadConfiguration(ctx context.Context) error {
 	col.service.Logger().Warn("Config updated, restart service")
 	col.setCollectorState(StateClosing)
+	time.Sleep(time.Second)
 
 	if err := col.service.Shutdown(ctx); err != nil {
 		return fmt.Errorf("failed to shutdown the retiring config: %w", err)
diff --git a/otelcol/collector_test.go b/otelcol/collector_test.go
index f4063b2b5..377fe6c39 100644
--- a/otelcol/collector_test.go
+++ b/otelcol/collector_test.go
@@ -107,7 +107,7 @@ func TestCollectorStateAfterConfigChange(t *testing.T) {
 	watcher(&confmap.ChangeEvent{})
 
 	assert.Eventually(t, func() bool {
-		return StateRunning == col.GetState()
+		return StateClosing == col.GetState()
 	}, 2*time.Second, 200*time.Millisecond)

What did you expect to see?

I expect the test to pass reliably.

What did you see instead?

Flaky test.

This is not just a flaky test though, the bug could affect an actual real life collector. Specifically on Windows, the service logic does involve calling the Shutdown method which means the Run method may not return.

Collector version

76862b8

Additional context

No response

Tip

React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions