GCP detector ignores context #1026


Open · RonFed opened this issue Mar 6, 2025 · 2 comments
Labels: bug (Something isn't working), priority: p1

RonFed commented Mar 6, 2025

The `gcp` detector in the resourcedetection processor ignores the context:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/58d93b20516223707ec8de05bd47f579c6ab03fc/processor/resourcedetectionprocessor/internal/gcp/gcp.go#L55

As a result, the timeout configured on the processor is not applied to the metadata server queries in:

// ProjectID returns the ID of the project in which this program is running.
func (d *Detector) ProjectID() (string, error) {
	// N.B. d.metadata.ProjectIDWithContext(context.TODO()) is cached globally, so if we use it here it's untestable.
	s, err := d.metadata.GetWithContext(context.TODO(), "project/project-id")
	return strings.TrimSpace(s), err
}

// instanceID returns the ID of the GCE instance in which this program is running.
func (d *Detector) instanceID() (string, error) {
	// N.B. d.metadata.InstanceIDWithContext(context.TODO()) is cached globally, so if we use it here it's untestable.
	s, err := d.metadata.GetWithContext(context.TODO(), "instance/id")
	return strings.TrimSpace(s), err
}
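
Since Detect already receives a context from the processor, one possible direction for a fix (a minimal sketch of the idea, not the actual upstream change; projectIDWithContext is a hypothetical name) is to thread that context down to the metadata query instead of context.TODO():

// Sketch only: a context-aware variant that Detect(ctx) could call, so
// the processor's configured timeout actually bounds the metadata query.
func (d *Detector) projectIDWithContext(ctx context.Context) (string, error) {
	s, err := d.metadata.GetWithContext(ctx, "project/project-id")
	return strings.TrimSpace(s), err
}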

For example, with the following configuration:

      resourcedetection:
        detectors:
        - gcp
        timeout: 2s

the processor's initialization is delayed by roughly 10 seconds:

2025-03-06T07:45:06.697Z	info	internal/resourcedetection.go:137	began detecting resource information	{"otelcol.component.id": "resourcedetection", "otelcol.component.kind": "Processor", "otelcol.pipeline.id": "traces", "otelcol.signal": "traces"}
2025-03-06T07:45:16.750Z	info	internal/resourcedetection.go:188	detected resource information	{"otelcol.component.id": "resourcedetection", "otelcol.component.kind": "Processor", "otelcol.pipeline.id": "traces", "otelcol.signal": "traces", "resource": {}}

In the above example it took 10 seconds for the processor to initialize; during that time the collector is not in the Ready state, so readiness probes fail. The initialization time does not appear to be bounded, which can cause the readiness probe to time out, leaving the collector not running and stuck in a CrashLoopBackOff state in Kubernetes.

This happens in setups that are not running on GCP.
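
For reference, the processor's timeout setting is expected to translate into a context deadline like the one below (a standalone sketch using cloud.google.com/go/compute/metadata; because the detector passes context.TODO(), this deadline currently never reaches the metadata query):

package main

import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/compute/metadata"
)

func main() {
	// Mirror the processor's `timeout: 2s` with a context deadline.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	c := metadata.NewClient(nil) // nil uses the package's default HTTP client
	id, err := c.GetWithContext(ctx, "project/project-id")
	fmt.Println(id, err) // off GCP this errors out within ~2s instead of hanging
}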

cc @damemi

RonFed added a commit to odigos-io/odigos that referenced this issue Mar 8, 2025
Create a workaround to handle
GoogleCloudPlatform/opentelemetry-operations-go#1026:
figure out whether we're running on GKE at the startup of the autoscaler
(with a timeout of 2 seconds). This should be removed once the issue
above is resolved and the collector dependency is updated.
In addition, the `resourcedetection` processor is updated to have a
timeout of 2 seconds.

## User Facing Changes

None expected; users running on GKE should still see the resource
attributes.
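
A workaround along these lines might look like the following sketch (illustrative only; the runningOnGKE helper and the cluster-name attribute check are assumptions, not the actual odigos-io/odigos code):

import (
	"context"
	"time"

	"cloud.google.com/go/compute/metadata"
)

// runningOnGKE probes the metadata server for the GKE cluster name and
// gives up after 2 seconds, so a non-GCP environment does not stall startup.
func runningOnGKE() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	c := metadata.NewClient(nil)
	_, err := c.GetWithContext(ctx, "instance/attributes/cluster-name")
	return err == nil
}
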
RonFed (Author) commented May 24, 2025

Hi @dashpole, can you please confirm whether this should now be fixed?
It looks like, after googleapis/google-cloud-go#11786, this repo should also be updated.

@dashpole self-assigned this May 28, 2025
@dashpole added the bug (Something isn't working) and priority: p1 labels May 28, 2025
dashpole (Contributor) commented:

Sorry for the slow response, I'm just getting back. I am surprised it takes 10 seconds when running off GCP; I would have thought you would get a 404 much faster than that. We will have to find a way to do this without breaking backwards compatibility.
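
One compatibility-preserving shape (purely a sketch, assuming the existing zero-argument methods must keep working; the ProjectIDWithContext name is hypothetical here, not a committed API) would be to add context-aware variants and have the old methods delegate to them:

// Existing signature preserved: callers that pass no context keep the
// current behavior via context.TODO().
func (d *Detector) ProjectID() (string, error) {
	return d.ProjectIDWithContext(context.TODO())
}

// Hypothetical context-aware variant that callers such as the
// resourcedetection processor could use to honor their configured timeout.
func (d *Detector) ProjectIDWithContext(ctx context.Context) (string, error) {
	s, err := d.metadata.GetWithContext(ctx, "project/project-id")
	return strings.TrimSpace(s), err
}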
