GCP detector ignores context #1026


Open · RonFed opened this issue Mar 6, 2025 · 2 comments
Labels: bug (Something isn't working), priority: p1

RonFed commented Mar 6, 2025

The `gcp` detector in the resourcedetection processor ignores the context:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/58d93b20516223707ec8de05bd47f579c6ab03fc/processor/resourcedetectionprocessor/internal/gcp/gcp.go#L55

As a result, the timeout configured on the processor is not applied to the metadata server queries in:

// ProjectID returns the ID of the project in which this program is running.
func (d *Detector) ProjectID() (string, error) {
	// N.B. d.metadata.ProjectIDWithContext(context.TODO()) is cached globally, so if we use it here it's untestable.
	s, err := d.metadata.GetWithContext(context.TODO(), "project/project-id")
	return strings.TrimSpace(s), err
}

// instanceID returns the ID of the GCE instance in which this program is running.
func (d *Detector) instanceID() (string, error) {
	// N.B. d.metadata.InstanceIDWithContext(context.TODO()) is cached globally, so if we use it here it's untestable.
	s, err := d.metadata.GetWithContext(context.TODO(), "instance/id")
	return strings.TrimSpace(s), err
}
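
Since Detect already receives a context from the processor, one possible direction for a fix (a minimal sketch of the idea, not the actual upstream change; projectIDWithContext is a hypothetical name) is to thread that context down to the metadata query instead of context.TODO():

// Sketch only: a context-aware variant that Detect(ctx) could call, so
// the processor's configured timeout actually bounds the metadata query.
func (d *Detector) projectIDWithContext(ctx context.Context) (string, error) {
	s, err := d.metadata.GetWithContext(ctx, "project/project-id")
	return strings.TrimSpace(s), err
}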

For example, with the following configuration:

      resourcedetection:
        detectors:
        - gcp
        timeout: 2s

the processor's initialization is delayed by roughly 10 seconds:

2025-03-06T07:45:06.697Z	info	internal/resourcedetection.go:137	began detecting resource information	{"otelcol.component.id": "resourcedetection", "otelcol.component.kind": "Processor", "otelcol.pipeline.id": "traces", "otelcol.signal": "traces"}
2025-03-06T07:45:16.750Z	info	internal/resourcedetection.go:188	detected resource information	{"otelcol.component.id": "resourcedetection", "otelcol.component.kind": "Processor", "otelcol.pipeline.id": "traces", "otelcol.signal": "traces", "resource": {}}

In the above example it took 10 seconds for the processor to initialize; during that time the collector is not in the Ready state, so readiness probes fail. The initialization time does not appear to be bounded, which can cause the readiness probe to time out, leaving the collector not running and stuck in a CrashLoopBackOff state in Kubernetes.

This happens in setups that are not running on GCP.
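
For reference, the processor's timeout setting is expected to translate into a context deadline like the one below (a standalone sketch using cloud.google.com/go/compute/metadata; because the detector passes context.TODO(), this deadline currently never reaches the metadata query):

package main

import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/compute/metadata"
)

func main() {
	// Mirror the processor's `timeout: 2s` with a context deadline.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	c := metadata.NewClient(nil) // nil uses the package's default HTTP client
	id, err := c.GetWithContext(ctx, "project/project-id")
	fmt.Println(id, err) // off GCP this errors out within ~2s instead of hanging
}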

cc @damemi

RonFed added a commit to odigos-io/odigos that referenced this issue Mar 8, 2025
Create a workaround to handle
GoogleCloudPlatform/opentelemetry-operations-go#1026:
figure out whether we're running on GKE at the startup of the autoscaler
(with a timeout of 2 seconds). This should be removed once the issue
above is resolved and the collector dependency is updated.
In addition, the `resourcedetection` processor is updated to have a
timeout of 2 seconds.

## User Facing Changes

None expected; users running on GKE should still see the resource
attributes.
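
A workaround along these lines might look like the following sketch (illustrative only; the runningOnGKE helper and the cluster-name attribute check are assumptions, not the actual odigos-io/odigos code):

import (
	"context"
	"time"

	"cloud.google.com/go/compute/metadata"
)

// runningOnGKE probes the metadata server for the GKE cluster name and
// gives up after 2 seconds, so a non-GCP environment does not stall startup.
func runningOnGKE() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	c := metadata.NewClient(nil)
	_, err := c.GetWithContext(ctx, "instance/attributes/cluster-name")
	return err == nil
}
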
RonFed (Author) commented May 24, 2025

Hi @dashpole, can you please confirm whether this should now be fixed?
It looks like, after googleapis/google-cloud-go#11786, this repo should also be updated.

@dashpole self-assigned this May 28, 2025
@dashpole added the bug (Something isn't working) and priority: p1 labels May 28, 2025
dashpole (Contributor) commented:

Sorry for the slow response, I'm just getting back. I am surprised it takes 10 seconds when running off GCP; I would have thought you would get a 404 much faster than that. We will have to find a way to do this without breaking backwards compatibility.
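
One compatibility-preserving shape (purely a sketch, assuming the existing zero-argument methods must keep working; the ProjectIDWithContext name is hypothetical here, not a committed API) would be to add context-aware variants and have the old methods delegate to them:

// Existing signature preserved: callers that pass no context keep the
// current behavior via context.TODO().
func (d *Detector) ProjectID() (string, error) {
	return d.ProjectIDWithContext(context.TODO())
}

// Hypothetical context-aware variant that callers such as the
// resourcedetection processor could use to honor their configured timeout.
func (d *Detector) ProjectIDWithContext(ctx context.Context) (string, error) {
	s, err := d.metadata.GetWithContext(ctx, "project/project-id")
	return strings.TrimSpace(s), err
}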
