[collector] collector pod evicted due to being unhealthy (health_check extension enabled) #3825
Comments
This seems to be the same issue as #3688. A possible workaround can be found here: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/examples/deployment-only/rendered/configmap.yaml#L19-L21
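For quick reference, the linked workaround boils down to binding the health_check extension to the pod IP instead of the default address. A minimal sketch, assuming the linked chart example still reads this way and that `MY_POD_IP` is injected via the downward API:

```yaml
extensions:
  health_check:
    # Bind the health endpoint to the pod IP so kubelet probes can reach it;
    # MY_POD_IP is assumed to be populated from the downward API (status.podIP).
    endpoint: ${env:MY_POD_IP}:13133
```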
You are right, it's the same. I had already figured out that it could be the address binding, though. I will wait for the fix. Thanks!
Hi @yuriolisa, I have updated the opentelemetry-operator helm chart to the latest version but I see the same behaviour (the chart version is …). This is my collector definition (it's managed from terraform, hence the "$$" escape and "${}" interpolation):

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: "${local.namespace}"
spec:
  mode: deployment
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "environment"
                operator: In
                values:
                  - "${var.name}"
  env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: $${env:POD_IP}:4317
          http:
            endpoint: $${env:POD_IP}:4318
    extensions:
      health_check: {}
    processors:
      batch: {}
    exporters:
      otlp:
        endpoint: tempo.monitoring:4317
        tls:
          insecure: true
    service:
      extensions:
        - health_check
      pipelines:
        traces:
          receivers:
            - otlp
          processors:
            - batch
          exporters:
            - otlp
```
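One thing worth noting about the config above: the OTLP receiver endpoints are bound to the pod IP, but `health_check: {}` is left at its default. If this really is the same address-binding problem as #3688, a possible adjustment (an assumption, not a confirmed fix) would be to bind the extension the same way, keeping the terraform `$$` escaping:

```yaml
extensions:
  health_check:
    # Hypothetical: reuse the POD_IP env var the receivers already use,
    # on the default health_check port 13133.
    endpoint: $${env:POD_IP}:13133
```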
Hi @yuriolisa, any update? I don't have permission to reopen this issue.
Component(s)
collector
What happened?
Description
The opentelemetry-collector pod seems to be restarted due to not being considered healthy (health_check extension enabled).
Steps to Reproduce
grafana/tempo
chart at version1.18.2
opentelemetry-operator
helm chart at the specified versionEverything is ready. Begin running and processing data.
curl localhost:13133/
. It returns 200 OK:{"status":"Server available","upSince":"2025-03-19T17:48:23.084107432Z","uptime":"14.231745165s"}
Received signal from OS {"signal": "terminated"}
Expected Result
The collector pod should become healthy, because the probes (which seem to be correctly configured to make an HTTP GET request to :13133/) return 200 OK.
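One way to double-check that claim is to dump the probes the operator actually rendered into the Deployment. The name `otel-collector` below assumes the operator's usual `<name>-collector` naming convention:

```shell
# Inspect the liveness probe the operator generated for the collector container
kubectl -n <namespace> get deployment otel-collector \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'
```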
Actual Result
The collector pod goes into CrashLoopBackOff because it is terminated by the cluster too many times, even though the probes seem to return 200 OK.
The conditions of the collector deployment are:
Kubernetes Version
1.31.4
Operator version
0.83.0
Collector version
0.120.0
Environment information
Environment
OS: (e.g., "Ubuntu 24.04.1 LTS")
Log output
Additional context
No response