[opentelemetry-kube-stack] target-allocator not assigning kubernetes_sd_config targets? #1649

Open
aos opened this issue Apr 27, 2025 · 0 comments

aos commented Apr 27, 2025

When using the opentelemetry-kube-stack chart, the provided daemon_scrape_configs.yaml contains a kubernetes-pods job. However, I am unable to get the target allocator to assign any targets for it. Here is the simplest reproduction:

values.yaml

clusterName: demo
collectors:
  daemon:
    enabled: true
    targetAllocator:
      enabled: true
      image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:main
      allocationStrategy: per-node
      prometheusCR:
        enabled: true
        scrapeInterval: "30s"
    config:
      exporters:
        debug:
          verbosity: detailed
      service:
        telemetry:
          logs:
            level: debug
        pipelines:
          metrics:
            receivers: [prometheus]
            exporters: [debug]
    presets:
      logsCollection:
        enabled: false
      kubeletMetrics:
        enabled: false
      hostMetrics:
        enabled: false
      kubernetesAttributes:
        enabled: false
  cluster:
    enabled: false
instrumentation:
  enabled: false
opAMPBridge:
  enabled: false
kubernetesServiceMonitors:
  enabled: false
kubeApiServer:
  enabled: false
kubelet:
  enabled: false
kubeControllerManager:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false

I then set up some pods with the prometheus.io/scrape: "true" annotation and confirmed they are exposing metrics.
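For reference, the test pods look roughly like this (the name, image, and port are placeholders; the annotations and the app.kubernetes.io/name label are the parts that matter for the kubernetes-pods relabel rules):

apiVersion: v1
kind: Pod
metadata:
  name: sample-app                       # placeholder name
  labels:
    app.kubernetes.io/name: sample-app   # mapped to the job label by the last relabel rule
  annotations:
    prometheus.io/scrape: "true"         # picked up by the keep rule on ..._prometheus_io_scrape
    prometheus.io/port: "8080"           # rewritten into __address__
    prometheus.io/path: "/metrics"       # rewritten into __metrics_path__
spec:
  containers:
    - name: app
      image: registry.example.com/sample-app:latest   # placeholder image exposing /metrics on 8080
      ports:
        - containerPort: 8080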

I checked the target allocator's /jobs endpoint:

$ kubectl port-forward -n observability svc/opentelemetry-kube-stack-daemon-targetallocator 8080:80
$ curl localhost:8080/jobs
{"node-exporter":{"_link":"/jobs/node-exporter/targets"}}

However, the targets are empty:

$ curl localhost:8080/jobs/node-exporter/targets
{
  "opentelemetry-kube-stack-daemon-collector-llfnl": {
    "_link": "/jobs/node-exporter/targets?collector_id=opentelemetry-kube-stack-daemon-collector-llfnl",
    "targets": []
  },
  "opentelemetry-kube-stack-daemon-collector-mvvf9": {
    "_link": "/jobs/node-exporter/targets?collector_id=opentelemetry-kube-stack-daemon-collector-mvvf9",
    "targets": []
  },
  "opentelemetry-kube-stack-daemon-collector-rrn78": {
    "_link": "/jobs/node-exporter/targets?collector_id=opentelemetry-kube-stack-daemon-collector-rrn78",
    "targets": []
  }
}

The kubelet and kubernetes-pods jobs are missing entirely. The scrape configs look correct, though:

$ curl localhost:8080/scrape_configs
scrape_configs.json
{
  "kubelet": {
    "authorization": {
      "credentials_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
      "type": "Bearer"
    },
    "enable_compression": true,
    "enable_http2": true,
    "follow_redirects": true,
    "honor_labels": true,
    "honor_timestamps": true,
    "job_name": "kubelet",
    "kubernetes_sd_configs": [
      {
        "enable_http2": true,
        "follow_redirects": true,
        "kubeconfig_file": "",
        "role": "node",
        "selectors": [
          {
            "field": "metadata.name=${env:OTEL_K8S_NODE_NAME}",
            "role": "node"
          }
        ]
      }
    ],
    "metric_relabel_configs": [
      {
        "action": "drop",
        "regex": "container_cpu_(load_average_10s|system_seconds_total|user_seconds_total)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__name__"
        ]
      },
      {
        "action": "drop",
        "regex": "container_fs_(io_current|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__name__"
        ]
      },
      {
        "action": "drop",
        "regex": "container_memory_(mapped_file|swap)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__name__"
        ]
      },
      {
        "action": "drop",
        "regex": "container_(file_descriptors|tasks_state|threads_max)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__name__"
        ]
      },
      {
        "action": "drop",
        "regex": "container_spec.*",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__name__"
        ]
      },
      {
        "action": "drop",
        "regex": ".+;",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "id",
          "pod"
        ]
      }
    ],
    "metrics_path": "/metrics/cadvisor",
    "relabel_configs": [
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "job"
        ],
        "target_label": "__tmp_prometheus_job_name"
      },
      {
        "action": "replace",
        "replacement": "kubelet",
        "separator": ";",
        "target_label": "job"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_node_name"
        ],
        "target_label": "node"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "https-metrics",
        "separator": ";",
        "target_label": "endpoint"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__metrics_path__"
        ],
        "target_label": "metrics_path"
      },
      {
        "action": "hashmod",
        "modulus": 1,
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__address__"
        ],
        "target_label": "__tmp_hash"
      },
      {
        "action": "keep",
        "regex": "$(SHARD)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__tmp_hash"
        ]
      }
    ],
    "scheme": "https",
    "scrape_interval": "15s",
    "scrape_protocols": [
      "OpenMetricsText1.0.0",
      "OpenMetricsText0.0.1",
      "PrometheusText0.0.4"
    ],
    "scrape_timeout": "10s",
    "tls_config": {
      "ca_file": "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
      "insecure_skip_verify": true
    },
    "track_timestamps_staleness": false
  },
  "kubernetes-pods": {
    "authorization": {
      "credentials_file": "/var/run/secrets/kubernetes.io/serviceaccount/token",
      "type": "Bearer"
    },
    "enable_compression": true,
    "enable_http2": true,
    "follow_redirects": true,
    "honor_timestamps": true,
    "job_name": "kubernetes-pods",
    "kubernetes_sd_configs": [
      {
        "enable_http2": true,
        "follow_redirects": true,
        "kubeconfig_file": "",
        "role": "pod",
        "selectors": [
          {
            "field": "spec.nodeName=${env:OTEL_K8S_NODE_NAME}",
            "role": "pod"
          }
        ]
      }
    ],
    "metrics_path": "/metrics",
    "relabel_configs": [
      {
        "action": "keep",
        "regex": "true",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
        ]
      },
      {
        "action": "drop",
        "regex": "true",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow"
        ]
      },
      {
        "action": "replace",
        "regex": "(https?)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_annotation_prometheus_io_scheme"
        ],
        "target_label": "__scheme__"
      },
      {
        "action": "replace",
        "regex": "(.+)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_annotation_prometheus_io_path"
        ],
        "target_label": "__metrics_path__"
      },
      {
        "action": "replace",
        "regex": "([^:]+)(?::\\d+)?;(\\d+)",
        "replacement": "$1:$2",
        "separator": ";",
        "source_labels": [
          "__address__",
          "__meta_kubernetes_pod_annotation_prometheus_io_port"
        ],
        "target_label": "__address__"
      },
      {
        "action": "labelmap",
        "regex": "__meta_kubernetes_pod_annotation_prometheus_io_param_(.+)",
        "replacement": "__param_$1",
        "separator": ";"
      },
      {
        "action": "labelmap",
        "regex": "__meta_kubernetes_pod_label_(.+)",
        "replacement": "$1",
        "separator": ";"
      },
      {
        "action": "replace",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_namespace"
        ],
        "target_label": "namespace"
      },
      {
        "action": "replace",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_name"
        ],
        "target_label": "pod"
      },
      {
        "action": "drop",
        "regex": "Pending|Succeeded|Failed|Completed",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_phase"
        ]
      },
      {
        "action": "replace",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_label_app_kubernetes_io_name"
        ],
        "target_label": "job"
      }
    ],
    "scheme": "http",
    "scrape_interval": "30s",
    "scrape_protocols": [
      "OpenMetricsText1.0.0",
      "OpenMetricsText0.0.1",
      "PrometheusText0.0.4"
    ],
    "scrape_timeout": "10s",
    "track_timestamps_staleness": false
  },
  "node-exporter": {
    "enable_compression": true,
    "enable_http2": true,
    "follow_redirects": true,
    "honor_timestamps": true,
    "job_name": "node-exporter",
    "metrics_path": "/metrics",
    "relabel_configs": [
      {
        "action": "labelmap",
        "regex": "__meta_kubernetes_node_label_(.+)",
        "replacement": "$1",
        "separator": ";"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "job"
        ],
        "target_label": "__tmp_prometheus_job_name"
      }
    ],
    "scheme": "http",
    "scrape_interval": "30s",
    "scrape_protocols": [
      "OpenMetricsText1.0.0",
      "OpenMetricsText0.0.1",
      "PrometheusText0.0.4"
    ],
    "scrape_timeout": "10s",
    "static_configs": [
      {
        "targets": [
          "${env:OTEL_K8S_NODE_IP}:9100"
        ]
      }
    ],
    "track_timestamps_staleness": false
  }
}

Here are logs from the target allocator. I do see this error:

"error":"could not find collector for node "

I dug into the code but couldn't figure out why it can't determine the name of the node (it appears to be empty in the error).

Target allocator logs
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Starting the Target Allocator"}
{"level":"info","ts":"2025-04-27T07:49:38Z","logger":"allocator","msg":"Starting server..."}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Waiting for caches to sync for namespace"}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Caches are synced for namespace"}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Waiting for caches to sync for servicemonitors"}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Caches are synced for servicemonitors"}                                                                                  {"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Waiting for caches to sync for podmonitors"}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Caches are synced for podmonitors"}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Waiting for caches to sync for probes"}                                                                                  {"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Caches are synced for probes"}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Waiting for caches to sync for scrapeconfigs"}
{"level":"info","ts":"2025-04-27T07:49:38Z","msg":"Caches are synced for scrapeconfigs"}
{"level":"info","ts":"2025-04-27T07:49:43Z","logger":"allocator","msg":"Service Discovery watch event received","targets groups":3}
{"level":"info","ts":"2025-04-27T07:49:43Z","logger":"allocator","msg":"Could not assign targets for some jobs","allocator":"per-node","targets":1,"error":"could not find collector for node "}
{"level":"info","ts":"2025-04-27T07:49:48Z","logger":"allocator","msg":"Service Discovery watch event received","targets groups":3}
{"level":"info","ts":"2025-04-27T07:53:23Z","logger":"allocator","msg":"Could not assign targets for some jobs","allocator":"per-node","targets":1,"error":"could not find collector for node "}

What am I missing to get this working? The Helm chart creates the appropriate service accounts, cluster roles, and cluster role bindings.
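In case it's relevant: as far as I can tell from the rendered DaemonSet, the chart does inject the node name and IP into the collector pods via the downward API, along these lines (paraphrased from the manifest, not copied verbatim):

env:
  - name: OTEL_K8S_NODE_NAME      # referenced by the kubernetes_sd_configs selectors above
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: OTEL_K8S_NODE_IP        # referenced by the node-exporter static target
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP

So I would have expected the ${env:OTEL_K8S_NODE_NAME} selectors to resolve, but the /scrape_configs output above still shows them unsubstituted, which I suspect may be related to the empty node name in the allocator error.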
