Per process hostmetrics are not scraped #18232

Closed
EdwardKuenen opened this issue Feb 1, 2023 · 10 comments
Labels
bug Something isn't working receiver/hostmetrics

Comments

@EdwardKuenen

EdwardKuenen commented Feb 1, 2023

Component(s)

receiver/hostmetrics

What happened?

Description

When using the hostmetrics receiver with the process scraper, the per-process metrics are not scraped.

Steps to Reproduce

  • Download otel-contrib_0.70.0_linux_amd64.tar.gz
  • Extract the archive
  • Create a config file
  • Start the collector: ./otelcol-contrib --config otelcol.yml
  • Wait a few minutes for metrics to be collected, then run curl http://localhost:1234/metrics

Expected Result

No errors
CPU usage per process in the metrics

Actual Result

Errors in the output of the collector.
Only cumulative counters, with no per-process series.

Collector version

v0.70.0, v0.72.0

Environment information

Environment

OS: "Ubuntu 22.10", "CentOS-8"

OpenTelemetry Collector configuration

receivers:
  hostmetrics:
    scrapers:
      process:
        metrics:
          process.cpu.utilization:
            enabled: true
        mute_process_name_error: false


exporters:
  prometheus:
    endpoint: 0.0.0.0:1234

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [prometheus]

Log output

root@ubuntu2210:/opt/collector# ./otelcol-contrib --config=otelcol.yml
2023-02-01T14:28:05.115Z        info    service/telemetry.go:90 Setting up own telemetry...
2023-02-01T14:28:05.116Z        info    service/telemetry.go:116        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
2023-02-01T14:28:05.119Z        info    service/service.go:128  Starting otelcol-contrib...     {"Version": "0.70.0", "NumCPU": 2}
2023-02-01T14:28:05.121Z        info    extensions/extensions.go:41     Starting extensions...
2023-02-01T14:28:05.122Z        info    service/pipelines.go:86 Starting exporters...
2023-02-01T14:28:05.123Z        info    service/pipelines.go:90 Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
2023-02-01T14:28:05.124Z        warn    internal/warning.go:51  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks     {"kind": "exporter", "data_type": "metrics", "name": "prometheus", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-02-01T14:28:05.124Z        info    service/pipelines.go:94 Exporter started.       {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
2023-02-01T14:28:05.124Z        info    service/pipelines.go:98 Starting processors...
2023-02-01T14:28:05.125Z        info    service/pipelines.go:110        Starting receivers...
2023-02-01T14:28:05.126Z        info    service/pipelines.go:114        Receiver is starting... {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2023-02-01T14:28:05.127Z        info    service/pipelines.go:118        Receiver started.       {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2023-02-01T14:28:05.127Z        info    service/service.go:145  Everything is ready. Begin running and processing data.
2023-02-01T14:29:05.183Z        error   scraperhelper/scrapercontroller.go:212  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0; error reading process name for pid 2: readlink /proc/2/exe: no such file or directory; error reading process name for pid 3: readlink /proc/3/exe: no such file or directory; error reading process name for pid 4: readlink /proc/4/exe: no such file or directory; error reading process name for pid 5: readlink /proc/5/exe: no such file or directory; error reading process name for pid 7: readlink /proc/7/exe: no such file or directory; error reading process name for pid 9: readlink /proc/9/exe: no such file or directory; error reading process name for pid 10: readlink /proc/10/exe: no such file or directory; error reading process name for pid 11: readlink /proc/11/exe: no such file or directory; error reading process name for pid 12: readlink /proc/12/exe: no such file or directory; error reading process name for pid 13: readlink /proc/13/exe: no such file or directory; error reading process name for pid 14: readlink /proc/14/exe: no such file or directory; error reading process name for pid 15: readlink /proc/15/exe: no such file or directory; error reading process name for pid 16: readlink /proc/16/exe: no such file or directory; error reading process name for pid 17: readlink /proc/17/exe: no such file or directory; error reading process name for pid 19: readlink /proc/19/exe: no such file or directory; error reading process name for pid 20: readlink /proc/20/exe: no such file or directory; error reading process name for pid 21: readlink /proc/21/exe: no such file or directory; error reading process name for pid 22: readlink /proc/22/exe: no such file or directory; error reading process name for pid 23: readlink /proc/23/exe: no such file or directory; error reading process name for pid 25: readlink /proc/25/exe: no 
such file or directory; error reading process name for pid 26: readlink /proc/26/exe: no such file or directory; error reading process name for pid 27: readlink /proc/27/exe: no such file or directory; error reading process name for pid 28: readlink /proc/28/exe: no such file or directory; error reading process name for pid 29: readlink /proc/29/exe: no such file or directory; error reading process name for pid 31: readlink /proc/31/exe: no such file or directory; error reading process name for pid 33: readlink /proc/33/exe: no such file or directory; error reading process name for pid 34: readlink /proc/34/exe: no such file or directory; error reading process name for pid 35: readlink /proc/35/exe: no such file or directory; error reading process name for pid 36: readlink /proc/36/exe: no such file or directory; error reading process name for pid 37: readlink /proc/37/exe: no such file or directory; error reading process name for pid 38: readlink /proc/38/exe: no such file or directory; error reading process name for pid 39: readlink /proc/39/exe: no such file or directory; error reading process name for pid 40: readlink /proc/40/exe: no such file or directory; error reading process name for pid 41: readlink /proc/41/exe: no such file or directory; error reading process name for pid 42: readlink /proc/42/exe: no such file or directory; error reading process name for pid 43: readlink /proc/43/exe: no such file or directory; error reading process name for pid 44: readlink /proc/44/exe: no such file or directory; error reading process name for pid 45: readlink /proc/45/exe: no such file or directory; error reading process name for pid 47: readlink /proc/47/exe: no such file or directory; error reading process name for pid 48: readlink /proc/48/exe: no such file or directory; error reading process name for pid 55: readlink /proc/55/exe: no such file or directory; error reading process name for pid 60: readlink /proc/60/exe: no such file or directory; error reading 
process name for pid 61: readlink /proc/61/exe: no such file or directory; error reading process name for pid 62: readlink /proc/62/exe: no such file or directory; error reading process name for pid 63: readlink /proc/63/exe: no such file or directory; error reading process name for pid 64: readlink /proc/64/exe: no such file or directory; error reading process name for pid 65: readlink /proc/65/exe: no such file or directory; error reading process name for pid 66: readlink /proc/66/exe: no such file or directory; error reading process name for pid 67: readlink /proc/67/exe: no such file or directory; error reading process name for pid 68: readlink /proc/68/exe: no such file or directory; error reading process name for pid 69: readlink /proc/69/exe: no such file or directory; error reading process name for pid 74: readlink /proc/74/exe: no such file or directory; error reading process name for pid 80: readlink /proc/80/exe: no such file or directory; error reading process name for pid 81: readlink /proc/81/exe: no such file or directory; error reading process name for pid 127: readlink /proc/127/exe: no such file or directory; error reading process name for pid 181: readlink /proc/181/exe: no such file or directory; error reading process name for pid 187: readlink /proc/187/exe: no such file or directory; error reading process name for pid 189: readlink /proc/189/exe: no such file or directory; error reading process name for pid 217: readlink /proc/217/exe: no such file or directory; error reading process name for pid 243: readlink /proc/243/exe: no such file or directory; error reading process name for pid 288: readlink /proc/288/exe: no such file or directory; error reading process name for pid 289: readlink /proc/289/exe: no such file or directory; error reading process name for pid 381: readlink /proc/381/exe: no such file or directory; error reading process name for pid 384: readlink /proc/384/exe: no such file or directory; error reading process name for pid 
385: readlink /proc/385/exe: no such file or directory; error reading process name for pid 387: readlink /proc/387/exe: no such file or directory; error reading process name for pid 505: readlink /proc/505/exe: no such file or directory; error reading process name for pid 506: readlink /proc/506/exe: no such file or directory; error reading process name for pid 569: readlink /proc/569/exe: no such file or directory; error reading process name for pid 2342: readlink /proc/2342/exe: no such file or directory; error reading process name for pid 2614: readlink /proc/2614/exe: no such file or directory; error reading process name for pid 2627: readlink /proc/2627/exe: no such file or directory; error reading process name for pid 2638: readlink /proc/2638/exe: no such file or directory; error reading process name for pid 2639: readlink /proc/2639/exe: no such file or directory; error reading process name for pid 2917: readlink /proc/2917/exe: no such file or directory; error reading process name for pid 3074: readlink /proc/3074/exe: no such file or directory; error reading process name for pid 3075: readlink /proc/3075/exe: no such file or directory", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/collector@v0.70.0/receiver/scraperhelper/scrapercontroller.go:212
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/collector@v0.70.0/receiver/scraperhelper/scrapercontroller.go:191
2023-02-01T14:30:05.210Z        error   scraperhelper/scrapercontroller.go:212  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"syst...

Additional context

Output metrics

vagrant@ubuntu2210:/opt/collector$ curl http://localhost:1234/metrics
# HELP process_cpu_time Total CPU seconds broken down by different states.
# TYPE process_cpu_time counter
process_cpu_time{state="system"} 0.56
process_cpu_time{state="user"} 0.2
process_cpu_time{state="wait"} 0
# HELP process_cpu_utilization Percentage of total CPU time used by the process since last scrape, expressed as a value between 0 and 1. On the first scrape, no data point is emitted for this metric.
# TYPE process_cpu_utilization gauge
process_cpu_utilization{state="system"} 3407.9196728397114
process_cpu_utilization{state="user"} 997.4399042457693
process_cpu_utilization{state="wait"} 0
# HELP process_disk_io Disk bytes transferred.
# TYPE process_disk_io counter
process_disk_io{direction="read"} 0
process_disk_io{direction="write"} 0
# HELP process_memory_usage The amount of physical memory in use.
# TYPE process_memory_usage gauge
process_memory_usage 1.22888192e+08
# HELP process_memory_virtual Virtual memory size.
# TYPE process_memory_virtual gauge
process_memory_virtual 9.19764992e+08
vagrant@ubuntu2210:/opt/collector$
@EdwardKuenen EdwardKuenen added bug Something isn't working needs triage New item requiring triage labels Feb 1, 2023
@github-actions
Contributor

github-actions bot commented Feb 1, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme
Contributor

atoulme commented Feb 1, 2023

@EdwardKuenen please reformat the issue to use a block of code so we can read the prometheus output, thanks!

@sumo-drosiek
Member

sumo-drosiek commented Feb 22, 2023

+1 for this issue

If /proc/[pid]/exe is an invalid symlink, the process is omitted:

executable, err := getProcessExecutable(handle)
if err != nil {
	if !s.config.MuteProcessNameError {
		errs.AddPartial(1, fmt.Errorf("error reading process name for pid %v: %w", pid, err))
	}
	continue
}

This happens for all processes that do not have an actual executable (usually system processes and/or kernel threads).

ps aux

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 168536 13100 ?        Ss   07:46   0:00 /sbin/init
root           2  0.0  0.0      0     0 ?        S    07:46   0:00 [kthreadd]
[root@avalon ~]# readlink /proc/1/exe
/usr/lib/systemd/systemd
[root@avalon ~]# readlink /proc/2/exe
[root@avalon ~]# ls -al /proc/1/exe 
lrwxrwxrwx 1 root root 0 Feb 22 07:46 /proc/1/exe -> /usr/lib/systemd/systemd
[root@avalon ~]# ls -al /proc/2/exe 
ls: cannot read symbolic link '/proc/2/exe': No such file or directory
lrwxrwxrwx 1 root root 0 Feb 22 08:12 /proc/2/exe
[root@avalon ~]# ls -al /proc/2
ls: cannot read symbolic link '/proc/2/exe': No such file or directory
total 0
dr-xr-xr-x   9 root root 0 Feb 22 07:46 .
dr-xr-xr-x 417 root root 0 Feb 22 07:46 ..
-r--r--r--   1 root root 0 Feb 22 13:00 arch_status
dr-xr-xr-x   2 root root 0 Feb 22 13:00 attr
-rw-r--r--   1 root root 0 Feb 22 13:00 autogroup
-r--------   1 root root 0 Feb 22 13:00 auxv
-r--r--r--   1 root root 0 Feb 22 13:00 cgroup
--w-------   1 root root 0 Feb 22 13:00 clear_refs
-r--r--r--   1 root root 0 Feb 22 08:12 cmdline
-rw-r--r--   1 root root 0 Feb 22 12:38 comm
-rw-r--r--   1 root root 0 Feb 22 13:00 coredump_filter
-r--r--r--   1 root root 0 Feb 22 13:00 cpu_resctrl_groups
-r--r--r--   1 root root 0 Feb 22 13:00 cpuset
lrwxrwxrwx   1 root root 0 Feb 22 13:00 cwd -> /
-r--------   1 root root 0 Feb 22 13:00 environ
lrwxrwxrwx   1 root root 0 Feb 22 08:12 exe
dr-x------   2 root root 0 Feb 22 13:00 fd
dr-xr-xr-x   2 root root 0 Feb 22 13:00 fdinfo
-rw-r--r--   1 root root 0 Feb 22 13:00 gid_map
-r--------   1 root root 0 Feb 22 13:00 io
-r--------   1 root root 0 Feb 22 13:00 ksm_merging_pages
-r--------   1 root root 0 Feb 22 13:00 ksm_stat
-r--r--r--   1 root root 0 Feb 22 13:00 latency
-r--r--r--   1 root root 0 Feb 22 13:00 limits
-rw-r--r--   1 root root 0 Feb 22 13:00 loginuid
dr-x------   2 root root 0 Feb 22 13:00 map_files
-r--r--r--   1 root root 0 Feb 22 13:00 maps
-rw-------   1 root root 0 Feb 22 13:00 mem
-r--r--r--   1 root root 0 Feb 22 13:00 mountinfo
-r--r--r--   1 root root 0 Feb 22 13:00 mounts
-r--------   1 root root 0 Feb 22 13:00 mountstats
dr-xr-xr-x  61 root root 0 Feb 22 13:00 net
dr-x--x--x   2 root root 0 Feb 22 13:00 ns
-r--r--r--   1 root root 0 Feb 22 13:00 numa_maps
-rw-r--r--   1 root root 0 Feb 22 13:00 oom_adj
-r--r--r--   1 root root 0 Feb 22 13:00 oom_score
-rw-r--r--   1 root root 0 Feb 22 13:00 oom_score_adj
-r--------   1 root root 0 Feb 22 13:00 pagemap
-r--------   1 root root 0 Feb 22 13:00 personality
-rw-r--r--   1 root root 0 Feb 22 13:00 projid_map
lrwxrwxrwx   1 root root 0 Feb 22 13:00 root -> /
-rw-r--r--   1 root root 0 Feb 22 13:00 sched
-r--r--r--   1 root root 0 Feb 22 13:00 schedstat
-r--r--r--   1 root root 0 Feb 22 13:00 sessionid
-rw-r--r--   1 root root 0 Feb 22 13:00 setgroups
-r--r--r--   1 root root 0 Feb 22 13:00 smaps
-r--r--r--   1 root root 0 Feb 22 13:00 smaps_rollup
-r--------   1 root root 0 Feb 22 13:00 stack
-r--r--r--   1 root root 0 Feb 22 07:46 stat
-r--r--r--   1 root root 0 Feb 22 13:00 statm
-r--r--r--   1 root root 0 Feb 22 08:12 status
-r--------   1 root root 0 Feb 22 13:00 syscall
dr-xr-xr-x   3 root root 0 Feb 22 13:00 task
-rw-r--r--   1 root root 0 Feb 22 13:00 timens_offsets
-r--r--r--   1 root root 0 Feb 22 13:00 timers
-rw-rw-rw-   1 root root 0 Feb 22 13:00 timerslack_ns
-rw-r--r--   1 root root 0 Feb 22 13:00 uid_map
-r--r--r--   1 root root 0 Feb 22 13:00 wchan
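
For anyone who wants to see the effect of the quoted loop in isolation, here is a minimal sketch. It is not the receiver's code: `scrapeExecutables`, `lookupFunc` and `fakeLookup` are hypothetical names, and the fake lookup simply pretends that pid 1 resolves and pid 2 (a kernel thread) does not. The point it illustrates is that the mute flag only controls whether the error is reported; the process is skipped in both cases.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupFunc resolves a pid to an executable path. It is a hypothetical
// stand-in for the receiver's getProcessExecutable helper.
type lookupFunc func(pid int) (string, error)

// scrapeExecutables mirrors the shape of the quoted loop: a process whose
// executable cannot be resolved is skipped, and the error is recorded
// only when muting is off.
func scrapeExecutables(pids []int, lookup lookupFunc, mute bool) (map[int]string, []error) {
	found := make(map[int]string)
	var errs []error
	for _, pid := range pids {
		exe, err := lookup(pid)
		if err != nil {
			if !mute {
				errs = append(errs, fmt.Errorf("error reading process name for pid %v: %w", pid, err))
			}
			continue // the process is dropped whether or not the error is muted
		}
		found[pid] = exe
	}
	return found, errs
}

// fakeLookup simulates a host where pid 1 is systemd and every other pid
// behaves like a kernel thread with no executable.
func fakeLookup(pid int) (string, error) {
	if pid == 1 {
		return "/usr/lib/systemd/systemd", nil
	}
	return "", errors.New("readlink: no such file or directory")
}

func main() {
	for _, mute := range []bool{false, true} {
		found, errs := scrapeExecutables([]int{1, 2}, fakeLookup, mute)
		fmt.Printf("mute=%v processes=%d errors=%d\n", mute, len(found), len(errs))
	}
}
```

With the fake lookup, both runs keep one process; only the number of reported errors changes.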

@sumo-drosiek
Member

sumo-drosiek commented Feb 22, 2023

@dmitryax Do you think we could just use an empty path (instead of raising an error) for processes for which we cannot get a valid executable? When I do that, I get the following errors:

error reading parent pid for process \"systemd\" (pid 1): invalid pid 0;
error reading parent pid for process \"kthreadd\" (pid 2): invalid pid 0;
error reading disk usage for process \"systemd\" (pid 1): open /proc/1/io: permission denied;
error reading disk usage for process \"kthreadd\" (pid 2): open /proc/2/io: permission denied;
error reading disk usage for process \"rcu_gp\" (pid 3): open /proc/3/io: permission denied;
error reading disk usage for process \"rcu_par_gp\" (pid 4): open /proc/4/io: permission denied;

A separate question is whether we have to log these errors at all, given that it is more or less expected that some processes behave differently from regular ones.

@github-actions
Contributor

github-actions bot commented May 8, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 8, 2023
@EdwardKuenen
Author

Any updates on this issue?

@github-actions github-actions bot removed the Stale label May 30, 2023
@jskiba
Contributor

jskiba commented Jul 12, 2023

@EdwardKuenen you can use the configuration options listed at https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#process to mute these errors, especially these two:

mute_process_exe_error: <true|false>
mute_process_io_error: <true|false>

@github-actions github-actions bot added the Stale label Sep 11, 2023
@andrzej-stencel
Member

@EdwardKuenen To see the per-process data with the Prometheus exporter, you also need to set resource_to_telemetry_conversion.enabled: true in the Prometheus exporter configuration. This adds the process attributes to the Prometheus output.

receivers:
  hostmetrics:
    scrapers:
      process:
        metrics:
          process.cpu.utilization:
            enabled: true
        mute_process_name_error: false

exporters:
  prometheus:
    endpoint: 0.0.0.0:1234
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [prometheus]
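
To illustrate why that setting matters, here is a conceptual sketch, not the exporter's implementation. `withResourceLabels` is a hypothetical helper, and the attribute names `process.pid` and `process.executable.name` are used only as examples of the process attributes mentioned above. The idea is that the per-process identity lives on the resource, so unless those attributes are copied onto each data point, every process ends up with the same label set (only `state` in the output posted earlier).

```go
package main

import (
	"fmt"
	"strings"
)

// withResourceLabels copies resource attributes into a data point's label
// set. Dots are replaced with underscores, following the usual Prometheus
// label naming convention. On a key collision the data point label wins;
// that precedence is an assumption made for this sketch.
func withResourceLabels(resource, point map[string]string) map[string]string {
	out := make(map[string]string, len(resource)+len(point))
	for k, v := range resource {
		out[strings.ReplaceAll(k, ".", "_")] = v
	}
	for k, v := range point {
		out[strings.ReplaceAll(k, ".", "_")] = v
	}
	return out
}

func main() {
	resource := map[string]string{
		"process.pid":             "1",
		"process.executable.name": "systemd",
	}
	point := map[string]string{"state": "user"}

	// Without conversion only the point labels are exported;
	// with conversion the resource attributes are added as labels.
	fmt.Println(len(point), len(withResourceLabels(resource, point)))
}
```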

@github-actions github-actions bot removed the Stale label Sep 12, 2023
@EdwardKuenen
Author

Thank you very much @astencel-sumo, that setting did the trick to get the counters!

5 participants