[receiver/hostmetrics] Use native API instead of WMI to get process.handles on Windows #38886

Merged
30 changes: 30 additions & 0 deletions .chloggen/optimizing-process-handles-metric-on-windows.yaml
@@ -0,0 +1,30 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: hostmetricsreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Reduced the CPU cost of collecting the `process.handles` metric on Windows.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [38886]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  Instead of using WMI to retrieve the number of opened handles by each process
  the scraper now uses the GetProcessHandleCount Win32 API which results in
  reduced CPU usage when the metric `process.handles` is enabled.

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
2 changes: 1 addition & 1 deletion receiver/hostmetricsreceiver/go.mod
@@ -13,7 +13,6 @@ require (
github.com/shirou/gopsutil/v4 v4.25.2
github.com/stretchr/testify v1.10.0
github.com/tilinna/clock v1.1.0
github.com/yusufpapurcu/wmi v1.2.4
go.opentelemetry.io/collector/component v1.28.2-0.20250319144947-41a9ea7f7402
go.opentelemetry.io/collector/component/componenttest v0.122.2-0.20250319144947-41a9ea7f7402
go.opentelemetry.io/collector/confmap v1.28.2-0.20250319144947-41a9ea7f7402
@@ -95,6 +94,7 @@ require (
github.com/testcontainers/testcontainers-go v0.35.0 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/yusufpapurcu/wmi v1.2.4 // indirect
go.opentelemetry.io/auto/sdk v1.1.0 // indirect
go.opentelemetry.io/collector/consumer/consumererror v0.122.2-0.20250319144947-41a9ea7f7402 // indirect
go.opentelemetry.io/collector/consumer/xconsumer v0.122.2-0.20250319144947-41a9ea7f7402 // indirect
@@ -112,7 +112,7 @@ Number of disk operations performed by the process.

### process.handles

Number of handles held by the process.
Number of open handles held by the process.

This metric is only available on Windows.

@@ -132,7 +132,7 @@ Percentage of total physical memory that is used by the process.

Number of file descriptors in use by the process.

This metric is only available on Linux.
On Windows this metric captures the number of open handles currently held by the process. If you want to capture this data on Windows use the `process.handles` metric instead to avoid any confusion.

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
@@ -0,0 +1,19 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//go:build !windows

package processscraper // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper"

import (
	"context"
	"errors"
)

const handleCountMetricsLen = 0

var ErrHandlesPlatformSupport = errors.New("process handle collection is only supported on Windows")

func (p *wrappedProcessHandle) GetProcessHandleCountWithContext(_ context.Context) (int64, error) {
	return 0, ErrHandlesPlatformSupport
}
@@ -0,0 +1,19 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//go:build windows

package processscraper // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper"

import (
	"context"
)

const handleCountMetricsLen = 1

func (p *wrappedProcessHandle) GetProcessHandleCountWithContext(ctx context.Context) (int64, error) {
	// On Windows NumFDsWithContext returns the number of open handles, since it uses the
	// GetProcessHandleCount API.
	fds, err := p.Process.NumFDsWithContext(ctx)
	return int64(fds), err
}

This file was deleted.

@@ -170,7 +173,10 @@ metrics:
  process.open_file_descriptors:
    enabled: false
    description: Number of file descriptors in use by the process.
    extended_documentation: This metric is only available on Linux.
    extended_documentation: >-
Contributor

This metric is kind of in flux in Semantic Conventions (see open-telemetry/semantic-conventions#1798) and will likely be designated Unix-exclusive for reasons stated in this comment. We're still in the process of figuring out how we'll handle all the semconv transition, but we like to get ahead of it where we can.

As a result, I think this metric should remain Linux-exclusive, as it already was, and produce an error if enabled on Windows. I think this used to be the behaviour, but it inadvertently changed on a gopsutil upgrade when this was implemented on Windows upstream. I actually forgot this was done (and was kind of hoping they wouldn't do it this way; I wanted them to introduce a different API for it, but the codebase wasn't structured to allow for that at the time).

I think the best thing to do here would be to introduce some validation checks in the process scraper config to fail if enabling open_file_descriptors on Windows, or process.handles on Linux. However, this would be a breaking change; the previous behaviour is that the scrape would fail every time trying to write those metrics. Maybe I'll follow this up with a featuregated move to add validation for these metrics instead of allowing them and failing them every scrape.

Sorry for the long comment, but to summarize: I think we should change this to erroring on every scrape if process.open_file_descriptors is enabled on Windows, similar to how process.handles behaves on other platforms. It can produce a platform-not-supported error similar to the handles one.

Contributor Author @pjanotti, Mar 25, 2025

Thanks for the context @braydonk - I agree open_file_descriptors should error on Windows, just as process.handles should error on non-Windows. I tend to think that reporting the error once at startup is better than on every scrape, but this is something that we can discuss later.

I liked your suggestion for the names:

process.unix.file_descriptor.count
process.windows.handle.count

Contributor Author

I subscribed to the spec issue; as soon as it is settled, let's move to implement it.

      On Windows this metric captures the number of open handles currently held
      by the process. If you want to capture this data on Windows use the
      `process.handles` metric instead to avoid any confusion.
    unit: '{count}'
    sum:
      value_type: int
@@ -179,7 +182,7 @@ metrics:

  process.handles:
    enabled: false
    description: Number of handles held by the process.
    description: Number of open handles held by the process.
    extended_documentation: This metric is only available on Windows.
    unit: '{count}'
    sum:
@@ -95,6 +95,7 @@ type processHandle interface {
	PageFaultsWithContext(context.Context) (*process.PageFaultsStat, error)
	NumCtxSwitchesWithContext(context.Context) (*process.NumCtxSwitchesStat, error)
	NumFDsWithContext(context.Context) (int32, error)
	GetProcessHandleCountWithContext(context.Context) (int64, error)
	// If gatherUsed is true, the currently used value will be gathered and added to the resulting RlimitStat.
	RlimitUsageWithContext(ctx context.Context, gatherUsed bool) ([]process.RlimitStat, error)
	CgroupWithContext(ctx context.Context) (string, error)