[receiver/hostmetrics/cpuscraper] Windows - CTX timeout, use CountsWithContext instead and make it configurable #32133
Comments
Pinging code owners: See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This PR does not fix the main ask here, but it improves the situation by avoiding the issue when the metric is not enabled.
…etric is enabled (#32173)

**Description:** As described in #32133, on Windows the CPU count is obtained from a WMI call with a hardcoded context timeout of 3 seconds. This leads to an error when WMI is slow or the system is under heavy load, causing all the collected metrics to not be emitted. The CPU count metrics, logical and physical, are not enabled by default, and there is no reason to calculate them unless they are enabled.

**Link to tracking Issue:** #32133

**Testing:** unit test has been validated

**Documentation:** <Describe the documentation added.>

Signed-off-by: Dani Louca
Co-authored-by: Antoine Toulme
Removing |
Is this only an issue for Windows or also Linux? |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners: See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Since the bug is in a WMI call, it looks like it's Windows only. |
Another user is hitting this issue; can we look into this? |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
receiver/hostmetrics
What happened?
Description
The `cpu.Counts` gopsutil function, which is called by the CPU scraper, does not set a deadline/timeout on its context, which forces `WMIQueryWithContext` to fall back to its hardcoded timeout of 3 seconds. In large, busy, and/or under-resourced environments, the WMI call can take longer than 3 seconds, which leads to a `context deadline exceeded` error and a failure to get the CPU counts.
Steps to Reproduce
Find a Windows host where WMI calls take longer than 3 seconds and run the hostmetrics receiver with the cpu scraper.
Expected Result
Get all the metrics, including the physical and logical CPU counts
Actual Result
CPU counts are missing and we see this error in the logs
Collector version
v0.95.0
Environment information
Environment
OpenTelemetry Collector configuration
No response
Log output
Additional context
The suggestion is to use `CountsWithContext` instead of `Counts` and introduce a `wmi_timeout` option for cpuscraper.

cc: @atoulme, who helped with the RCA