fix(azuremonitorreceiver): Azure Monitor receiver should not produce gaps in data points for PT1M time grains #37342
ee3a095
to
9783d0e
Compare
This PR was marked stale due to lack of activity. It will be closed in 14 days.
/ping
That's very interesting. Did you observe the same thing with other time grains? We have a fork (using the batch metrics API instead of the metrics API to work around a rate-limiting problem, anyway...) and in that fork we did something different: we subtract 4 times the duration of the time grain. So we're very interested in seeing some before/after figures. I think I will try your implementation with our fork.
Following the scraper logic and our changes, it should work well. The root cause of the issue was time drift caused by various factors (such as request duration and processing time) and the way the elapsed time was compared. This change mitigates the impact of the time drift.
Have you checked for these gaps in Azure App Insights? Azure doesn't store null-value data points, it might be the case: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-troubleshoot#chart-shows-dashed-line
Sure. The next screenshot was taken after the fix. It is built on the same query. We can observe that we have reached 60 data points per hour. We can also observe some "spikes". These spikes are a result of otel-collector restarts. However, that isn't a problem for us.
Well OK, I checked twice, and our fork using the getBatch metrics API is completely different in terms of interface. We make a batch request giving a start and end time, and the classic metrics API doesn't even have this notion. LGTM in the end. I would like to approve, but I've owned this code for only a week and I'm not sure how I can approve it :D
9783d0e
to
58fe5a0
Compare
Ugh, I've rebased onto main and lots of tests failed :/
Hoping that it has some weight here, I'm approving these changes.
🙏
I was added to CODEOWNERS after the PR was created :)
…gaps in data points for PT1M time grains
58fe5a0
to
562fe13
Compare
@celian-garcia: You're a code owner now, so it definitely counts. Really appreciate your input here 🙂
Co-authored-by: Curtis Robert <[email protected]>
* main: (111 commits) fix(azuremonitorreceiver): Azure Monitor receiver should not produce gaps in data points for PT1M time grains (open-telemetry#37342) fix(deps): update module sigs.k8s.io/controller-runtime to v0.20.2 (open-telemetry#37996) fix(deps): update module github.com/hashicorp/consul/api to v1.31.2 (open-telemetry#38031) [processor/resourcedetection] add instructions for recommended use of the dynatrace detector (open-telemetry#37962) fix(deps): update module github.com/go-sql-driver/mysql to v1.9.0 (open-telemetry#38007) fix(deps): update module google.golang.org/api to v0.221.0 (open-telemetry#38027) prometheusreceiver: deprecate start time adjustment (open-telemetry#37879) fix(deps): update module github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common to v1.0.1100 (open-telemetry#37995) chore(deps): update golang docker tag to v1.24 (open-telemetry#37997) fix(deps): update all github.com/aws packages (open-telemetry#37983) chore(deps): update prom/prometheus docker tag to v3.2.0 (open-telemetry#37998) fix(deps): update kubernetes packages to v0.32.2 (open-telemetry#38004) fix(deps): update module github.com/clickhouse/clickhouse-go/v2 to v2.32.1 (open-telemetry#38006) fix(deps): update module github.com/google/go-github/v69 to v69.2.0 (open-telemetry#38014) fix(deps): update module github.com/sap/go-hdb to v1.13.3 (open-telemetry#38021) fix(deps): update module go.etcd.io/bbolt to v1.4.0 (open-telemetry#38024) [exporter/stefexporter] Fix a context cancellation bug in STEF exporter (open-telemetry#37944) fix(deps): update module github.com/spf13/cobra to v1.9.1 (open-telemetry#38023) fix(deps): update module github.com/envoyproxy/go-control-plane/envoy to v1.32.4 (open-telemetry#37990) fix(deps): update module github.com/hashicorp/consul/api to v1.31.1 (open-telemetry#37991) ...
Description
The current implementation of tracking for the last metrics fetch is highly time-sensitive because it relies on time.Now(). This sensitivity causes data points to be lost when scrapes at 1-minute intervals are subject to jitter. This change implements a more robust comparison against the minute of the last metrics fetch.
Link to tracking issue
Fixes #37337
Testing
New unit test coverage was added.
Documentation
No documentation added