
fix(azuremonitorreceiver): Azure Monitor receiver should not produce gaps in data points for PT1M time grains #37342


Merged

Conversation

@an-mmx (Contributor) commented Jan 20, 2025

Description

The current implementation of last-metrics-fetch tracking is highly time-sensitive because it relies on time.Now(). That sensitivity causes data points to be lost due to jitter in 1-minute intervals. This change implements a more robust comparison based on the minute of the last metrics fetch.
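
Roughly, the idea is something like the following sketch (illustrative names only, not the receiver's actual code), assuming the decision is based on which time-grain window the timestamps fall into rather than on the exact elapsed wall-clock time:

```go
package main

import (
	"fmt"
	"time"
)

// shouldFetch reports whether a new fetch is due for the given time grain.
// Comparing grain-truncated timestamps is robust to jitter: the decision
// depends only on which grain window "now" falls into, not on the exact
// number of seconds elapsed since the last fetch.
func shouldFetch(lastFetch, now time.Time, grain time.Duration) bool {
	return now.Truncate(grain).After(lastFetch.Truncate(grain))
}

func main() {
	grain := time.Minute // PT1M

	last := time.Date(2025, 1, 20, 14, 56, 59, 900_000_000, time.UTC) // 14:56:59.9
	now := time.Date(2025, 1, 20, 14, 57, 59, 800_000_000, time.UTC)  // 14:57:59.8

	// Only 59.9s have elapsed, so a naive "now.Sub(last) >= grain" check would
	// skip this scrape and leave a one-minute gap; the window comparison does not.
	fmt.Println(now.Sub(last) >= grain)        // false
	fmt.Println(shouldFetch(last, now, grain)) // true
}
```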

Link to tracking issue

Fixes #37337

Testing

New coverage was introduced in the unit tests.

Documentation

No documentation added

@an-mmx an-mmx requested a review from a team as a code owner January 20, 2025 14:56
@an-mmx an-mmx requested a review from bogdandrutu January 20, 2025 14:56
@github-actions github-actions bot requested a review from nslaughter January 20, 2025 14:56
@an-mmx an-mmx force-pushed the fix/azure-monitor-receiver-time-grain branch 6 times, most recently from ee3a095 to 9783d0e on January 22, 2025 19:10

github-actions bot commented Feb 6, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 6, 2025
@an-mmx (Contributor, Author) commented Feb 6, 2025

/ping

@celian-garcia (Member) commented

That's very interesting. Did you observe the same thing with other time grains?
I feel like your fix would only address PT1M and not the others.

We have a fork (using the batch metrics API instead of the metrics API in order to work around a rate-limiting problem, anyway..) and in that fork we did something different: we go back 4 times the duration of the time grain.
So for PT1M, that's 4 * 60 = 240 seconds into the past. This is to ensure that we receive some data, because if we only go back 60 seconds we sometimes get nothing, and nothing, and nothing. Of course it's not exactly the same API, but maybe the problem is the same.
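
Roughly, that workaround looks like the sketch below (illustrative only; grainDurations and queryWindow are assumed helper names, not code from this PR or from the fork):

```go
package main

import (
	"fmt"
	"time"
)

// grainDurations maps a few ISO 8601 time grains used by Azure Monitor to Go
// durations; listed here only for illustration.
var grainDurations = map[string]time.Duration{
	"PT1M": time.Minute,
	"PT5M": 5 * time.Minute,
	"PT1H": time.Hour,
}

// queryWindow returns the [start, end] interval to request, going back 4x the
// time grain so that some data points are available even when the most recent
// ones have not been published yet.
func queryWindow(now time.Time, grain string) (start, end time.Time) {
	d := grainDurations[grain]
	return now.Add(-4 * d), now
}

func main() {
	start, end := queryWindow(time.Now().UTC(), "PT1M") // 240 seconds back for PT1M
	fmt.Printf("querying from %s to %s\n", start.Format(time.RFC3339), end.Format(time.RFC3339))
}
```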

So in the end, we're very interested in seeing some before/after figures. And I think I will try your implementation with our fork.

@github-actions github-actions bot removed the Stale label Feb 7, 2025
@an-mmx (Contributor, Author) commented Feb 10, 2025

> .. Did you observe the same thing with other time grains? I feel like your fix would only address PT1M and not the others.

Following the scraper logic and our changes, it should work well for other time grains too. The root cause of the issue was time drift, caused by various factors (such as request duration and processing time), combined with the way it was compared. This change mitigates the impact of that drift.
Could you please share your concerns regarding other time grains?
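
To illustrate why the grain-window comparison is not specific to PT1M, here is a small standalone sketch (illustrative only, not the receiver's code): a few hundred milliseconds of drift make a naive elapsed-time check skip a window for any grain, while comparing grain-truncated timestamps still triggers the fetch.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	grains := map[string]time.Duration{
		"PT1M": time.Minute,
		"PT5M": 5 * time.Minute,
		"PT1H": time.Hour,
	}
	for name, grain := range grains {
		last := time.Date(2025, 2, 10, 12, 0, 0, 500_000_000, time.UTC) // 12:00:00.5
		now := last.Add(grain - 200*time.Millisecond)                   // next scrape, slightly early

		naive := now.Sub(last) >= grain                              // false: window skipped
		windowed := now.Truncate(grain).After(last.Truncate(grain))  // true: fetch still happens

		fmt.Printf("%s: naive=%v windowed=%v\n", name, naive, windowed)
	}
}
```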

> We have a fork (using the batch metrics API instead of the metrics API in order to work around a rate-limiting problem, anyway..) and in that fork we did something different: we go back 4 times the duration of the time grain. So for PT1M, that's 4 * 60 = 240 seconds into the past. This is to ensure that we receive some data, because if we only go back 60 seconds we sometimes get nothing, and nothing, and nothing. Of course it's not exactly the same API, but maybe the problem is the same.

Have you checked whether these gaps also appear in Azure Application Insights? Azure doesn't store null-value data points, so that might be the case: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-troubleshoot#chart-shows-dashed-line

> So in the end, we're very interested in seeing some before/after figures. And I think I will try your implementation with our fork.

Sure.
We collect the Azure Event Hub IncomingMessages metric. Here you can see a line representing the count of data points collected per hour before the fix, filtered by a single unique dimension. We can observe that we collect fewer than 60 data points per hour, which isn't what we expect.
[Screenshot 2025-02-10 at 16:23:04: data points collected per hour before the fix]

The next screenshot was taken after the fix and is built on the same query. We can observe that we now reach 60 data points per hour. We can also observe some "spikes"; these are the result of otel-collector restarts, but they aren't a problem for us.
[Screenshot 2025-02-10 at 16:23:17: data points collected per hour after the fix]

@celian-garcia (Member) commented

Well, OK, I checked twice and our fork using the getBatch metrics API is completely different in terms of interface. We make a batch request giving a start and end time, whereas the classic metrics API doesn't even have this notion; it just takes the metrics at a time T. So what you did is perfect and indeed seems to largely mitigate the issue.

LGTM in the end. I would like to approve, but I've only been an owner of this code for a week and I'm not sure how I can approve it :D

@an-mmx an-mmx force-pushed the fix/azure-monitor-receiver-time-grain branch from 9783d0e to 58fe5a0 on February 14, 2025 13:29
@an-mmx (Contributor, Author) commented Feb 14, 2025

Ugh, I've rebased onto main and lots of tests failed :/

@celian-garcia (Member) left a comment

Hoping that it has some weight here, I'm approving these changes. 🙏
I was added to CODEOWNERS after this PR was created :)

@an-mmx an-mmx force-pushed the fix/azure-monitor-receiver-time-grain branch from 58fe5a0 to 562fe13 on February 14, 2025 15:14
@crobert-1 (Member) commented

> Hoping that it has some weight here, I'm approving these changes.

@celian-garcia: You're a code owner now, so it definitely counts. Really appreciate your input here 🙂

@crobert-1 crobert-1 added the "ready to merge" label (Code review completed; ready to merge by maintainers) Feb 17, 2025
@songy23 songy23 merged commit 848f0c0 into open-telemetry:main Feb 18, 2025
173 checks passed
@github-actions github-actions bot added this to the next release milestone Feb 18, 2025
yiquanzhou added a commit to dash0hq/opentelemetry-collector-contrib that referenced this pull request Feb 18, 2025
* main: (111 commits)
  fix(azuremonitorreceiver): Azure Monitor receiver should not produce gaps in data points for PT1M time grains (open-telemetry#37342)
  fix(deps): update module sigs.k8s.io/controller-runtime to v0.20.2 (open-telemetry#37996)
  fix(deps): update module github.com/hashicorp/consul/api to v1.31.2 (open-telemetry#38031)
  [processor/resourcedetection] add instructions for recommended use of the dynatrace detector (open-telemetry#37962)
  fix(deps): update module github.com/go-sql-driver/mysql to v1.9.0 (open-telemetry#38007)
  fix(deps): update module google.golang.org/api to v0.221.0 (open-telemetry#38027)
  prometheusreceiver: deprecate start time adjustment (open-telemetry#37879)
  fix(deps): update module github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common to v1.0.1100 (open-telemetry#37995)
  chore(deps): update golang docker tag to v1.24 (open-telemetry#37997)
  fix(deps): update all github.com/aws packages (open-telemetry#37983)
  chore(deps): update prom/prometheus docker tag to v3.2.0 (open-telemetry#37998)
  fix(deps): update kubernetes packages to v0.32.2 (open-telemetry#38004)
  fix(deps): update module github.com/clickhouse/clickhouse-go/v2 to v2.32.1 (open-telemetry#38006)
  fix(deps): update module github.com/google/go-github/v69 to v69.2.0 (open-telemetry#38014)
  fix(deps): update module github.com/sap/go-hdb to v1.13.3 (open-telemetry#38021)
  fix(deps): update module go.etcd.io/bbolt to v1.4.0 (open-telemetry#38024)
  [exporter/stefexporter] Fix a context cancellation bug in STEF exporter (open-telemetry#37944)
  fix(deps): update module github.com/spf13/cobra to v1.9.1 (open-telemetry#38023)
  fix(deps): update module github.com/envoyproxy/go-control-plane/envoy to v1.32.4 (open-telemetry#37990)
  fix(deps): update module github.com/hashicorp/consul/api to v1.31.1 (open-telemetry#37991)
  ...
@an-mmx an-mmx deleted the fix/azure-monitor-receiver-time-grain branch February 18, 2025 19:48
yiquanzhou added a commit to dash0hq/opentelemetry-collector-contrib that referenced this pull request Feb 19, 2025
Labels: ready to merge, receiver/azuremonitor
Successfully merging this pull request may close: Azure Monitor receiver doesn't honor time grain (#37337)