Skip to content

[receiver/azuremonitorreceiver] feat: Allow to not split result by dimension #36240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

[receiver/azuremonitorreceiver] feat: Allow to not split result by dimension #36240

wants to merge 1 commit into from

Conversation

celian-garcia
Copy link
Member

@celian-garcia celian-garcia commented Nov 6, 2024

Description

Currently there is a mechanism allowing to split the result by dimension, thanks to a filter param hack.
This is great that all the dimensions are collected as labels in the metrics, but for some resources types it could be unwanted. In cause some concern about cardinality, or continuity of the queries between one version to another (e.g with prometheus exporter, if one does not do a "sum by ..." and the otelcol version is updated, the query can display different results)

To mitigate that I propose to put an optout for this collection so if we want for some resource types and not for others, we can create two separate receivers for example. Or we completely opt out if we don't want additional labels.

Edit:

After a first review, we agreed on the fact that it could do a bit more like allowing us to specify a list of dimensions for a particular metric.
e.g from added documentation

receivers:
  azuremonitor:
    dimensions:
      enabled: true
      overrides:
        "Microsoft.Network/azureFirewalls":
          # Real example of an Azure limitation here:
          # Dimensions exposed are Reason, Status, Protocol,
          # but when selecting Protocol in the filters, it returns nothing.
          # Note here that the metric display name is ``Network rules hit count`` but it's programmatic value is ``NetworkRuleHit``
          # Ref: https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-network-azurefirewalls-metrics
          "NetworkRuleHit": [Reason, Status]

Without this config you won't have the azure_networkrulehit_average_Count metric at all in your results
But with the config we have it with reason and status labels:

# HELP azure_networkrulehit_average_Count
# TYPE azure_networkrulehit_average_Count gauge
azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Allow",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"} 21.875
azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Deny",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"} 10

Link to tracking issue

Fixes #36611

Testing

Documentation

@celian-garcia celian-garcia requested review from codeboten and a team as code owners November 6, 2024 15:29
Copy link

linux-foundation-easycla bot commented Nov 6, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: celian-garcia / name: Célian GARCIA (9709d07)

@github-actions github-actions bot requested a review from nslaughter November 6, 2024 15:29
@celian-garcia celian-garcia changed the title feat: Allow to not split result by dimension [receiver/azuremonitorreceiver] feat: Allow to not split result by dimension Nov 6, 2024
@celian-garcia
Copy link
Member Author

Hey @codeboten, @nslaughter, @jpkrohling. Any opinion on that? I feel it's a no brainer so if you can run the pipeline and have a look, it would be cool!

@tesharp
Copy link

tesharp commented Nov 20, 2024

We have noticed that some metrics do not return any data if you include dimensions that do not have any data, are empty or nil (assumption based on results we have seen). For instance Azure Firewall Network Rule Hits have 3 dimensions Status, Reason and Protocol. If filtering on Status and Reason we get back data, but as soon as you add Protocol it returns no data at all. It returns no data if you only try to filter on Protocol too. Seen same for other metrics that have "empty" dimension values.

Would it maybe make sense to not only opt in / out but set which dimensions you would like to filter on? We would still want Reason and Status but can drop Protocol dimension

@celian-garcia
Copy link
Member Author

We have noticed that some metrics do not return any data if you include dimensions that do not have any data, are empty or nil (assumption based on results we have seen). For instance Azure Firewall Network Rule Hits have 3 dimensions Status, Reason and Protocol. If filtering on Status and Reason we get back data, but as soon as you add Protocol it returns no data at all. It returns no data if you only try to filter on Protocol too. Seen same for other metrics that have "empty" dimension values.

Would it maybe make sense to not only opt in / out but set which dimensions you would like to filter on? We would still want Reason and Status but can drop Protocol dimension

That's interesting. Then for that particular metric, using split_by_dimensions config field would allow you to receive data but you would lose the status and reason granularity. I just checked and indeed we have the same result even in the Azure Portal UI.

Selecting Network rules hit count and Split By = Protocol gives no result.
But for some metrics, like Application rules hits count, you have results split by http/https.
Also the SNAT Port utilization, you have results split by tcp/udp.

I believe this is a bug on the Network rules hit count definition that shouldn't propose protocol actually. I'm not sure that only this reason deserve a hack on our side, but still this feature would be interesting to me. Like it would be interesting to exclude a particular dimensions that would have too big cardinality or that we wouldn't be interested in.

Let me propose a config structure:

dimensions:
  # default to true to not introduce breaking change. This would mean that all the available dimensions will be collected, except if an exclusion exist.
  enabled: true | false 
  exclusions:
    "Microsoft.Network/azureFirewalls": # service name
        "Network rules hit count": # metric name
          - "Protocol"

We can also implement it the other way

dimensions:
  enabled: true 
  overrides:
    "Microsoft.Network/azureFirewalls":
        "Network rules hit count": [Reason, Status]

WDYT?

@tesharp
Copy link

tesharp commented Nov 20, 2024

I also think this is a bug in the api not only affecting this metric. We noticed it yesterday at 11:10 when suddenly all Postgresql flexible databases stoped reporting cpu, memory, storage etc. It had been working before that filtering on ServerName, but suddenly doesn't work anymore.

Nevertheless it seems like a good feature when you do not need the granularity with all dimensions. I think then the last option makes more sense to override and specify which dimensions you need.

@celian-garcia
Copy link
Member Author

I also think this is a bug in the api not only affecting this metric. We noticed it yesterday at 11:10 when suddenly all Postgresql flexible databases stoped reporting cpu, memory, storage etc. It had been working before that filtering on ServerName, but suddenly doesn't work anymore.

Nevertheless it seems like a good feature when you do not need the granularity with all dimensions. I think then the last option makes more sense to override and specify which dimensions you need.

Okay I will make an update.

@ahurtaud
Copy link

Nevertheless it seems like a good feature when you do not need the granularity with all dimensions. I think then the last option makes more sense to override and specify which dimensions you need.

I agree, second proposed config format is better to me

@celian-garcia
Copy link
Member Author

celian-garcia commented Nov 21, 2024

ping @tesharp I finished and pushed and I confirm that before this new overrides, the network rule hit metric was not even in the result ! Now it works well 💪

Note it's probably worth it that I move these dimensions functions out of scraper.go file. As if I make scraper_batch right after, I will reuse them..

Edit: moved to a dedicated dimensions.go files + added some unit tests

@tesharp
Copy link

tesharp commented Nov 22, 2024

Nice :) Looks good.

@MovieStoreGuy
Copy link
Contributor

Please make sure you run the linters :)

@celian-garcia
Copy link
Member Author

Please make sure you run the linters :)

Thanks @MovieStoreGuy, sorry for that. It should be fine now.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added Stale and removed Stale labels Dec 25, 2024
Copy link
Contributor

github-actions bot commented Jan 9, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 9, 2025
@github-actions github-actions bot removed the Stale label Jan 10, 2025
@celian-garcia celian-garcia closed this by deleting the head repository Jan 13, 2025
atoulme pushed a commit that referenced this pull request Mar 7, 2025
…mension (#37168)

Recreated from a diffferent fork. Context:
#36240

> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
> Ex. Adding a feature - Explain what this achieves.-->
> #### Description
> 
> Currently there is a mechanism allowing to split the result by
dimension, thanks to a filter param hack.
> This is great that all the dimensions are collected as labels in the
metrics, but for some resources types it could be unwanted. In cause
some concern about cardinality, or continuity of the queries between one
version to another (e.g with prometheus exporter, if one does not do a
"sum by ..." and the otelcol version is updated, the query can display
different results)
> 
> To mitigate that I propose to put an optout for this collection so if
we want for some resource types and not for others, we can create two
separate receivers for example. Or we completely opt out if we don't
want additional labels.
> 
> ### Edit: 
> After a first review, we agreed on the fact that it could do a bit
more like allowing us to specify a list of dimensions for a particular
metric.
> e.g from added documentation
> ```yaml
> receivers:
>   azuremonitor:
>     dimensions:
>       enabled: true
>       overrides:
>         "Microsoft.Network/azureFirewalls":
>           # Real example of an Azure limitation here:
>           # Dimensions exposed are Reason, Status, Protocol,
> # but when selecting Protocol in the filters, it returns nothing.
> # Note here that the metric display name is ``Network rules hit
count`` but it's programmatic value is ``NetworkRuleHit``
> # Ref:
https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-network-azurefirewalls-metrics
>           "NetworkRuleHit": [Reason, Status]
> ```
> Without this config you won't have the
``azure_networkrulehit_average_Count`` metric at all in your results
> But with the config we have it with reason and status labels:
> 
> ```# HELP azure_networkrulehit_average_Count``` 
> ```# TYPE azure_networkrulehit_average_Count gauge```
>
```azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Allow",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"}
21.875```
>
```azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Deny",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"}
10```
>  
> <!-- Issue number (e.g. #1234) or full URL to issue, if applicable.
-->
> #### Link to tracking issue
> Fixes #36611 
> 
> <!--Describe what testing was performed and which tests were added.-->
> #### Testing
> 
> <!--Describe the documentation added.-->
> #### Documentation
> 
> <!--Please delete paragraphs that you did not use before
submitting.-->

Signed-off-by: Célian Garcia <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[receiver/azuremonitorreceiver] feat: Allow to not split result by dimension
5 participants