Skip to content

Commit 1fdf89d

Browse files
[receiver/azuremonitorreceiver] feat: option to use azmetrics getBatch API (#38895)
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description Let me introduce the Batch Metrics API https://learn.microsoft.com/en-us/rest/api/monitor/metrics-batch/batch?view=rest-monitor-2023-10-01&tabs=HTTP The advantages of using ``azmetrics`` batch metrics API over the ``armmonitor`` metrics API are described in details in the issue + in the README.md Note that this is a revival of a previous PR that we made last year, and that has not been followed up properly but we're using that forked code in the Amadeus company for months with good results. <!-- Issue number (e.g. #1234) or full URL to issue, if applicable. --> #### Link to tracking issue Relates #38651 <!--Describe what testing was performed and which tests were added.--> #### Testing ✅ Manual tests + mocked unit tests <!--Describe the documentation added.--> #### Documentation ✅ Added config field documentation in README.md <!--Please delete paragraphs that you did not use before submitting.--> Signed-off-by: Célian Garcia <[email protected]>
1 parent d0ffb46 commit 1fdf89d

File tree

15 files changed

+2386
-12
lines changed

15 files changed

+2386
-12
lines changed
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# Use this changelog template to create an entry for release notes.
2+
3+
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
4+
change_type: enhancement
5+
6+
# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
7+
component: azuremonitorreceiver
8+
9+
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
10+
note: "Allow to use metrics:getBatch API (Azure Monitor Metrics Data Plane)"
11+
12+
# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
13+
issues: [38651]
14+
15+
# (Optional) One or more lines of additional information to render under the primary note.
16+
# These lines will be padded with 2 spaces and then inserted directly into the document.
17+
# Use pipe (|) for multiline entries.
18+
subtext:
19+
20+
# If your change doesn't affect end users or the exported elements of any package,
21+
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
22+
# Optional: The change log or logs in which this entry should be included.
23+
# e.g. '[user]' or '[user, api]'
24+
# Include 'user' if the change is relevant to end users.
25+
# Include 'api' if there is a change to a library API.
26+
# Default: '[user]'
27+
change_logs: [user]

receiver/azuremonitorreceiver/README.md

Lines changed: 35 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ The following settings are optional:
3535
- `cloud` (default = `AzureCloud`): defines which Azure cloud to use. Valid values: `AzureCloud`, `AzureUSGovernment`, `AzureChinaCloud`.
3636
- `dimensions.enabled` (default = `true`): allows to opt out from automatically split by all the dimensions of the resource type.
3737
- `dimensions.overrides` (default = `{}`): if dimensions are enabled, it allows you to specify a set of dimensions for a particular metric. This is a two levels map with first key being the resource type and second key being the metric name. Programmatic value should be used for metric name https://learn.microsoft.com/en-us/azure/azure-monitor/reference/metrics-index
38+
- `use_batch_api` (default = `false`): Use the batch API to fetch metrics. This is useful when the number of subscriptions is high and the API calls are rate limited.
3839

3940
Authenticating using service principal requires following additional settings:
4041

@@ -72,6 +73,27 @@ receivers:
7273
NamespaceCpuUsage: [*] # metric NamespaceCpuUsage with all known aggregations
7374
```
7475
76+
### Use Batch API (experimental)
77+
78+
There's two API to collect metrics in Azure Monitor:
79+
- the **Azure Resource Manager (ARM) API** (currently by default)
80+
- the **Azure Monitor Metrics Data Plane API** (with ``use_batch_api=true``)
81+
82+
The Azure Monitor Metrics Data Plane API present some interesting benefits, especially regarding **rate limits**.
83+
84+
> Some highlights from [announcement blog post - Jan 31, 2024](https://techcommunity.microsoft.com/blog/azureobservabilityblog/azure-monitor--announcing-general-availability-of-azure-monitor-metrics-data-pla/4041394)
85+
> - **Higher Querying Limits**: This API is designed to handle metric data queries for resources with higher query limits compared to existing Azure Resource Manager APIs. This is particularly advantageous for customers with large subscriptions containing numerous resources. While the REST API allows only 12,000 API calls per hour, the Azure Metrics Data Plane API elevates this limit to a staggering 360,000 API calls per hour. This increase in query throughput ensures a more efficient and streamlined experience for customers.
86+
> - **Efficiency**: The efficiency of this API shines when collecting metrics for multiple resources. Instead of making multiple API calls for each resource, the Azure Metrics Data Plane API offers a single batch API call that can accommodate up to 50 resource IDs. This results in higher throughput and more efficient querying, making it a time-saving solution.
87+
> - **Improved Performance**: The performance of client-side services can be greatly enhanced by reducing the number of calls required to extract the same amount of metric data for resources. The Azure Metrics Data Plane API optimizes the data retrieval process, ultimately leading to improved performance across the board.
88+
89+
Good news is that it's **very easy for you to try out!**
90+
```yaml
91+
receivers:
92+
azuremonitor:
93+
use_bath_api: true
94+
... # no change for other configs
95+
```
96+
7597
### Example Configurations
7698

7799
Using [Service Principal](https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication?tabs=bash#service-principal-with-a-secret) for authentication:
@@ -176,14 +198,25 @@ cardinality: once per sub id
176198
### [Metrics Definitions - List](https://learn.microsoft.com/en-us/rest/api/monitor/metric-definitions/list?view=rest-monitor-2023-10-01&tabs=HTTP)
177199
```yaml
178200
conditions: always
179-
cardinality: once per res id and *page of metrics def
201+
cardinality:
202+
- if use_batch_api is false, once per res id and *page of metrics def
203+
- if use_batch_api is true, once per res type and *page of metrics def
180204
```
181205
182206
### [Metrics - List](https://learn.microsoft.com/en-us/rest/api/monitor/metrics/list?view=rest-monitor-2023-10-01&tabs=HTTP)
183207
```yaml
184-
conditions: always
208+
conditions:
209+
- use_batch_api is false
185210
cardinality: once per res id, *page of metrics, and **composite key
186211
```
212+
213+
### [Metrics Batch - Batch](https://learn.microsoft.com/en-us/rest/api/monitor/metrics-batch/batch?view=rest-monitor-2023-10-01&tabs=HTTP)
214+
```yaml
215+
conditions:
216+
- use_batch_api is true
217+
cardinality: once per res type and **composite key
218+
```
219+
187220
> *page size has not been clearly identified, reading the documentation. Even Chat Bots lose themselves
188221
> with the "top"/"$top" filter that doesn't seem related, and give random results from 10 to 1000...
189222
>

receiver/azuremonitorreceiver/config.go

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ var (
2525
errMissingSubscriptionIDs = errors.New(`neither "SubscriptionIDs" nor "DiscoverSubscription" is specified in the config`)
2626
errMissingClientID = errors.New(`"ClientID" is not specified in config`)
2727
errMissingClientSecret = errors.New(`"ClientSecret" is not specified in config`)
28-
errMissingFedTokenFile = errors.New(`"FederatedTokenFile"" is not specified in config`)
28+
errMissingFedTokenFile = errors.New(`"FederatedTokenFile" is not specified in config`)
2929
errInvalidCloud = errors.New(`"Cloud" is invalid`)
3030

3131
monitorServices = []string{
@@ -255,6 +255,7 @@ type Config struct {
255255
MaximumNumberOfMetricsInACall int `mapstructure:"maximum_number_of_metrics_in_a_call"`
256256
MaximumNumberOfRecordsPerResource int32 `mapstructure:"maximum_number_of_records_per_resource"`
257257
AppendTagsAsAttributes bool `mapstructure:"append_tags_as_attributes"`
258+
UseBatchAPI bool `mapstructure:"use_batch_api"`
258259
Dimensions DimensionsConfig `mapstructure:"dimensions"`
259260
}
260261

receiver/azuremonitorreceiver/factory.go

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -56,11 +56,17 @@ func createMetricsReceiver(_ context.Context, params receiver.Settings, rConf co
5656
return nil, errConfigNotAzureMonitor
5757
}
5858

59-
azureScraper := newScraper(cfg, params)
60-
s, err := scraper.NewMetrics(azureScraper.scrape, scraper.WithStart(azureScraper.start))
59+
var metrics scraper.Metrics
60+
var err error
61+
if cfg.UseBatchAPI {
62+
azureBatchScraper := newBatchScraper(cfg, params)
63+
metrics, err = scraper.NewMetrics(azureBatchScraper.scrape, scraper.WithStart(azureBatchScraper.start))
64+
} else {
65+
azureScraper := newScraper(cfg, params)
66+
metrics, err = scraper.NewMetrics(azureScraper.scrape, scraper.WithStart(azureScraper.start))
67+
}
6168
if err != nil {
6269
return nil, err
6370
}
64-
65-
return scraperhelper.NewMetricsController(&cfg.ControllerConfig, params, consumer, scraperhelper.AddScraper(metadata.Type, s))
71+
return scraperhelper.NewMetricsController(&cfg.ControllerConfig, params, consumer, scraperhelper.AddScraper(metadata.Type, metrics))
6672
}
Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
// Copyright The OpenTelemetry Authors
2+
// SPDX-License-Identifier: Apache-2.0
3+
4+
// TODO remove this package in favor of the new arriving feature in Azure SDK for Go.
5+
// Ref: https://github.com/Azure/azure-sdk-for-go/pull/24309
6+
7+
//nolint:unused
8+
package fake // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/azuremonitorreceiver/fake"
9+
10+
import (
11+
"net/http"
12+
"reflect"
13+
"sync"
14+
15+
"github.com/Azure/azure-sdk-for-go/sdk/azcore/fake/server"
16+
)
17+
18+
type nonRetriableError struct {
19+
error
20+
}
21+
22+
func (nonRetriableError) NonRetriable() {
23+
// marker method
24+
}
25+
26+
func contains[T comparable](s []T, v T) bool {
27+
for _, vv := range s {
28+
if vv == v {
29+
return true
30+
}
31+
}
32+
return false
33+
}
34+
35+
func getHeaderValue(h http.Header, k string) string {
36+
v := h[k]
37+
if len(v) == 0 {
38+
return ""
39+
}
40+
return v[0]
41+
}
42+
43+
func getOptional[T any](v T) *T {
44+
if reflect.ValueOf(v).IsZero() {
45+
return nil
46+
}
47+
return &v
48+
}
49+
50+
func parseOptional[T any](v string, parse func(v string) (T, error)) (*T, error) {
51+
if v == "" {
52+
return nil, nil
53+
}
54+
t, err := parse(v)
55+
if err != nil {
56+
return nil, err
57+
}
58+
return &t, err
59+
}
60+
61+
func newTracker[T any]() *tracker[T] {
62+
return &tracker[T]{
63+
items: map[string]*T{},
64+
}
65+
}
66+
67+
type tracker[T any] struct {
68+
items map[string]*T
69+
mu sync.Mutex
70+
}
71+
72+
func (p *tracker[T]) get(req *http.Request) *T {
73+
p.mu.Lock()
74+
defer p.mu.Unlock()
75+
if item, ok := p.items[server.SanitizePagerPollerPath(req.URL.Path)]; ok {
76+
return item
77+
}
78+
return nil
79+
}
80+
81+
func (p *tracker[T]) add(req *http.Request, item *T) {
82+
p.mu.Lock()
83+
defer p.mu.Unlock()
84+
p.items[server.SanitizePagerPollerPath(req.URL.Path)] = item
85+
}
86+
87+
func (p *tracker[T]) remove(req *http.Request) {
88+
p.mu.Lock()
89+
defer p.mu.Unlock()
90+
delete(p.items, server.SanitizePagerPollerPath(req.URL.Path))
91+
}
Lines changed: 175 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,175 @@
1+
// Copyright The OpenTelemetry Authors
2+
// SPDX-License-Identifier: Apache-2.0
3+
4+
// TODO remove this package in favor of the new arriving feature in Azure SDK for Go.
5+
// Ref: https://github.com/Azure/azure-sdk-for-go/pull/24309
6+
7+
package fake // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/azuremonitorreceiver/fake"
8+
9+
import (
10+
"context"
11+
"errors"
12+
"fmt"
13+
"net/http"
14+
"net/url"
15+
"regexp"
16+
"strconv"
17+
"strings"
18+
19+
azfake "github.com/Azure/azure-sdk-for-go/sdk/azcore/fake"
20+
"github.com/Azure/azure-sdk-for-go/sdk/azcore/fake/server"
21+
"github.com/Azure/azure-sdk-for-go/sdk/monitor/query/azmetrics"
22+
)
23+
24+
// MetricsServer is a fake server for instances of the azmetrics.MetricsClient type.
25+
type MetricsServer struct {
26+
// QueryResources is the fake for method MetricsClient.List
27+
// HTTP status codes to indicate success: http.StatusOK
28+
QueryResources func(ctx context.Context, subscriptionID, metricNamespace string, metricNames []string, resourceIDs azmetrics.ResourceIDList, options *azmetrics.QueryResourcesOptions) (resp azfake.Responder[azmetrics.QueryResourcesResponse], errResp azfake.ErrorResponder)
29+
}
30+
31+
// NewMetricsServerTransport creates a new instance of MetricsServerTransport with the provided implementation.
32+
// The returned MetricsServerTransport instance is connected to an instance of azmetrics.MetricsClient via the
33+
// azcore.ClientOptions.Transporter field in the client's constructor parameters.
34+
func NewMetricsServerTransport(srv *MetricsServer) *MetricsServerTransport {
35+
return &MetricsServerTransport{srv: srv}
36+
}
37+
38+
// MetricsServerTransport connects instances of armmonitor.MetricsClient to instances of MetricsServer.
39+
// Don't use this type directly, use NewMetricsServerTransport instead.
40+
type MetricsServerTransport struct {
41+
srv *MetricsServer
42+
}
43+
44+
// Do implements the policy.Transporter interface for MetricsServerTransport.
45+
func (m *MetricsServerTransport) Do(req *http.Request) (*http.Response, error) {
46+
// rawMethod := req.Context().Value(runtime.CtxAPINameKey{})
47+
// method, ok := rawMethod.(string)
48+
// if !ok {
49+
// return nil, nonRetriableError{errors.New("unable to dispatch request, missing value for CtxAPINameKey")}
50+
// }
51+
//
52+
// var resp *http.Response
53+
// var err error
54+
//
55+
// switch method {
56+
// case "Client.QueryResources":
57+
// resp, err = m.dispatchQueryResources(req)
58+
// default:
59+
// err = fmt.Errorf("unhandled API %s", method)
60+
// }
61+
//
62+
// if err != nil {
63+
// return nil, err
64+
// }
65+
//
66+
// return resp, nil
67+
68+
// We can't directly reuse the logic as the runtime.CtxAPINameKey is not send by the client.
69+
return m.dispatchQueryResources(req)
70+
}
71+
72+
func (m *MetricsServerTransport) dispatchQueryResources(req *http.Request) (*http.Response, error) {
73+
if m.srv.QueryResources == nil {
74+
return nil, &nonRetriableError{errors.New("fake for method QueryResources not implemented")}
75+
}
76+
77+
const regexStr = `/subscriptions/(?P<subscriptionId>[!#&$-;=?-\[\]_a-zA-Z0-9~%@]+)/metrics:getBatch`
78+
regex := regexp.MustCompile(regexStr)
79+
matches := regex.FindStringSubmatch(req.URL.EscapedPath())
80+
if len(matches) < 1 {
81+
return nil, fmt.Errorf("failed to parse path %s", req.URL.Path)
82+
}
83+
qp := req.URL.Query()
84+
body, err := server.UnmarshalRequestAsJSON[azmetrics.ResourceIDList](req)
85+
if err != nil {
86+
return nil, err
87+
}
88+
subscriptionIDParam, err := url.PathUnescape(matches[regex.SubexpIndex("subscriptionId")])
89+
if err != nil {
90+
return nil, err
91+
}
92+
startTimeUnescaped, err := url.QueryUnescape(qp.Get("startTime"))
93+
if err != nil {
94+
return nil, err
95+
}
96+
startTimeParam := getOptional(startTimeUnescaped)
97+
endTimeUnescaped, err := url.QueryUnescape(qp.Get("endTime"))
98+
if err != nil {
99+
return nil, err
100+
}
101+
endTimeParam := getOptional(endTimeUnescaped)
102+
intervalUnescaped, err := url.QueryUnescape(qp.Get("interval"))
103+
if err != nil {
104+
return nil, err
105+
}
106+
intervalParam := getOptional(intervalUnescaped)
107+
metricnamesUnescaped, err := url.QueryUnescape(qp.Get("metricnames"))
108+
if err != nil {
109+
return nil, err
110+
}
111+
aggregationUnescaped, err := url.QueryUnescape(qp.Get("aggregation"))
112+
if err != nil {
113+
return nil, err
114+
}
115+
aggregationParam := getOptional(aggregationUnescaped)
116+
topUnescaped, err := url.QueryUnescape(qp.Get("top"))
117+
if err != nil {
118+
return nil, err
119+
}
120+
topParam, err := parseOptional(topUnescaped, func(v string) (int32, error) {
121+
p, parseErr := strconv.ParseInt(v, 10, 32)
122+
if parseErr != nil {
123+
return 0, parseErr
124+
}
125+
return int32(p), nil
126+
})
127+
if err != nil {
128+
return nil, err
129+
}
130+
orderbyUnescaped, err := url.QueryUnescape(qp.Get("orderby"))
131+
if err != nil {
132+
return nil, err
133+
}
134+
orderbyParam := getOptional(orderbyUnescaped)
135+
rollupbyUnescaped, err := url.QueryUnescape(qp.Get("rollupby"))
136+
if err != nil {
137+
return nil, err
138+
}
139+
rollupbybyParam := getOptional(rollupbyUnescaped)
140+
filterUnescaped, err := url.QueryUnescape(qp.Get("filter"))
141+
if err != nil {
142+
return nil, err
143+
}
144+
filterParam := getOptional(filterUnescaped)
145+
metricnamespaceUnescaped, err := url.QueryUnescape(qp.Get("metricnamespace"))
146+
if err != nil {
147+
return nil, err
148+
}
149+
var options *azmetrics.QueryResourcesOptions
150+
if startTimeParam != nil || endTimeParam != nil || intervalParam != nil || aggregationParam != nil || topParam != nil || orderbyParam != nil || rollupbybyParam != nil || filterParam != nil {
151+
options = &azmetrics.QueryResourcesOptions{
152+
StartTime: startTimeParam,
153+
EndTime: endTimeParam,
154+
Interval: intervalParam,
155+
Aggregation: aggregationParam,
156+
Top: topParam,
157+
OrderBy: orderbyParam,
158+
RollUpBy: rollupbybyParam,
159+
Filter: filterParam,
160+
}
161+
}
162+
respr, errRespr := m.srv.QueryResources(req.Context(), subscriptionIDParam, metricnamespaceUnescaped, strings.Split(metricnamesUnescaped, ","), body, options)
163+
if respErr := server.GetError(errRespr, req); respErr != nil {
164+
return nil, respErr
165+
}
166+
respContent := server.GetResponseContent(respr)
167+
if !contains([]int{http.StatusOK}, respContent.HTTPStatus) {
168+
return nil, &nonRetriableError{fmt.Errorf("unexpected status code %d. acceptable values are http.StatusOK", respContent.HTTPStatus)}
169+
}
170+
resp, err := server.MarshalResponseAsJSON(respContent, server.GetResponse(respr), req)
171+
if err != nil {
172+
return nil, err
173+
}
174+
return resp, nil
175+
}

0 commit comments

Comments
 (0)