
Commit 034731d

[Monitor OpenTelemetry Exporter] Update Customer SDK Stats Drop Reason (#35599)
### Packages impacted by this PR

@azure/monitor-opentelemetry-exporter

### Describe the problem that is addressed by this PR

This pull request refines how drop and retry reasons are tracked and reported for customer SDK Stats in the Azure Monitor OpenTelemetry exporter. The main goal is to make drop/retry reasons clearer and better categorized, especially for client exceptions. The related enums, method signatures, and tests are updated to match.

### Drop/Retry Reason Categorization and API Updates

* Introduced the `ExceptionType` enum so that client exceptions are reported under explicit, well-known categories ("Client exception", "Network exception", "Storage exception", "Timeout exception") instead of raw exception messages. `countDroppedItems` and `countRetryItems` now accept an optional `exceptionType` parameter. When the drop or retry code is `CLIENT_EXCEPTION`, an explicit `exceptionType` takes precedence over the category derived from the exception message.
* Changed the generated drop/retry reason strings to more descriptive and consistent values (e.g., "Bad request", "Internal server error", "Client exception"). Added a `DropReason` enum for the remaining enum-based drop reasons. Removed ambiguous or redundant codes from the `DropCode` enum (`CLIENT_EXPIRED_DATA`, `CLIENT_STALE_DATA`, `NON_RETRYABLE_STATUS_CODE`) and the `RetryCode` enum (`RETRYABLE_STATUS_CODE`).

### Integration and Usage Improvements

* Updated internal call sites in the sender and persistent-storage code to pass the appropriate `ExceptionType` when recording dropped or retried items.
* Updated tests to verify the new categorization. Drop/retry reasons map to the expected well-known categories, and exception messages are passed only for client exceptions. Test descriptions and assertions were clarified to match the new behavior.

### Documentation

* Updated the changelog to note the change to drop.reason values for customer SDK Stats.

### Command used to generate this PR: _(Applicable only to SDK release request PRs)_

### Checklists

- [x] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [x] Added a changelog (if necessary)
1 parent d29e264 commit 034731d


7 files changed: +174 -151 lines


sdk/monitor/monitor-opentelemetry-exporter/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 ### Other Changes
 
 - Renamed Customer Statsbeat feature to customer SDK Stats.
+- Update drop.reason values for customer SDK Stats.
 - Update logic setting ai.location.ip to use the microsoft.client.ip value by default.
 
 ## 1.0.0-beta.33 (2025-08-04)
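The changelog entry does not list the new values. If you have queries or alerts keyed on the old `drop.reason` strings, you can read the one-to-one renames directly from the customerSDKStats.ts diff. The following is a minimal sketch. `LEGACY_TO_NEW_REASON` and `migrateReason` are hypothetical names, not exporter APIs.

Some values are left out on purpose:

* Message-derived categories (`timeout_exception`, `network_exception`, `auth_exception`, and so on). These now also depend on the new keyword lists and the explicit `exceptionType` override.
* `unknown_reason`. Drops now report "Unknown", while retries report "Unknown reason".

```typescript
// Hypothetical migration helper (not part of the exporter): maps reason strings
// emitted before this commit to the values emitted after it. Only one-to-one
// renames read off the customerSDKStats.ts diff are included.
const LEGACY_TO_NEW_REASON: Readonly<Record<string, string>> = {
  // HTTP 4xx labels
  bad_request: "Bad request",
  unauthorized: "Unauthorized",
  forbidden: "Forbidden",
  not_found: "Not found",
  request_timeout: "Request timeout",
  payload_too_large: "Payload too large",
  too_many_requests: "Too many requests",
  client_error_4xx: "Client error 4xx",
  // HTTP 5xx labels
  internal_server_error: "Internal server error",
  bad_gateway: "Bad gateway",
  service_unavailable: "Service unavailable",
  gateway_timeout: "Gateway timeout",
  server_error_5xx: "Server error 5xx",
  // Enum-based drop reasons
  readonly_mode: "Client readonly",
  persistence_full: "Client persistence capacity",
  // Enum-based retry reason
  client_timeout: "Client timeout",
  // CLIENT_EXCEPTION with neither a message nor an explicit type
  unknown_exception: "Client exception",
};

// Returns the post-commit value, or undefined when there is no 1:1 rename.
function migrateReason(legacy: string): string | undefined {
  // hasOwnProperty guards against inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(LEGACY_TO_NEW_REASON, legacy)
    ? LEGACY_TO_NEW_REASON[legacy]
    : undefined;
}

console.log(migrateReason("bad_gateway")); // Bad gateway
console.log(migrateReason("auth_exception")); // undefined: category removed
```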

sdk/monitor/monitor-opentelemetry-exporter/src/export/statsbeat/customerSDKStats.ts

Lines changed: 66 additions & 55 deletions
@@ -9,7 +9,7 @@ import type { AzureMonitorExporterOptions } from "../../index.js";
 import * as ai from "../../utils/constants/applicationinsights.js";
 import { StatsbeatMetrics } from "./statsbeatMetrics.js";
 import type { CustomerSDKStatsProperties, StatsbeatOptions } from "./types.js";
-import { CustomerSDKStats, DropCode, RetryCode } from "./types.js";
+import { CustomerSDKStats, DropCode, RetryCode, ExceptionType, DropReason } from "./types.js";
 import { CustomSDKStatsCounter, STATSBEAT_LANGUAGE, TelemetryType } from "./types.js";
 import { getAttachType } from "../../utils/metricUtils.js";
 import { AzureMonitorStatsbeatExporter } from "./statsbeatExporter.js";
@@ -282,11 +282,13 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
    * @param envelopes - Array of envelopes dropped
    * @param dropCode - The drop code indicating the reason for drop
    * @param exceptionMessage - Optional exception message when dropCode is CLIENT_EXCEPTION
+   * @param exceptionType - Optional explicit exception type override when dropCode is CLIENT_EXCEPTION
    */
   public countDroppedItems(
     envelopes: Envelope[],
     dropCode: DropCode | number,
     exceptionMessage?: string,
+    exceptionType?: ExceptionType,
   ): void {
     const counter: CustomerSDKStats = this.customerSDKStatsCounter;
     let telemetry_type: TelemetryType;
@@ -308,7 +310,7 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
       }
 
       // Generate a low-cardinality, informative reason description
-      const reason = this.getDropReason(dropCode, exceptionMessage);
+      const reason = this.getDropReason(dropCode, exceptionMessage, exceptionType);
 
       // Get or create the success map for this reason
       let successMap = reasonMap.get(reason);
@@ -335,15 +337,24 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
    * Generates a low-cardinality, informative description for drop reasons
    * @param dropCode - The drop code (enum value or status code number)
    * @param exceptionMessage - Optional exception message for CLIENT_EXCEPTION
+   * @param exceptionType - Optional explicit exception type override for CLIENT_EXCEPTION
    * @returns A descriptive reason string with low cardinality
    */
-  private getDropReason(dropCode: DropCode | number, exceptionMessage?: string): string {
+  private getDropReason(
+    dropCode: DropCode | number,
+    exceptionMessage?: string,
+    exceptionType?: ExceptionType,
+  ): string {
     if (dropCode === DropCode.CLIENT_EXCEPTION) {
-      // For client exceptions, derive a low-cardinality reason from the exception message
+      // If an explicit exception type is provided, use it
+      if (exceptionType) {
+        return exceptionType;
+      }
+      // For client exceptions, derive a well-known exception category from the exception message
       if (exceptionMessage) {
         return this.categorizeExceptionMessage(exceptionMessage);
       }
-      return "unknown_exception";
+      return ExceptionType.CLIENT_EXCEPTION; // Default to "Client exception" if no message provided
     }
 
     // Handle status code drop codes (numeric values)
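The new `getDropReason` resolves a client-exception reason in a fixed order:

1. An explicit `exceptionType` from the call site wins.
2. Otherwise, a category derived from the message is used.
3. Otherwise, the generic "Client exception" is returned.

The standalone sketch below restates that order under some assumptions. The enums are local copies of the post-commit types.ts. `resolveDropReason` is a hypothetical free function, not the private method. The message categorizer and the numeric status-code path are injected as hypothetical callbacks; the status-code lines are not shown in this diff.

```typescript
// Local copies of the enums as they stand after this commit (types.ts).
enum DropCode {
  CLIENT_EXCEPTION = "CLIENT_EXCEPTION",
  CLIENT_READONLY = "CLIENT_READONLY",
  CLIENT_PERSISTENCE_CAPACITY = "CLIENT_PERSISTENCE_CAPACITY",
  CLIENT_STORAGE_DISABLED = "CLIENT_STORAGE_DISABLED",
  UNKNOWN = "UNKNOWN",
}

enum ExceptionType {
  CLIENT_EXCEPTION = "Client exception",
  NETWORK_EXCEPTION = "Network exception",
  STORAGE_EXCEPTION = "Storage exception",
  TIMEOUT_EXCEPTION = "Timeout exception",
}

enum DropReason {
  CLIENT_READONLY = "Client readonly",
  CLIENT_PERSISTENCE_CAPACITY = "Client persistence capacity",
  UNKNOWN = "Unknown",
}

// Hypothetical injected helpers standing in for private methods.
type Categorize = (message: string) => ExceptionType;
type LabelStatus = (statusCode: number) => string;

// Hypothetical standalone version of the branch order in getDropReason.
function resolveDropReason(
  dropCode: DropCode | number,
  categorize: Categorize,
  labelStatus: LabelStatus,
  exceptionMessage?: string,
  exceptionType?: ExceptionType,
): string {
  if (dropCode === DropCode.CLIENT_EXCEPTION) {
    if (exceptionType) return exceptionType; // 1. explicit override from the call site
    if (exceptionMessage) return categorize(exceptionMessage); // 2. derived from the message
    return ExceptionType.CLIENT_EXCEPTION; // 3. generic fallback
  }
  // Assumption: numeric codes are detected with typeof (those lines are not in the diff).
  if (typeof dropCode === "number") return labelStatus(dropCode);
  switch (dropCode) {
    case DropCode.CLIENT_READONLY:
      return DropReason.CLIENT_READONLY;
    case DropCode.CLIENT_PERSISTENCE_CAPACITY:
      return DropReason.CLIENT_PERSISTENCE_CAPACITY;
    default:
      // UNKNOWN, and CLIENT_STORAGE_DISABLED, which has no case of its own.
      return DropReason.UNKNOWN;
  }
}

// Usage: the write-failure path in fileSystemPersist.ts passes STORAGE_EXCEPTION,
// so the message is never categorized.
const categorize: Categorize = () => ExceptionType.CLIENT_EXCEPTION;
const labelStatus: LabelStatus = (code) => `HTTP ${code}`; // placeholder label
console.log(
  resolveDropReason(
    DropCode.CLIENT_EXCEPTION,
    categorize,
    labelStatus,
    "EACCES: permission denied",
    ExceptionType.STORAGE_EXCEPTION,
  ),
); // Storage exception
```

In the lines shown, `DropCode.CLIENT_STORAGE_DISABLED` has no case in the switch. It therefore falls through to `DropReason.UNKNOWN`.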
@@ -353,54 +364,46 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
 
     // Handle other enum drop codes
     switch (dropCode) {
-      case DropCode.CLIENT_EXPIRED_DATA:
-        return "expired_data";
       case DropCode.CLIENT_READONLY:
-        return "readonly_mode";
-      case DropCode.CLIENT_STALE_DATA:
-        return "stale_data";
+        return DropReason.CLIENT_READONLY;
       case DropCode.CLIENT_PERSISTENCE_CAPACITY:
-        return "persistence_full";
-      case DropCode.NON_RETRYABLE_STATUS_CODE:
-        return "non_retryable_status";
+        return DropReason.CLIENT_PERSISTENCE_CAPACITY;
       case DropCode.UNKNOWN:
       default:
-        return "unknown_reason";
+        return DropReason.UNKNOWN;
     }
   }
 
   /**
-   * Categorizes exception messages into low-cardinality groups
+   * Categorizes exception messages into well-known exception categories
    * @param exceptionMessage - The exception message to categorize
-   * @returns A low-cardinality category string
+   * @returns A well-known exception category string
    */
-  private categorizeExceptionMessage(exceptionMessage: string): string {
+  private categorizeExceptionMessage(exceptionMessage: string): ExceptionType {
     const message = exceptionMessage.toLowerCase();
 
     if (message.includes("timeout") || message.includes("timed out")) {
-      return "timeout_exception";
-    }
-    if (message.includes("network") || message.includes("connection")) {
-      return "network_exception";
+      return ExceptionType.TIMEOUT_EXCEPTION;
     }
     if (
-      message.includes("auth") ||
-      message.includes("unauthorized") ||
-      message.includes("forbidden")
+      message.includes("network") ||
+      message.includes("connection") ||
+      message.includes("dns") ||
+      message.includes("socket")
     ) {
-      return "auth_exception";
-    }
-    if (message.includes("parsing") || message.includes("parse") || message.includes("invalid")) {
-      return "parsing_exception";
-    }
-    if (message.includes("disk") || message.includes("storage") || message.includes("file")) {
-      return "storage_exception";
+      return ExceptionType.NETWORK_EXCEPTION;
     }
-    if (message.includes("memory") || message.includes("out of memory")) {
-      return "memory_exception";
+    if (
+      message.includes("disk") ||
+      message.includes("storage") ||
+      message.includes("file") ||
+      message.includes("persist")
+    ) {
+      return ExceptionType.STORAGE_EXCEPTION;
     }
 
-    return "other_exception";
+    // Default to Client exception for any other cases
+    return ExceptionType.CLIENT_EXCEPTION;
   }
 
   /**
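The rewritten `categorizeExceptionMessage` is an ordered list of substring rules where the first match wins. Restating it as data makes the order explicit. In the sketch below, `categorizeMessage` and `RULES` are hypothetical names, and the keyword lists are copied from the `+` lines above.

```typescript
// Local copy of the post-commit ExceptionType enum (types.ts).
enum ExceptionType {
  CLIENT_EXCEPTION = "Client exception",
  NETWORK_EXCEPTION = "Network exception",
  STORAGE_EXCEPTION = "Storage exception",
  TIMEOUT_EXCEPTION = "Timeout exception",
}

// Ordered (category, keywords) rules copied from the new categorizeExceptionMessage.
const RULES: ReadonlyArray<readonly [ExceptionType, readonly string[]]> = [
  [ExceptionType.TIMEOUT_EXCEPTION, ["timeout", "timed out"]],
  [ExceptionType.NETWORK_EXCEPTION, ["network", "connection", "dns", "socket"]],
  [ExceptionType.STORAGE_EXCEPTION, ["disk", "storage", "file", "persist"]],
];

// Hypothetical table-driven equivalent of the private method.
function categorizeMessage(exceptionMessage: string): ExceptionType {
  const message = exceptionMessage.toLowerCase();
  for (const [category, keywords] of RULES) {
    // The first rule with any matching substring wins.
    if (keywords.some((keyword) => message.includes(keyword))) {
      return category;
    }
  }
  return ExceptionType.CLIENT_EXCEPTION; // default, as in the diff
}

console.log(categorizeMessage("Connection timed out")); // Timeout exception
console.log(categorizeMessage("getaddrinfo ENOTFOUND: DNS lookup failed")); // Network exception
console.log(categorizeMessage("Unauthorized")); // Client exception
```

Matching is by substring on the lower-cased message, so order matters. For example, "Connection timed out" is a timeout, not a network error. Incidental matches are also possible: a message mentioning "profile" contains "file" and is counted as a storage exception. Messages that previously produced `auth_exception`, `parsing_exception`, or `memory_exception` now fall through to "Client exception", unless they happen to contain one of the remaining keywords.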
@@ -412,36 +415,36 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
     if (statusCode >= 400 && statusCode < 500) {
       switch (statusCode) {
         case 400:
-          return "bad_request";
+          return "Bad request";
         case 401:
-          return "unauthorized";
+          return "Unauthorized";
         case 403:
-          return "forbidden";
+          return "Forbidden";
         case 404:
-          return "not_found";
+          return "Not found";
         case 408:
-          return "request_timeout";
+          return "Request timeout";
         case 413:
-          return "payload_too_large";
+          return "Payload too large";
         case 429:
-          return "too_many_requests";
+          return "Too many requests";
         default:
-          return "client_error_4xx";
+          return "Client error 4xx";
       }
     }
 
     if (statusCode >= 500 && statusCode < 600) {
       switch (statusCode) {
         case 500:
-          return "internal_server_error";
+          return "Internal server error";
         case 502:
-          return "bad_gateway";
+          return "Bad gateway";
         case 503:
-          return "service_unavailable";
+          return "Service unavailable";
         case 504:
-          return "gateway_timeout";
+          return "Gateway timeout";
         default:
-          return "server_error_5xx";
+          return "Server error 5xx";
       }
     }
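Every 4xx or 5xx status code collapses into one of 13 labels: seven named 4xx codes, four named 5xx codes, and the two range fallbacks. That bounds the cardinality the status-code path can add to the reason dimension.

A table-driven restatement follows. It assumes a hypothetical `statusReason` name. It returns `undefined` for codes outside 400-599, because that part of the method is not shown in this diff.

```typescript
// Labels copied from the status-code hunk above.
const STATUS_LABELS: ReadonlyMap<number, string> = new Map([
  [400, "Bad request"],
  [401, "Unauthorized"],
  [403, "Forbidden"],
  [404, "Not found"],
  [408, "Request timeout"],
  [413, "Payload too large"],
  [429, "Too many requests"],
  [500, "Internal server error"],
  [502, "Bad gateway"],
  [503, "Service unavailable"],
  [504, "Gateway timeout"],
]);

// Hypothetical lookup: exact code first, then the 4xx/5xx range fallbacks.
function statusReason(statusCode: number): string | undefined {
  const known = STATUS_LABELS.get(statusCode);
  if (known !== undefined) return known;
  if (statusCode >= 400 && statusCode < 500) return "Client error 4xx";
  if (statusCode >= 500 && statusCode < 600) return "Server error 5xx";
  return undefined; // assumption: behavior for other codes is not shown in the diff
}

console.log(statusReason(503)); // Service unavailable
console.log(statusReason(418)); // Client error 4xx
```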

@@ -451,13 +454,14 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
    * Tracks retried envelopes
    * @param envelopes - Number of envelopes retried
    * @param retryCode - The retry code indicating the reason for retry
-   * @param telemetry_type - The type of telemetry being tracked
    * @param exceptionMessage - Optional exception message when retryCode is CLIENT_EXCEPTION
+   * @param exceptionType - Optional explicit exception type override when retryCode is CLIENT_EXCEPTION
    */
   public countRetryItems(
     envelopes: Envelope[],
     retryCode: RetryCode | number,
     exceptionMessage?: string,
+    exceptionType?: ExceptionType,
   ): void {
     const counter: CustomerSDKStats = this.customerSDKStatsCounter;
     let telemetry_type: TelemetryType;
@@ -479,7 +483,7 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
       }
 
       // Generate a low-cardinality, informative reason description
-      const reason = this.getRetryReason(retryCode, exceptionMessage);
+      const reason = this.getRetryReason(retryCode, exceptionMessage, exceptionType);
 
       // Update the count for this reason
       const currentCount = reasonMap.get(reason) || 0;
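Both counting methods key their tallies by the generated reason string, which is why the reason must come from a small, fixed set. Below is a minimal sketch of that update pattern for retries. It assumes a hypothetical two-level map from telemetry type to reason to count. The real counter structure is not shown in this diff, and the telemetry-type label used here is a placeholder.

```typescript
// Hypothetical structure: telemetry type -> reason -> count.
type ReasonCounts = Map<string, Map<string, number>>;

function recordRetries(
  store: ReasonCounts,
  telemetryType: string,
  reason: string,
  count: number,
): void {
  // Get or create the per-telemetry-type reason map.
  let reasonMap = store.get(telemetryType);
  if (!reasonMap) {
    reasonMap = new Map<string, number>();
    store.set(telemetryType, reasonMap);
  }
  // Same update pattern as the hunk above: a missing reason starts at 0.
  const currentCount = reasonMap.get(reason) || 0;
  reasonMap.set(reason, currentCount + count);
}

const store: ReasonCounts = new Map();
recordRetries(store, "trace", "Client timeout", 3); // "trace" is a placeholder label
recordRetries(store, "trace", "Client timeout", 2);
recordRetries(store, "trace", "Service unavailable", 1);
console.log(store.get("trace")?.get("Client timeout")); // 5
console.log(store.get("trace")?.size); // 2
```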
@@ -491,15 +495,24 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
    * Generates a low-cardinality, informative description for retry reasons
    * @param retryCode - The retry code (enum value or status code number)
    * @param exceptionMessage - Optional exception message for CLIENT_EXCEPTION
+   * @param exceptionType - Optional explicit exception type override for CLIENT_EXCEPTION
    * @returns A descriptive reason string with low cardinality
    */
-  private getRetryReason(retryCode: RetryCode | number, exceptionMessage?: string): string {
+  private getRetryReason(
+    retryCode: RetryCode | number,
+    exceptionMessage?: string,
+    exceptionType?: ExceptionType,
+  ): string {
     if (retryCode === RetryCode.CLIENT_EXCEPTION) {
+      // If an explicit exception type is provided, use it
+      if (exceptionType) {
+        return exceptionType;
+      }
       // For client exceptions, derive a low-cardinality reason from the exception message
       if (exceptionMessage) {
         return this.categorizeExceptionMessage(exceptionMessage);
       }
-      return "unknown_exception";
+      return ExceptionType.CLIENT_EXCEPTION;
     }
 
     // Handle status code retry codes (numeric values)
@@ -510,12 +523,10 @@ export class CustomerSDKStatsMetrics extends StatsbeatMetrics {
     // Handle other enum retry codes
     switch (retryCode) {
       case RetryCode.CLIENT_TIMEOUT:
-        return "client_timeout";
-      case RetryCode.RETRYABLE_STATUS_CODE:
-        return "retryable_status";
+        return "Client timeout";
       case RetryCode.UNKNOWN:
       default:
-        return "unknown_reason";
+        return "Unknown reason";
     }
   }
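`getRetryReason` follows the same precedence as `getDropReason`. However, it reads the explicit `exceptionType` only when the retry code is `CLIENT_EXCEPTION`.

Judging from the lines in this diff, the timeout path in baseSender.ts passes `RetryCode.CLIENT_TIMEOUT` together with `ExceptionType.TIMEOUT_EXCEPTION`, so it still reports "Client timeout". The sketch below illustrates that behavior. `resolveRetryReason` is hypothetical. It uses a simplified timeout-only message rule and a placeholder status-code label.

```typescript
// Local copies of the post-commit enums (types.ts).
enum RetryCode {
  CLIENT_EXCEPTION = "CLIENT_EXCEPTION",
  CLIENT_TIMEOUT = "CLIENT_TIMEOUT",
  UNKNOWN = "UNKNOWN",
}

enum ExceptionType {
  CLIENT_EXCEPTION = "Client exception",
  NETWORK_EXCEPTION = "Network exception",
  STORAGE_EXCEPTION = "Storage exception",
  TIMEOUT_EXCEPTION = "Timeout exception",
}

// Hypothetical standalone version of the branch order in getRetryReason.
function resolveRetryReason(
  retryCode: RetryCode | number,
  exceptionMessage?: string,
  exceptionType?: ExceptionType,
): string {
  if (retryCode === RetryCode.CLIENT_EXCEPTION) {
    if (exceptionType) return exceptionType; // the override is read only on this branch
    if (exceptionMessage) {
      // Simplified stand-in for categorizeExceptionMessage: timeout rule only.
      return exceptionMessage.toLowerCase().includes("timeout")
        ? ExceptionType.TIMEOUT_EXCEPTION
        : ExceptionType.CLIENT_EXCEPTION;
    }
    return ExceptionType.CLIENT_EXCEPTION;
  }
  if (typeof retryCode === "number") return `HTTP ${retryCode}`; // placeholder status label
  switch (retryCode) {
    case RetryCode.CLIENT_TIMEOUT:
      return "Client timeout"; // exceptionType is not consulted here
    default:
      return "Unknown reason";
  }
}

// Arguments mirror the timeout call in baseSender.ts.
console.log(
  resolveRetryReason(RetryCode.CLIENT_TIMEOUT, "timeout_exception", ExceptionType.TIMEOUT_EXCEPTION),
); // Client timeout
```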

sdk/monitor/monitor-opentelemetry-exporter/src/export/statsbeat/types.ts

Lines changed: 20 additions & 4 deletions
@@ -187,19 +187,15 @@ export enum TelemetryType {
 
 export enum DropCode {
   CLIENT_EXCEPTION = "CLIENT_EXCEPTION",
-  CLIENT_EXPIRED_DATA = "CLIENT_EXPIRED_DATA",
   CLIENT_READONLY = "CLIENT_READONLY",
-  CLIENT_STALE_DATA = "CLIENT_STALE_DATA",
   CLIENT_PERSISTENCE_CAPACITY = "CLIENT_PERSISTENCE_CAPACITY",
-  NON_RETRYABLE_STATUS_CODE = "NON_RETRYABLE_STATUS_CODE",
   CLIENT_STORAGE_DISABLED = "CLIENT_STORAGE_DISABLED",
   UNKNOWN = "UNKNOWN",
 }
 
 export enum RetryCode {
   CLIENT_EXCEPTION = "CLIENT_EXCEPTION",
   CLIENT_TIMEOUT = "CLIENT_TIMEOUT",
-  RETRYABLE_STATUS_CODE = "RETRYABLE_STATUS_CODE",
   UNKNOWN = "UNKNOWN",
 }
 
@@ -232,6 +228,26 @@ export enum StatsbeatFeatureType {
   INSTRUMENTATION = 1,
 }
 
+/**
+ * Exception types for client exceptions
+ * @internal
+ */
+export enum ExceptionType {
+  CLIENT_EXCEPTION = "Client exception",
+  NETWORK_EXCEPTION = "Network exception",
+  STORAGE_EXCEPTION = "Storage exception",
+  TIMEOUT_EXCEPTION = "Timeout exception",
+}
+
+/**
+ * Reasons for dropping telemetry
+ */
+export enum DropReason {
+  CLIENT_READONLY = "Client readonly",
+  CLIENT_PERSISTENCE_CAPACITY = "Client persistence capacity",
+  UNKNOWN = "Unknown",
+}
+
 /**
  * Status codes indicating that we should shutdown statsbeat
  * @internal
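`ExceptionType` and `DropReason` are string enums, so their values are emitted verbatim as reason strings. TypeScript generates no reverse mapping for string members, so `Object.values` returns exactly the declared strings.

A test helper could use this to check that a recorded reason is one of the well-known categories. The names below are hypothetical, and the enums are local copies.

```typescript
// Local copies of the two enums added in types.ts.
enum ExceptionType {
  CLIENT_EXCEPTION = "Client exception",
  NETWORK_EXCEPTION = "Network exception",
  STORAGE_EXCEPTION = "Storage exception",
  TIMEOUT_EXCEPTION = "Timeout exception",
}

enum DropReason {
  CLIENT_READONLY = "Client readonly",
  CLIENT_PERSISTENCE_CAPACITY = "Client persistence capacity",
  UNKNOWN = "Unknown",
}

// String enums have no reverse mapping, so Object.values yields only the declared strings.
const EXCEPTION_REASONS: ReadonlySet<string> = new Set<string>(Object.values(ExceptionType));
const ENUM_DROP_REASONS: ReadonlySet<string> = new Set<string>(Object.values(DropReason));

// Hypothetical test helper: is this reason a well-known client-exception category?
function isWellKnownExceptionReason(reason: string): boolean {
  return EXCEPTION_REASONS.has(reason);
}

console.log(EXCEPTION_REASONS.size); // 4
console.log(ENUM_DROP_REASONS.size); // 3
console.log(isWellKnownExceptionReason("Timeout exception")); // true
console.log(isWellKnownExceptionReason("timeout_exception")); // false: pre-commit value
```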

sdk/monitor/monitor-opentelemetry-exporter/src/platform/nodejs/baseSender.ts

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,7 @@
 
 import { diag } from "@opentelemetry/api";
 import type { PersistentStorage, SenderResult } from "../../types.js";
+import { ExceptionType } from "../../export/statsbeat/types.js";
 import type { AzureMonitorExporterOptions } from "../../config.js";
 import { FileSystemPersist } from "./persist/index.js";
 import type { ExportResult } from "@opentelemetry/core";
@@ -247,6 +248,7 @@ export abstract class BaseSender {
           envelopes,
           DropCode.CLIENT_EXCEPTION,
           redirectError.message,
+          ExceptionType.CLIENT_EXCEPTION,
         );
       }
       return { code: ExportResultCode.FAILED, error: redirectError };
@@ -283,6 +285,7 @@ export abstract class BaseSender {
             envelopes,
             RetryCode.CLIENT_TIMEOUT,
             "timeout_exception",
+            ExceptionType.TIMEOUT_EXCEPTION,
           );
           diag.error("Request timed out. Error message:", restError.message);
         } else if (restError.statusCode) {

sdk/monitor/monitor-opentelemetry-exporter/src/platform/nodejs/persist/fileSystemPersist.ts

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,7 @@ import { confirmDirExists, getShallowDirectorySize } from "./fileSystemHelpers.j
 import type { AzureMonitorExporterOptions } from "../../../config.js";
 import { readdir, readFile, stat, unlink, writeFile } from "node:fs/promises";
 import type { CustomerSDKStatsMetrics } from "../../../export/statsbeat/customerSDKStats.js";
-import { DropCode } from "../../../export/statsbeat/types.js";
+import { DropCode, ExceptionType } from "../../../export/statsbeat/types.js";
 import type { TelemetryItem as Envelope } from "../../../generated/index.js";
 
 /**
@@ -196,6 +196,7 @@ export class FileSystemPersist implements PersistentStorage {
           envelopes,
           DropCode.CLIENT_EXCEPTION,
           writeError?.message,
+          ExceptionType.STORAGE_EXCEPTION,
         );
         diag.warn(`Error writing file to persistent file storage`, writeError);
         return false;

sdk/monitor/monitor-opentelemetry-exporter/test/internal/baseSender.spec.ts

Lines changed: 3 additions & 2 deletions
@@ -743,6 +743,7 @@ describe("BaseSender", () => {
         envelopes,
         "CLIENT_EXCEPTION",
         "Circular redirect",
+        "Client exception",
       );
     });
 
@@ -774,7 +775,7 @@ describe("BaseSender", () => {
       );
     });
 
-    it("should not capture exception.message for NON_RETRYABLE_STATUS_CODE", async () => {
+    it("should not capture exception.message for status code errors", async () => {
       testSender.sendMock.mockResolvedValue({
         statusCode: 400,
         result: "Bad Request",
@@ -796,7 +797,7 @@ describe("BaseSender", () => {
       expect(result.code).toBe(ExportResultCode.FAILED);
       expect(mockCustomerSDKStatsMetrics.countDroppedItems).toHaveBeenCalledWith(envelopes, 400);
 
-      // Verify exception.message is not passed for non-client exceptions
+      // Verify exception.message is not passed for status code errors
       const call = mockCustomerSDKStatsMetrics.countDroppedItems.mock.calls[0];
       expect(call.length).toBe(2); // envelopes array, drop code (no drop reason)
     });
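The spec's `call.length` assertion works because a mock records only the arguments actually passed. Omitted optional parameters therefore do not appear in the call. Below is a framework-free sketch of the same idea. It uses a hypothetical `RecordingStats` fake in place of the mocked `CustomerSDKStatsMetrics`.

```typescript
// Hypothetical fake standing in for the mocked CustomerSDKStatsMetrics in the spec.
class RecordingStats {
  readonly dropCalls: unknown[][] = [];

  // Rest parameters capture exactly the arguments the caller passed.
  countDroppedItems(...args: unknown[]): void {
    this.dropCalls.push(args);
  }
}

const stats = new RecordingStats();
const envelopes = [{ name: "placeholder" }];

// Redirect failure: message plus the explicit "Client exception" category.
stats.countDroppedItems(envelopes, "CLIENT_EXCEPTION", "Circular redirect", "Client exception");
// Status-code failure: only the envelopes and the numeric code.
stats.countDroppedItems(envelopes, 400);

console.log(stats.dropCalls[0]?.length); // 4
console.log(stats.dropCalls[1]?.length); // 2
```

An explicit trailing `undefined` still counts as an argument. A check like `expect(call.length).toBe(2)` therefore also guards against call sites that forward an empty message.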
