
Commit d8388cc

portertech authored and dragonlord93 committed
[chore] Tail-sampling drop policy documentation (open-telemetry#39801)
Documentation for the recently added [drop policy type](open-telemetry#39668). The examples replace the use of invert_match. I attempted to make as few changes as possible at this time; I intend to add further documentation when deprecating the top-level invert_match logic/decisions in a separate pull request.

Signed-off-by: Sean Porter <[email protected]>
1 parent d5d6c65 commit d8388cc

File tree

1 file changed (+46, -26 lines)

processor/tailsamplingprocessor/README.md

Lines changed: 46 additions & 26 deletions
@@ -36,8 +36,9 @@ Multiple policies exist today and it is straight forward to add more. These incl
 - `span_count`: Sample based on the minimum and/or maximum number of spans, inclusive. If the sum of all spans in the trace is outside the range threshold, the trace will not be sampled.
 - `boolean_attribute`: Sample based on boolean attribute (resource and record).
 - `ottl_condition`: Sample based on given boolean OTTL condition (span and span event).
-- `and`: Sample based on multiple policies, creates an AND policy
-- `composite`: Sample based on a combination of above samplers, with ordering and rate allocation per sampler. Rate allocation allocates certain percentages of spans per policy order.
+- `and`: Sample based on multiple policies, creates an AND policy
+- `drop`: Drop (not sample) based on multiple policies, creates a DROP policy
+- `composite`: Sample based on a combination of above samplers, with ordering and rate allocation per sampler. Rate allocation allocates certain percentages of spans per policy order.
   For example if we have set max_total_spans_per_second as 100 then we can set rate_allocation as follows
   1. test-composite-policy-1 = 50 % of max_total_spans_per_second = 50 spans_per_second
   2. test-composite-policy-2 = 25 % of max_total_spans_per_second = 25 spans_per_second
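For context, the rate-allocation arithmetic above corresponds to a composite policy along the lines of the sketch below. The policy names, sub-policy types, and the 100 spans-per-second budget are illustrative rather than prescriptive; the field names follow the composite example used elsewhere in this README.

```yaml
    tail_sampling:
      policies:
        [
          {
            name: composite-policy-example,
            type: composite,
            composite:
              {
                # Total spans-per-second budget shared by the sub-policies below.
                max_total_spans_per_second: 100,
                policy_order: [test-composite-policy-1, test-composite-policy-2],
                composite_sub_policy:
                  [
                    {name: test-composite-policy-1, type: latency, latency: {threshold_ms: 5000}},
                    {name: test-composite-policy-2, type: status_code, status_code: {status_codes: [ERROR]}}
                  ],
                rate_allocation:
                  [
                    # 50% of 100 = up to 50 spans per second for policy 1.
                    {policy: test-composite-policy-1, percent: 50},
                    # 25% of 100 = up to 25 spans per second for policy 2.
                    {policy: test-composite-policy-2, percent: 25}
                  ]
              }
          }
        ]
```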
@@ -53,7 +54,7 @@ The following configuration options can also be modified:
   Additionally, if using, configure this as much greater than `num_traces` so decisions for trace IDs are kept
   longer than the span data for the trace.
   - `sampled_cache_size` (default = 0): Configures amount of trace IDs to be kept in an LRU cache,
-    persisting the "keep" decisions for traces that may have already been released from memory.
+    persisting the "keep" decisions for traces that may have already been released from memory.
     By default, the size is 0 and the cache is inactive.
   - `non_sampled_cache_size` (default = 0) Configures amount of trace IDs to be kept in an LRU cache,
     persisting the "drop" decisions for traces that may have already been released from memory.
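For reference, a minimal sketch of how these cache sizes might be set, assuming they nest under a `decision_cache` block as the indentation of the bullets above suggests; the sizes shown are arbitrary, chosen only to be much larger than `num_traces`:

```yaml
    tail_sampling:
      decision_wait: 10s
      num_traces: 100
      decision_cache:
        # Persists "keep" decisions for trace IDs whose span data was already released.
        sampled_cache_size: 100000
        # Persists "drop" decisions likewise; both default to 0 (cache inactive).
        non_sampled_cache_size: 100000
```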
@@ -62,6 +63,7 @@ The following configuration options can also be modified:
 
 Each policy will result in a decision, and the processor will evaluate them to make a final decision:
 
+- When there's a "drop" decision, the trace is not sampled;
 - When there's an "inverted not sample" decision, the trace is not sampled;
 - When there's a "sample" decision, the trace is sampled;
 - When there's a "inverted sample" decision and no "not sample" decisions, the trace is sampled;
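To make the precedence concrete, here is a sketch (policy names are illustrative) in which the `drop` decision overrides the unconditional `always_sample` vote, so traces matching the health-check path are never kept:

```yaml
    tail_sampling:
      policies:
        [
          {
            # Votes "sample" for every trace.
            name: keep-everything,
            type: always_sample
          },
          {
            # A "drop" decision takes precedence over any "sample" decision,
            # so traces matching the sub-policy below are discarded regardless.
            name: drop-health-checks,
            type: drop,
            drop: {
              drop_sub_policy:
              [
                {
                  name: health-check-paths,
                  type: string_attribute,
                  string_attribute: {key: url.path, values: [\/health], enabled_regex_matching: true}
                }
              ]
            }
          }
        ]
```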
@@ -123,26 +125,21 @@ processors:
           },
           {
             name: test-policy-9,
-            type: string_attribute,
-            string_attribute: {key: url.path, values: [\/health, \/metrics], enabled_regex_matching: true, invert_match: true}
-          },
-          {
-            name: test-policy-10,
             type: span_count,
             span_count: {min_spans: 2, max_spans: 20}
           },
           {
-            name: test-policy-11,
+            name: test-policy-10,
             type: trace_state,
             trace_state: { key: key3, values: [value1, value2] }
           },
           {
-            name: test-policy-12,
+            name: test-policy-11,
             type: boolean_attribute,
             boolean_attribute: {key: key4, value: true}
           },
           {
-            name: test-policy-13,
+            name: test-policy-12,
             type: ottl_condition,
             ottl_condition: {
               error_mode: ignore,
@@ -160,7 +157,7 @@ processors:
             name: and-policy-1,
             type: and,
             and: {
-              and_sub_policy:
+              and_sub_policy:
               [
                 {
                   name: test-and-policy-1,
@@ -175,6 +172,20 @@ processors:
               ]
             }
           },
+          {
+            name: drop-policy-1,
+            type: drop,
+            drop: {
+              drop_sub_policy:
+              [
+                {
+                  name: test-drop-policy-1,
+                  type: string_attribute,
+                  string_attribute: {key: url.path, values: [\/health, \/metrics], enabled_regex_matching: true}
+                }
+              ]
+            }
+          },
           {
             name: composite-policy-1,
             type: composite,
@@ -419,13 +430,22 @@ tail_sampling:
         type: boolean_attribute,
         boolean_attribute: { key: app.force_sample, value: true },
       },
-      {
-        # Rule 7:
-        # never sample if the do_not_sample attribute is set to true
-        name: team_a-do-not-sample,
-        type: boolean_attribute,
-        boolean_attribute: { key: app.do_not_sample, value: true, invert_match: true },
-      },
+      {
+        # Rule 7:
+        # never sample if the do_not_sample attribute is set to true
+        name: do-not-sample,
+        type: drop,
+        drop: {
+          drop_sub_policy:
+          [
+            {
+              name: team_a-do-not-sample,
+              type: boolean_attribute,
+              boolean_attribute: { key: app.do_not_sample, value: true }
+            }
+          ]
+        }
+      },
     # END: policies for team_a
   ]
 ```
@@ -442,7 +462,7 @@ The [probabilistic sampling processor][probabilistic_sampling_processor] and the
 
 As a rule of thumb, if you want to add probabilistic sampling and...
 
-...you are not using the tail sampling processor already: use the [probabilistic sampling processor][probabilistic_sampling_processor]. Running the probabilistic sampling processor is more efficient than the tail sampling processor. The probabilistic sampling policy makes decisions based upon the trace ID, so waiting until more spans have arrived will not influence its decision.
+...you are not using the tail sampling processor already: use the [probabilistic sampling processor][probabilistic_sampling_processor]. Running the probabilistic sampling processor is more efficient than the tail sampling processor. The probabilistic sampling policy makes decisions based upon the trace ID, so waiting until more spans have arrived will not influence its decision.
 
 ...you are already using the tail sampling processor: add the probabilistic sampling policy. You are already incurring the cost of running the tail sampling processor, so adding the probabilistic policy will be negligible. Additionally, using the policy within the tail sampling processor will ensure traces that are sampled by other policies will not be dropped.
 
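As a sketch of the second case (the policy name and the 10% rate are illustrative), the probabilistic policy is just another entry in the existing `policies` list:

```yaml
    tail_sampling:
      policies:
        [
          {
            # Keeps roughly 10% of traces, decided from the trace ID alone.
            name: probabilistic-policy,
            type: probabilistic,
            probabilistic: {sampling_percentage: 10}
          }
          # ...existing policies continue here.
        ]
```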
@@ -457,7 +477,7 @@ As a rule of thumb, if you want to add probabilistic sampling and...
 option allows, some will have to be dropped before they can be sampled. Increasing the value of `num_traces` can
 help resolve this error, at the expense of increased memory usage.
 
-## Monitoring and Tuning
+## Monitoring and Tuning
 
 See [documentation.md][documentation_md] for the full list of metrics available for this component and their descriptions.
 
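For example (the values below are illustrative only), raising `num_traces` lets more in-flight traces be held while awaiting a decision:

```yaml
    tail_sampling:
      decision_wait: 10s
      # Larger values reduce the chance of dropping traces before they can be
      # sampled, at the cost of additional memory.
      num_traces: 200000
```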
@@ -494,7 +514,7 @@ It's therefore recommended to consume this component's output with components th
 A span's arrival is considered "late" if it arrives after its trace's sampling decision is made. Late spans can cause different sampling decisions for different parts of the trace.
 
 There are two scenarios for late arriving spans:
-- Scenario 1: While the sampling decision of the trace remains in the circular buffer of `num_traces` length, the late spans inherit that decision. That means late spans do not influence the trace's sampling decision.
+- Scenario 1: While the sampling decision of the trace remains in the circular buffer of `num_traces` length, the late spans inherit that decision. That means late spans do not influence the trace's sampling decision.
 - Scenario 2: (Default, no decision cache configured) After the sampling decision is removed from the buffer, it's as if this component has never seen the trace before: The late spans are buffered for `decision_wait` seconds and then a new sampling decision is made.
 - Scenario 3: (Decision cache is configured) When a "keep" decision is made on a trace, the trace ID is cached. The component will remember which trace IDs it sampled even after it releases the span data from memory. Unless it has been evicted from the cache after some time, it will remember the same "keep trace" decision.
 
@@ -511,10 +531,10 @@ It may also be useful to:
 
 **Sampled Frequency**
 
-To track the percentage of traces that were actually sampled, use:
+To track the percentage of traces that were actually sampled, use:
 
 ```
-otelcol_processor_tail_sampling_global_count_traces_sampled{sampled="true"} /
+otelcol_processor_tail_sampling_global_count_traces_sampled{sampled="true"} /
 otelcol_processor_tail_sampling_global_count_traces_sampled
 ```
 
@@ -523,8 +543,8 @@ otelcol_processor_tail_sampling_global_count_traces_sampled
 To see how often each policy votes to sample a trace, use:
 
 ```
-sum (otelcol_processor_tail_sampling_count_traces_sampled{sampled="true"}) by (policy) /
-sum (otelcol_processor_tail_sampling_count_traces_sampled) by (policy)
+sum (otelcol_processor_tail_sampling_count_traces_sampled{sampled="true"}) by (policy) /
+sum (otelcol_processor_tail_sampling_count_traces_sampled) by (policy)
 ```
 
 As a reminder, a policy voting to sample the trace does not guarantee sampling; an "inverted not" decision from another policy would still discard the trace.
