
Commit d8388cc

portertech authored and dragonlord93 committed
[chore] Tail-sampling drop policy documentation (open-telemetry#39801)
Documentation for the recently added [drop policy type](open-telemetry#39668). The examples replace the use of invert_match. I attempted to make as few changes as possible at this time; I intend to add further documentation when deprecating the top-level invert_match logic/decisions in a separate pull request.

Signed-off-by: Sean Porter <[email protected]>
1 parent d5d6c65 commit d8388cc

File tree

1 file changed (+46, -26 lines)

processor/tailsamplingprocessor/README.md

Lines changed: 46 additions & 26 deletions
@@ -36,8 +36,9 @@ Multiple policies exist today and it is straight forward to add more. These incl
 - `span_count`: Sample based on the minimum and/or maximum number of spans, inclusive. If the sum of all spans in the trace is outside the range threshold, the trace will not be sampled.
 - `boolean_attribute`: Sample based on boolean attribute (resource and record).
 - `ottl_condition`: Sample based on given boolean OTTL condition (span and span event).
-- `and`: Sample based on multiple policies, creates an AND policy
-- `composite`: Sample based on a combination of above samplers, with ordering and rate allocation per sampler. Rate allocation allocates certain percentages of spans per policy order.
+- `and`: Sample based on multiple policies, creates an AND policy
+- `drop`: Drop (not sample) based on multiple policies, creates a DROP policy
+- `composite`: Sample based on a combination of above samplers, with ordering and rate allocation per sampler. Rate allocation allocates certain percentages of spans per policy order.
   For example if we have set max_total_spans_per_second as 100 then we can set rate_allocation as follows
   1. test-composite-policy-1 = 50 % of max_total_spans_per_second = 50 spans_per_second
   2. test-composite-policy-2 = 25 % of max_total_spans_per_second = 25 spans_per_second
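For context, the rate-allocation arithmetic above corresponds to a composite policy along the lines of the sketch below. The policy names, sub-policy types, and the 100 spans-per-second budget are illustrative rather than prescriptive; the field names follow the composite example used elsewhere in this README.

```yaml
    tail_sampling:
      policies:
        [
          {
            name: composite-policy-example,
            type: composite,
            composite:
              {
                # Total spans-per-second budget shared by the sub-policies below.
                max_total_spans_per_second: 100,
                policy_order: [test-composite-policy-1, test-composite-policy-2],
                composite_sub_policy:
                  [
                    {name: test-composite-policy-1, type: latency, latency: {threshold_ms: 5000}},
                    {name: test-composite-policy-2, type: status_code, status_code: {status_codes: [ERROR]}}
                  ],
                rate_allocation:
                  [
                    # 50% of 100 = up to 50 spans per second for policy 1.
                    {policy: test-composite-policy-1, percent: 50},
                    # 25% of 100 = up to 25 spans per second for policy 2.
                    {policy: test-composite-policy-2, percent: 25}
                  ]
              }
          }
        ]
```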
@@ -53,7 +54,7 @@ The following configuration options can also be modified:
   Additionally, if using, configure this as much greater than `num_traces` so decisions for trace IDs are kept
   longer than the span data for the trace.
   - `sampled_cache_size` (default = 0): Configures amount of trace IDs to be kept in an LRU cache,
-    persisting the "keep" decisions for traces that may have already been released from memory.
+    persisting the "keep" decisions for traces that may have already been released from memory.
     By default, the size is 0 and the cache is inactive.
   - `non_sampled_cache_size` (default = 0) Configures amount of trace IDs to be kept in an LRU cache,
     persisting the "drop" decisions for traces that may have already been released from memory.
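For reference, a minimal sketch of how these cache sizes might be set, assuming they nest under a `decision_cache` block as the indentation of the bullets above suggests; the sizes shown are arbitrary, chosen only to be much larger than `num_traces`:

```yaml
    tail_sampling:
      decision_wait: 10s
      num_traces: 100
      decision_cache:
        # Persists "keep" decisions for trace IDs whose span data was already released.
        sampled_cache_size: 100000
        # Persists "drop" decisions likewise; both default to 0 (cache inactive).
        non_sampled_cache_size: 100000
```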
@@ -62,6 +63,7 @@ The following configuration options can also be modified:
 
 Each policy will result in a decision, and the processor will evaluate them to make a final decision:
 
+- When there's a "drop" decision, the trace is not sampled;
 - When there's an "inverted not sample" decision, the trace is not sampled;
 - When there's a "sample" decision, the trace is sampled;
 - When there's a "inverted sample" decision and no "not sample" decisions, the trace is sampled;
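To make the precedence concrete, here is a sketch (policy names are illustrative) in which the `drop` decision overrides the unconditional `always_sample` vote, so traces matching the health-check path are never kept:

```yaml
    tail_sampling:
      policies:
        [
          {
            # Votes "sample" for every trace.
            name: keep-everything,
            type: always_sample
          },
          {
            # A "drop" decision takes precedence over any "sample" decision,
            # so traces matching the sub-policy below are discarded regardless.
            name: drop-health-checks,
            type: drop,
            drop: {
              drop_sub_policy:
              [
                {
                  name: health-check-paths,
                  type: string_attribute,
                  string_attribute: {key: url.path, values: [\/health], enabled_regex_matching: true}
                }
              ]
            }
          }
        ]
```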
@@ -123,26 +125,21 @@ processors:
           },
           {
             name: test-policy-9,
-            type: string_attribute,
-            string_attribute: {key: url.path, values: [\/health, \/metrics], enabled_regex_matching: true, invert_match: true}
-          },
-          {
-            name: test-policy-10,
             type: span_count,
             span_count: {min_spans: 2, max_spans: 20}
           },
           {
-            name: test-policy-11,
+            name: test-policy-10,
             type: trace_state,
             trace_state: { key: key3, values: [value1, value2] }
           },
           {
-            name: test-policy-12,
+            name: test-policy-11,
             type: boolean_attribute,
             boolean_attribute: {key: key4, value: true}
           },
           {
-            name: test-policy-13,
+            name: test-policy-12,
             type: ottl_condition,
             ottl_condition: {
               error_mode: ignore,
@@ -160,7 +157,7 @@ processors:
             name: and-policy-1,
             type: and,
             and: {
-              and_sub_policy:
+              and_sub_policy:
               [
                 {
                   name: test-and-policy-1,
@@ -175,6 +172,20 @@ processors:
               ]
             }
           },
+          {
+            name: drop-policy-1,
+            type: drop,
+            drop: {
+              drop_sub_policy:
+              [
+                {
+                  name: test-drop-policy-1,
+                  type: string_attribute,
+                  string_attribute: {key: url.path, values: [\/health, \/metrics], enabled_regex_matching: true}
+                }
+              ]
+            }
+          },
           {
             name: composite-policy-1,
             type: composite,
@@ -419,13 +430,22 @@ tail_sampling:
         type: boolean_attribute,
         boolean_attribute: { key: app.force_sample, value: true },
       },
-      {
-        # Rule 7:
-        # never sample if the do_not_sample attribute is set to true
-        name: team_a-do-not-sample,
-        type: boolean_attribute,
-        boolean_attribute: { key: app.do_not_sample, value: true, invert_match: true },
-      },
+      {
+        # Rule 7:
+        # never sample if the do_not_sample attribute is set to true
+        name: do-not-sample,
+        type: drop,
+        drop: {
+          drop_sub_policy:
+          [
+            {
+              name: team_a-do-not-sample,
+              type: boolean_attribute,
+              boolean_attribute: { key: app.do_not_sample, value: true }
+            }
+          ]
+        }
+      },
     # END: policies for team_a
   ]
 ```
@@ -442,7 +462,7 @@ The [probabilistic sampling processor][probabilistic_sampling_processor] and the
 
 As a rule of thumb, if you want to add probabilistic sampling and...
 
-...you are not using the tail sampling processor already: use the [probabilistic sampling processor][probabilistic_sampling_processor]. Running the probabilistic sampling processor is more efficient than the tail sampling processor. The probabilistic sampling policy makes decisions based upon the trace ID, so waiting until more spans have arrived will not influence its decision.
+...you are not using the tail sampling processor already: use the [probabilistic sampling processor][probabilistic_sampling_processor]. Running the probabilistic sampling processor is more efficient than the tail sampling processor. The probabilistic sampling policy makes decisions based upon the trace ID, so waiting until more spans have arrived will not influence its decision.
 
 ...you are already using the tail sampling processor: add the probabilistic sampling policy. You are already incurring the cost of running the tail sampling processor, so adding the probabilistic policy will be negligible. Additionally, using the policy within the tail sampling processor will ensure traces that are sampled by other policies will not be dropped.
 
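As a sketch of the second case (the policy name and the 10% rate are illustrative), the probabilistic policy is just another entry in the existing `policies` list:

```yaml
    tail_sampling:
      policies:
        [
          {
            # Keeps roughly 10% of traces, decided from the trace ID alone.
            name: probabilistic-policy,
            type: probabilistic,
            probabilistic: {sampling_percentage: 10}
          }
          # ...existing policies continue here.
        ]
```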
@@ -457,7 +477,7 @@ As a rule of thumb, if you want to add probabilistic sampling and...
 option allows, some will have to be dropped before they can be sampled. Increasing the value of `num_traces` can
 help resolve this error, at the expense of increased memory usage.
 
-## Monitoring and Tuning
+## Monitoring and Tuning
 
 See [documentation.md][documentation_md] for the full list of metrics available for this component and their descriptions.
 
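For example (the values below are illustrative only), raising `num_traces` lets more in-flight traces be held while awaiting a decision:

```yaml
    tail_sampling:
      decision_wait: 10s
      # Larger values reduce the chance of dropping traces before they can be
      # sampled, at the cost of additional memory.
      num_traces: 200000
```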
@@ -494,7 +514,7 @@ It's therefore recommended to consume this component's output with components th
 A span's arrival is considered "late" if it arrives after its trace's sampling decision is made. Late spans can cause different sampling decisions for different parts of the trace.
 
 There are two scenarios for late arriving spans:
-- Scenario 1: While the sampling decision of the trace remains in the circular buffer of `num_traces` length, the late spans inherit that decision. That means late spans do not influence the trace's sampling decision.
+- Scenario 1: While the sampling decision of the trace remains in the circular buffer of `num_traces` length, the late spans inherit that decision. That means late spans do not influence the trace's sampling decision.
 - Scenario 2: (Default, no decision cache configured) After the sampling decision is removed from the buffer, it's as if this component has never seen the trace before: The late spans are buffered for `decision_wait` seconds and then a new sampling decision is made.
 - Scenario 3: (Decision cache is configured) When a "keep" decision is made on a trace, the trace ID is cached. The component will remember which trace IDs it sampled even after it releases the span data from memory. Unless it has been evicted from the cache after some time, it will remember the same "keep trace" decision.
 
@@ -511,10 +531,10 @@ It may also be useful to:
 
 **Sampled Frequency**
 
-To track the percentage of traces that were actually sampled, use:
+To track the percentage of traces that were actually sampled, use:
 
 ```
-otelcol_processor_tail_sampling_global_count_traces_sampled{sampled="true"} /
+otelcol_processor_tail_sampling_global_count_traces_sampled{sampled="true"} /
 otelcol_processor_tail_sampling_global_count_traces_sampled
 ```
 
@@ -523,8 +543,8 @@ otelcol_processor_tail_sampling_global_count_traces_sampled
 To see how often each policy votes to sample a trace, use:
 
 ```
-sum (otelcol_processor_tail_sampling_count_traces_sampled{sampled="true"}) by (policy) /
-sum (otelcol_processor_tail_sampling_count_traces_sampled) by (policy)
+sum (otelcol_processor_tail_sampling_count_traces_sampled{sampled="true"}) by (policy) /
+sum (otelcol_processor_tail_sampling_count_traces_sampled) by (policy)
 ```
 
 As a reminder, a policy voting to sample the trace does not guarantee sampling; an "inverted not" decision from another policy would still discard the trace.
