
Conversation

@Logiraptor
Contributor

Description

This PR removes the metric otelcol_processor_tail_sampling_sampling_decision_latency.

This metric does not measure the latency of a particular policy. Instead, it measures the latency since policy evaluation began, which is mostly not a useful signal.

To make matters worse, profiling shows that recording this metric accounts for >20% of the CPU time spent evaluating policies. Since the tailsamplingprocessor is bottlenecked on its single-threaded decision loop, that 20% is much better spent making decisions than measuring a misleading metric.

[Screenshot: profiling results]

Link to tracking issue

Originally reported in #38502, which I closed accidentally with a related PR.

@Logiraptor Logiraptor marked this pull request as ready for review September 10, 2025 03:18
@Logiraptor Logiraptor requested a review from a team as a code owner September 10, 2025 03:18
@Logiraptor Logiraptor requested a review from axw September 10, 2025 03:18
@github-actions github-actions bot added the processor/tailsampling (Tail sampling processor) label Sep 10, 2025
@github-actions github-actions bot requested a review from portertech September 10, 2025 03:18
Contributor

@axw axw left a comment


This metric does not measure the latency of a particular policy. Instead, it measures the latency since policy evaluation began which is mostly not a useful signal.

Agreed, but is that intentional? It seems like a bug that could be fixed. I think it could be useful to know how long each policy's evaluator takes, particularly for more expensive ones like the OTTL evaluator.

Also, #42508 goes in the direction of making evaluators pluggable, so they may be arbitrarily complex.

To make matters worse, profiling shows that recording this metric accounts for >20% of cpu time spent evaluating policies. Since the tailsamplingprocessor is bottlenecked on the single threaded decision loop, this 20% is much better spent on making decisions rather than measuring a misleading metric.

If that's the primary motivation, could you take the single-threadedness into account to reduce the instrumentation overhead? i.e. by accumulating locally and only updating metrics after all policies have been evaluated -- it appears there's something like that already in policyMetrics.addDecision.

Comment on lines -465 to -471
startTime := time.Now()

// Check all policies before making a final decision.
for i, p := range tsp.policies {
	decision, err := p.evaluator.Evaluate(ctx, id, trace)
	latency := time.Since(startTime)
	tsp.telemetry.ProcessorTailSamplingSamplingDecisionLatency.Record(ctx, int64(latency/time.Microsecond), p.attribute)

So the problem is really that this is cumulative of all preceding policies? In which case the metric, as-is, will really only be meaningful if there's a single policy. That could be fixed by moving the startTime to the top of the loop, if it's important to keep the metric.
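
For illustration, a minimal sketch of that fix, using stand-in policy types and creating the histogram directly through the OpenTelemetry Go metric API (the processor's real types and generated telemetry differ): resetting the start time at the top of each iteration makes every recording cover only that policy's evaluation.

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// policy is a stand-in for the processor's policy type.
type policy struct {
	name     string
	evaluate func(ctx context.Context) error
}

func main() {
	meter := otel.Meter("tailsampling-sketch")
	latencyHist, _ := meter.Int64Histogram("sampling_decision_latency", metric.WithUnit("us"))

	policies := []policy{
		{name: "latency-policy", evaluate: func(context.Context) error { return nil }},
		{name: "ottl-policy", evaluate: func(context.Context) error { return nil }},
	}

	ctx := context.Background()
	for _, p := range policies {
		start := time.Now() // reset at the top of every iteration, not once before the loop
		_ = p.evaluate(ctx)
		// Each recording now covers only this policy's evaluation.
		latencyHist.Record(ctx, int64(time.Since(start)/time.Microsecond),
			metric.WithAttributes(attribute.String("policy", p.name)))
	}
}
```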

@Logiraptor
Contributor Author

Logiraptor commented Sep 10, 2025

@axw Thanks for the review!

If that's the primary motivation, could you take the single-threadedness into account to reduce the instrumentation overhead? i.e. by accumulating locally and only updating metrics after all policies have been evaluated -- it appears there's something like that already in policyMetrics.addDecision.

Is that possible with a histogram? I don't see any way to accumulate other than putting all the intermediate latencies in a slice and then calling Record in a loop. But that doesn't make it any faster.

Agreed, but is that intentional? It seems like a bug that could be fixed. I think it could be useful to know how long each policy's evaluator takes, particularly for more expensive ones like the OTTL evaluator.
Also, #42508 goes in the direction of making evaluators pluggable, so they may be arbitrarily complex.

These are good points, but my opinion is that we shouldn't keep code around that reduces performance so much when it's not producing actionable results.

I can think of a few ways to make it OK performance-wise and keep the ability to locate slow policies:

  1. Make it optional, like otelcol_processor_tail_sampling_count_spans_sampled. In this case I would disable it in my infrastructure for now, because the CPU cost is not worth it.
  2. Record timings for a subset of evaluations, based on some sampling rate
  3. Refactor the code to record total time spent instead of a histogram. In other words, it would be a single counter of total seconds per policy which is easy to accumulate and record after the loop.

My preference would be for (3), but that's still a breaking change for the metric, so I'm not sure it needs to be created in this same PR. Thoughts?
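
A minimal sketch of option (3), again with stand-in types rather than the processor's real ones: durations are accumulated in a plain slice inside the evaluation loop, and a single per-policy counter is updated only after the loop, so the hot path makes no metric API calls.

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// policy is a stand-in for the processor's policy type.
type policy struct {
	name     string
	evaluate func(ctx context.Context) error
}

func main() {
	meter := otel.Meter("tailsampling-sketch")
	// A single monotonic counter of total evaluation time, attributed per policy.
	totalSeconds, _ := meter.Float64Counter("sampling_policy_total_seconds", metric.WithUnit("s"))

	policies := []policy{
		{name: "latency-policy", evaluate: func(context.Context) error { return nil }},
		{name: "ottl-policy", evaluate: func(context.Context) error { return nil }},
	}

	ctx := context.Background()
	elapsed := make([]time.Duration, len(policies))

	// Hot path: only cheap local accumulation, no metric API calls.
	for t := 0; t < 3; t++ { // stand-in for the traces handled in one pass of the decision loop
		for i, p := range policies {
			start := time.Now()
			_ = p.evaluate(ctx)
			elapsed[i] += time.Since(start)
		}
	}

	// One counter update per policy, after the loop.
	for i, p := range policies {
		totalSeconds.Add(ctx, elapsed[i].Seconds(),
			metric.WithAttributes(attribute.String("policy", p.name)))
	}
}
```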

@axw
Contributor

axw commented Sep 11, 2025

Is that possible with a histogram? I don't see any way to accumulate other than putting all the intermediate latencies in a slice and then calling Record in a loop. But that doesn't make it any faster.

Ah, I had missed that it was a histogram. I don't think we have an option at the moment then.
Theoretically there could be two options, but the metrics API does not support either of them:

  • Maintain a local histogram and later merge it in
  • Maintain a local histogram and later, for each bucket make a recording with the total & count

Refactor the code to record total time spent instead of a histogram. In other words, it would be a single counter of total seconds per policy which is easy to accumulate and record after the loop.

This sounds OK to me. @portertech thoughts?

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 25, 2025
atoulme pushed a commit that referenced this pull request Oct 3, 2025
Hoping to help spread the load of code ownership, and I'm spending lots
of time working on the TSP anyway.

List of contributions:

* #41888
* #41617
* #41546
* #39761
* #37722
* #37035
* #41656
* #38502
* #42620

---------

Co-authored-by: Christos Markou <[email protected]>
@github-actions
Contributor

github-actions bot commented Oct 9, 2025

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Oct 9, 2025
songy23 pushed a commit that referenced this pull request Nov 17, 2025

#### Description

This PR removes the metric
otelcol_processor_tail_sampling_sampling_decision_latency. It adds a
pair of replacement metrics,
`processor_tail_sampling_sampling_policy_cpu_time` and
`processor_tail_sampling_sampling_policy_executions`, implementing the
feedback received in
#42620.

Originally reported in
#38502,
this metric does not measure the latency of a particular policy.
Instead, it measures the latency since policy evaluation began, which
is mostly not a useful signal.

To make matters worse, profiling shows that recording this metric
accounts for >20% of the CPU time spent evaluating policies. Since the
tailsamplingprocessor is bottlenecked on its single-threaded decision
loop, that 20% is much better spent making decisions than measuring a
misleading metric.

As a replacement, I've added metrics that track the total time spent
on each policy as well as the total number of executions. This still
allows slow policies to be identified by checking their total or
average execution time, without the heavy CPU, GC pressure, and
synchronization cost of recording a histogram in the inner loop.
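
A rough sketch of the shape of the replacement, with the instrument names taken from the description above; in the collector itself these instruments come from the component's generated telemetry rather than being created directly like this.

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	meter := otel.Meter("tailsampling-sketch")

	// Stand-ins for the new instruments; in the collector they come from the
	// component's generated telemetry rather than being created here.
	cpuTime, _ := meter.Float64Counter("processor_tail_sampling_sampling_policy_cpu_time",
		metric.WithUnit("s"))
	executions, _ := meter.Int64Counter("processor_tail_sampling_sampling_policy_executions")

	ctx := context.Background()
	attrs := metric.WithAttributes(attribute.String("policy", "ottl-policy"))

	start := time.Now()
	// ... evaluate the policy for one trace here ...
	cpuTime.Add(ctx, time.Since(start).Seconds(), attrs)
	executions.Add(ctx, 1, attrs)

	// A slow policy shows up as a high total, or as a high average computed
	// downstream as cpu_time / executions, with no histogram in the hot loop.
}
```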

#### Link to tracking issue
Fixes #38502 - closed by accident, and I am not otel enough to reopen it