-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[tailsamplingprocessor] Decision Timer Latency Metric is misleading #38502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labels
bug
Something isn't working
needs triage
New item requiring triage
processor/tailsampling
Tail sampling processor
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This was referenced Mar 23, 2025
This was referenced Apr 1, 2025
This was referenced Apr 15, 2025
Logiraptor
added a commit
to Logiraptor/opentelemetry-collector-contrib
that referenced
this issue
Apr 29, 2025
It will now measure the total time spent on one tick of the decision timer. This is more useful, since the decision tick loop needs to run at least once per second for the tsp to keep up with incoming traces. Fixes open-telemetry#38502
Logiraptor
added a commit
to Logiraptor/opentelemetry-collector-contrib
that referenced
this issue
Apr 29, 2025
It will now measure the total time spent on one tick of the decision timer. This is more useful, since the decision tick loop needs to run at least once per second for the tsp to keep up with incoming traces. Fixes open-telemetry#38502
vincentfree
pushed a commit
to ing-bank/opentelemetry-collector-contrib
that referenced
this issue
May 6, 2025
…ric (open-telemetry#39761) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description It will now measure the total time spent on one tick of the decision timer. This is more useful, since the decision tick loop needs to run at least once per second for the tsp to keep up with incoming traces. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#38502
vincentfree
pushed a commit
to ing-bank/opentelemetry-collector-contrib
that referenced
this issue
May 20, 2025
…ric (open-telemetry#39761) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description It will now measure the total time spent on one tick of the decision timer. This is more useful, since the decision tick loop needs to run at least once per second for the tsp to keep up with incoming traces. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#38502
dragonlord93
pushed a commit
to dragonlord93/opentelemetry-collector-contrib
that referenced
this issue
May 23, 2025
…ric (open-telemetry#39761) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description It will now measure the total time spent on one tick of the decision timer. This is more useful, since the decision tick loop needs to run at least once per second for the tsp to keep up with incoming traces. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#38502
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
bug
Something isn't working
needs triage
New item requiring triage
processor/tailsampling
Tail sampling processor
Component(s)
processor/tailsampling
What happened?
Description
After #37722, I realized there is another issue with the decision timer latency metric.
Basically, it currently measures a latency from starting policy evaluation until just after each trace is evaluated. This isn't super useful, consider the following scenario:
Steps to Reproduce
Run the tsp, observe the
otelcol_processor_tail_sampling_sampling_decision_timer_latency
metric.Expected Result
I would expect this metric to report one of two things, either:
IMO (1) is more useful, since it's a direct indication of whether the tsp is at risk of falling behind on processing traces.
Actual Result
I considered opening a PR to change the implementation to (1) above, but figured I would open this issue first to make sure I'm not missing an important use-case for (2). Happy to submit the PR though if not!
Collector version
N/A
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")
OpenTelemetry Collector configuration
Log output
Additional context
No response
The text was updated successfully, but these errors were encountered: