[processor/tailsampling] Record which sampling policy was responsible for the decision #37797
Conversation
LGTM, changing the existing benchmark to record the policy shows that the costs are OK.

This change (plus setting `tsp.recordPolicy = true` in `BenchmarkSampling`):
```
Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^BenchmarkSampling$ github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
BenchmarkSampling-16    40365    28595 ns/op    6282 B/op    258 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor    1.478s
```
Baseline:
```
Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^BenchmarkSampling$ github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
BenchmarkSampling-16    47973    24604 ns/op    6260 B/op    257 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor    1.458s
```
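To put a number on "the costs are OK", a quick way to compare the two runs is to compute the relative ns/op overhead directly from the figures quoted above (the numbers are copied from the benchmark output; the percentage is my own arithmetic, not something stated in the thread):

```go
package main

import "fmt"

func main() {
	baseline := 24604.0   // ns/op without recording the policy
	withPolicy := 28595.0 // ns/op with tsp.recordPolicy = true

	// Relative slowdown of the policy-recording run vs. baseline.
	overhead := (withPolicy - baseline) / baseline * 100
	fmt.Printf("overhead: %.1f%%\n", overhead) // roughly 16% per op
}
```

Allocations barely move (257 → 258 allocs/op, 6260 → 6282 B/op), so the cost is dominated by the extra per-span attribute work rather than memory pressure.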
@jpkrohling thanks for the review 🙇

Once the CI is green, I think this is ready to be merged.

Apologies, I missed that check; just pushed a change after running
…olicy) associated with an inclusive tail processor sampling decision. Resolves #35180.

- This functionality lives behind a feature flag that is disabled by default.
- The original issue described a solution where we might attach the attribute solely to the root span. I'm not sure I agree with the commenter that we can rely on this (e.g. we might decide to sample halfway through a long-running trace), so I have attached the attributes to all present scope spans. This feels like a decent trade-off between complexity and network cost, as finding the highest non-root parent would require multiple passes over the spans and keeping all span IDs in a set.
- Added automated tests to verify that enabling the flag records the expected decision without impacting existing logic.
- Built a custom version and ran it in our preprod environment to ensure it was stable over a 1h period (still evaluating, will update PR with any further observations).

Does this require a CHANGELOG entry?
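The "stamp every buffered span in one pass" approach described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the struct types here are simplified stand-ins for the collector's real pdata trace model (`go.opentelemetry.io/collector/pdata/ptrace`), and the attribute key is hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the pdata trace hierarchy; the real types
// live in go.opentelemetry.io/collector/pdata/ptrace.
type Span struct {
	Name       string
	Attributes map[string]string
}

type ScopeSpans struct{ Spans []*Span }
type Traces struct{ ScopeSpans []*ScopeSpans }

// setAttrOnScopeSpans stamps the same attribute onto every span in the
// trace: rather than hunting for the root span (which may not even have
// arrived yet for a long-running trace), annotate everything buffered
// so far in a single pass.
func setAttrOnScopeSpans(td *Traces, key, value string) {
	for _, ss := range td.ScopeSpans {
		for _, sp := range ss.Spans {
			if sp.Attributes == nil {
				sp.Attributes = map[string]string{}
			}
			sp.Attributes[key] = value
		}
	}
}

func main() {
	td := &Traces{ScopeSpans: []*ScopeSpans{
		{Spans: []*Span{{Name: "root"}, {Name: "child"}}},
	}}
	setAttrOnScopeSpans(td, "tailsampling.policy", "probabilistic")
	fmt.Println(td.ScopeSpans[0].Spans[1].Attributes["tailsampling.policy"])
}
```

A single linear pass keeps the cost O(spans), which is consistent with the small per-op overhead seen in the benchmark comparison above.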
- Added README entry for the feature flag
- Added missing mutex lock around reading trace data in `SetAttrOnScopeSpans`
- Added tests + benchmarks for `SetAttrOnScopeSpans`
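The "missing mutex lock around reading trace data" fix follows the standard pattern of copying shared state under the lock before working on it. A minimal sketch, with a hypothetical `traceEntry` type standing in for the processor's buffered trace data (the real locking lives inside the tailsamplingprocessor's internal structures):

```go
package main

import (
	"fmt"
	"sync"
)

// traceEntry is a hypothetical stand-in for a buffered trace whose span
// data may be appended to concurrently by late-arriving spans.
type traceEntry struct {
	mu    sync.Mutex
	spans []string
}

// readSpans copies the span list under the lock, so a concurrent writer
// cannot race with the attribute-setting pass that reads it.
func (t *traceEntry) readSpans() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]string, len(t.spans))
	copy(out, t.spans)
	return out
}

func main() {
	e := &traceEntry{spans: []string{"root", "child"}}
	fmt.Println(e.readSpans())
}
```

Returning a copy rather than the underlying slice means the caller can iterate without holding the lock for the duration of the attribute pass.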
Force-pushed from 913ab9c to 6a3b105.
@jpkrohling I re-ran

@jpkrohling looks like we're green 🥳 Do you mind merging when you get a moment?

Merged, thanks!

Wonderful, thanks again for all your help @jpkrohling
It looks like this PR conflicts with #37035, merged two days ago, which removed the

Update: I opened #37931 to fix this.
#### Description

Two PRs were merged recently on the tailsamplingprocessor: #37797 and #37035. #37035 changed the signature of an internal function in a way that broke #37797. The result is that the component [fails to build](https://github.com/open-telemetry/opentelemetry-collector/actions/runs/13329091378/job/37228871811?pr=12384). This PR fixes that.

This wasn't noticed before merging because:

1. there were no merge conflicts,
2. the latest rebase of #37797 happened before #37035 was merged, and
3. there is no merge queue to perform final checks.

---------

Signed-off-by: Juraci Paixão Kröhling <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
Re-opening a stale PR: #36312
Resolves #35180.
We were close to it being merged. @jpkrohling do you mind finalizing the review when you get a chance?