
[processor/tailsampling] Record which sampling policy was responsible for the decision #37797


Merged
merged 12 commits into from
Feb 14, 2025

Conversation

djluck
Contributor

@djluck djluck commented Feb 9, 2025

Re-opening a stale PR: #36312
Resolves #35180.

We were close to getting it merged. @jpkrohling do you mind finalizing the review when you get a chance?

Member

@jpkrohling jpkrohling left a comment


LGTM. Changing the existing benchmark to record the policy shows that the costs are OK:

This change (plus setting tsp.recordPolicy = true in BenchmarkSampling):

Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^BenchmarkSampling$ github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor

goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
BenchmarkSampling-16    	   40365	     28595 ns/op	    6282 B/op	     258 allocs/op
PASS
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor	1.478s

Baseline:

Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^BenchmarkSampling$ github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor

goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
BenchmarkSampling-16    	   47973	     24604 ns/op	    6260 B/op	     257 allocs/op
PASS
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor	1.458s
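To put the two runs side by side, here is a minimal sketch that computes the relative change for each reported figure. The numbers are copied from the benchmark output above; the `benchResult` type and `relativeIncrease` helper are names made up for this illustration and are not part of the processor.

```go
package main

import "fmt"

// benchResult holds the three figures reported by `go test -benchmem`.
// Hypothetical helper type, used only for this comparison.
type benchResult struct {
	nsPerOp     float64
	bytesPerOp  float64
	allocsPerOp float64
}

// relativeIncrease returns the change from base to changed,
// expressed as a percentage of base.
func relativeIncrease(base, changed float64) float64 {
	return (changed - base) / base * 100
}

func main() {
	// Figures taken from the two BenchmarkSampling runs above.
	baseline := benchResult{nsPerOp: 24604, bytesPerOp: 6260, allocsPerOp: 257}
	withPolicy := benchResult{nsPerOp: 28595, bytesPerOp: 6282, allocsPerOp: 258}

	fmt.Printf("ns/op:     %+.1f%%\n", relativeIncrease(baseline.nsPerOp, withPolicy.nsPerOp))
	fmt.Printf("B/op:      %+.1f%%\n", relativeIncrease(baseline.bytesPerOp, withPolicy.bytesPerOp))
	fmt.Printf("allocs/op: %+.1f%%\n", relativeIncrease(baseline.allocsPerOp, withPolicy.allocsPerOp))
}
```

On these numbers, time per operation rises by roughly 16%, while bytes and allocations per operation barely move (22 B and 1 allocation). Both are single runs on one machine, so the difference is indicative rather than precise.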

@jpkrohling jpkrohling changed the title Tailsampling record policy [processor/tailsampling] Record which sampling policy was responsible for the decision Feb 10, 2025
@djluck
Contributor Author

djluck commented Feb 10, 2025

@jpkrohling thanks for the review 🙇

@jpkrohling
Member

Once the CI is green, I think this is ready to be merged.

@djluck
Contributor Author

djluck commented Feb 12, 2025

Apologies, I missed that check. I just pushed a change after running make generate.

…olicy) associated with an inclusive tail processor sampling decision.

Resolves !35180.

- This functionality lives behind a feature flag that is disabled by default
- The original issue described a solution where we might attach the attribute solely to the root span. I'm not sure I agree with the commenter that we can rely on this (e.g. we might decide to sample halfway through a long-running trace), so I have attached the attributes to all scope spans present. This feels like a decent trade-off between complexity and network cost, as finding the highest non-root parent would require multiple passes over the spans and keeping all span IDs in a set

- Added automated tests to verify that enabling the flag records the expected decision without affecting existing logic
- Built a custom version and ran it in our preprod environment to ensure it was stable over a 1h period (still evaluating, will update PR with any further observations)

Does this require a CHANGELOG entry?
- Added README entry for the feature flag
- Added missing mutex lock around reading trace data in `SetAttrOnScopeSpans`
- Added tests + benchmarks for `SetAttrOnScopeSpans`
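
The approach described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `scope` and `traceData` are simplified stand-ins for the collector's pdata types, and `setAttrOnScopes`, `recordDecision`, the `recordPolicy` parameter and the attribute key are hypothetical names, not the processor's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// policyAttrKey is a made-up attribute key for this sketch.
const policyAttrKey = "example.sampling.policy"

// scope is a simplified stand-in for an instrumentation scope
// and its attributes.
type scope struct {
	name  string
	attrs map[string]string
}

// traceData is a simplified stand-in for the spans buffered for one trace.
type traceData struct {
	mu     sync.Mutex // guards scopes while batches may still be arriving
	scopes []*scope
}

// setAttrOnScopes writes key=value onto every scope currently held for the
// trace, in a single pass. It takes the lock before reading td.scopes.
func setAttrOnScopes(td *traceData, key, value string) {
	td.mu.Lock()
	defer td.mu.Unlock()
	for _, s := range td.scopes {
		s.attrs[key] = value
	}
}

// recordDecision attaches the deciding policy's name only when the flag is
// enabled, so the default behaviour stays unchanged.
func recordDecision(td *traceData, recordPolicy bool, policy string) {
	if !recordPolicy {
		return
	}
	setAttrOnScopes(td, policyAttrKey, policy)
}

func main() {
	td := &traceData{scopes: []*scope{
		{name: "http", attrs: map[string]string{}},
		{name: "db", attrs: map[string]string{}},
	}}
	recordDecision(td, true, "latency-policy")
	for _, s := range td.scopes {
		fmt.Println(s.name, s.attrs[policyAttrKey])
	}
}
```

Writing to every scope needs one pass and no extra bookkeeping, which is the trade-off the message argues for over locating the highest non-root parent.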
@djluck
Contributor Author

djluck commented Feb 12, 2025

@jpkrohling I re-ran make gci after make generate and it seems they are fighting each other 😢 I didn't commit the changes make gci wanted to make but if the pipeline fails, could you provide guidance on the command I should be running?

@djluck
Contributor Author

djluck commented Feb 14, 2025

@jpkrohling looks like we're green 🥳 Do you mind merging when you get a moment?

@jpkrohling jpkrohling merged commit 843499f into open-telemetry:main Feb 14, 2025
162 checks passed
@jpkrohling
Member

Merged, thanks!

@github-actions github-actions bot added this to the next release milestone Feb 14, 2025
@djluck
Contributor Author

djluck commented Feb 14, 2025

Wonderful, thanks again for all your help @jpkrohling

@jade-guiton-dd
Contributor

jade-guiton-dd commented Feb 14, 2025

It looks like this PR conflicts with #37035 merged two days ago, which removed the options variadic argument from newTracesProcessor (see current signature). The code in this PR still uses it, so the tailsampling processor now fails to build.

Update: I opened #37931 to fix this.
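
For readers unfamiliar with this kind of breakage, here is a minimal sketch of how removing a variadic options argument breaks callers without producing a textual merge conflict. All names below (`processor`, `option`, `withRecordPolicy`, `newProcessorOld`, `newProcessorNew`) are hypothetical stand-ins; the real `newTracesProcessor` signature is not reproduced here.

```go
package main

import "fmt"

// processor and option are hypothetical stand-ins for this sketch.
type processor struct {
	recordPolicy bool
}

type option func(*processor)

// withRecordPolicy is a made-up option that switches the flag on.
func withRecordPolicy() option {
	return func(p *processor) { p.recordPolicy = true }
}

// newProcessorOld has the shape of a constructor that accepts
// variadic options.
func newProcessorOld(opts ...option) *processor {
	p := &processor{}
	for _, apply := range opts {
		apply(p)
	}
	return p
}

// newProcessorNew has the shape of the same constructor after the
// variadic argument has been removed.
func newProcessorNew() *processor {
	return &processor{}
}

func main() {
	// Compiles against the old signature.
	p := newProcessorOld(withRecordPolicy())
	fmt.Println("old constructor, recordPolicy:", p.recordPolicy)

	// The same call against the new signature would be rejected by the
	// compiler for passing too many arguments:
	//   newProcessorNew(withRecordPolicy())
	q := newProcessorNew()
	fmt.Println("new constructor, recordPolicy:", q.recordPolicy)
}
```

Because the two changes touch different lines, git merges them cleanly and the failure only shows up when the combined code is compiled.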

jade-guiton-dd added a commit to jade-guiton-dd/opentelemetry-collector-contrib that referenced this pull request Feb 14, 2025
songy23 pushed a commit that referenced this pull request Feb 14, 2025
#### Description

Two PRs were merged recently on the tailsamplingprocessor, #37797 and
#37035. #37035 changed the signature of an internal function in a way
that broke #37797. The result is that the component [fails to
build](https://github.com/open-telemetry/opentelemetry-collector/actions/runs/13329091378/job/37228871811?pr=12384).
This PR fixes that.

This wasn't noticed before merging because 1. there were no merge
conflicts, 2. the latest rebase of #37797 was before #37035 was merged,
and 3. there is no merge queue to perform final checks.

---------

Signed-off-by: Juraci Paixão Kröhling <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
Development

Successfully merging this pull request may close these issues.

Add policy to spans in sampled traces
4 participants