Add policy to spans in sampled traces #35180


Closed
kawaiwanyelp opened this issue Sep 13, 2024 · 8 comments · Fixed by #37797
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor Stale

Comments


kawaiwanyelp commented Sep 13, 2024

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

There is currently no way to tell, by looking at a sampled trace, which policy sampled it. If we could, we'd be able to analyze traces by policy, direct traces to different pipelines by policy, and better understand the sampling behavior of this processor.

Describe the solution you'd like

I propose that we add the name of the first top-level policy that returns a positive sampling decision to each span in a trace, as a span attribute. For example, given the following policies:

processors:
  tail_sampling:
    policies:
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
      - name: http-error-policy
        type: and
        and:
          and_sub_policy:
            - name: error-code-policy
              type: ottl_condition
              ottl_condition: 
                span:
                  - 'IsMatch(attributes["http.response.status_code"], "^[45][0-9][0-9]$")'
            - name: probabilistic-policy
              type: probabilistic
              probabilistic:
                sampling_percentage: 100

Now consider a trace that includes a span with a 404 status code. Say probabilistic-policy returned a NotSampled decision, but http-error-policy returned a Sampled decision. A span in this trace would look like:

{
    "name": "client_request",
    "kind": "SpanKind.CLIENT",
    "attributes": {
        "http.response.status_code": 404,
        "sampling.policy": "http-error-policy"  // new attribute added
    },
    ...
}
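A minimal sketch of the proposed first-match behavior (this is illustrative only, not the processor's actual internals: the `Decision` type and `annotateFirstSampledPolicy` name are hypothetical, and real spans carry pdata attribute maps rather than plain Go maps):

```go
package main

import "fmt"

// Decision is a stand-in for the processor's per-policy sampling outcome
// (hypothetical; the real processor defines its own decision type).
type Decision int

const (
	NotSampled Decision = iota
	Sampled
)

// annotateFirstSampledPolicy records the name of the first policy that
// returned Sampled under a "sampling.policy" key, mirroring the
// first-match semantics proposed above. Policies after the first match
// are ignored, so the attribute is always a single string.
func annotateFirstSampledPolicy(policyNames []string, decisions []Decision, spanAttrs map[string]any) {
	for i, d := range decisions {
		if d == Sampled {
			spanAttrs["sampling.policy"] = policyNames[i]
			return
		}
	}
}

func main() {
	attrs := map[string]any{"http.response.status_code": 404}
	annotateFirstSampledPolicy(
		[]string{"probabilistic-policy", "http-error-policy"},
		[]Decision{NotSampled, Sampled},
		attrs,
	)
	fmt.Println(attrs["sampling.policy"]) // http-error-policy
}
```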

Describe alternatives you've considered

  • Branch a pipeline into multiple pipelines using the forward connector, set up tail sampling on each branch with different policies, then use the transform processor to add an attribute identifying the branch a trace was sampled in: not only does this use more resources, since multiple processors have to be set up, it can also produce duplicate samples, and there isn't an easy way to dedupe spans/traces in the collector.
  • Add the policy to just the root span: this would mean less work for the processor to do, but wouldn't be great if the root span were somehow dropped or lost.
  • Add all matching policies as a list value: list values are harder to aggregate over than simple string values, and they can get pretty long for complex policy setups.
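For comparison, the forward-connector workaround from the first alternative would look roughly like this (a sketch, not a tested config; pipeline names, exporters, and the transform statements are illustrative):

```yaml
connectors:
  forward/baseline:
  forward/errors:

processors:
  tail_sampling/baseline:
    policies:
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
  tail_sampling/errors:
    policies:
      - name: error-code-policy
        type: ottl_condition
        ottl_condition:
          span:
            - 'IsMatch(attributes["http.response.status_code"], "^[45][0-9][0-9]$")'
  transform/baseline:
    trace_statements:
      - context: span
        statements:
          - set(attributes["sampling.policy"], "probabilistic-policy")
  transform/errors:
    trace_statements:
      - context: span
        statements:
          - set(attributes["sampling.policy"], "http-error-policy")

service:
  pipelines:
    traces/in:
      receivers: [otlp]
      exporters: [forward/baseline, forward/errors]
    traces/baseline:
      receivers: [forward/baseline]
      processors: [tail_sampling/baseline, transform/baseline]
      exporters: [otlp]
    traces/errors:
      receivers: [forward/errors]
      processors: [tail_sampling/errors, transform/errors]
      exporters: [otlp]
```

A trace matching both branches is exported twice, with different sampling.policy values, which is exactly the duplication problem described above.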

Additional context

I'd be willing to create a PR for this if maintainers agree that this is a good idea.

@kawaiwanyelp kawaiwanyelp added enhancement New feature or request needs triage New item requiring triage labels Sep 13, 2024
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Sep 13, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@kawaiwanyelp kawaiwanyelp changed the title Add policy to sampled spans Add policy to spans in sampled traces Sep 13, 2024
jpkrohling (Member) commented:

I'm not keen on adding another attribute to all spans passing through the tail-sampling processor, but I'd be willing to look at a PR for this if it's placed behind a feature gate.

@jpkrohling jpkrohling removed the needs triage New item requiring triage label Sep 18, 2024

andy-paine-numan commented Sep 24, 2024

This would be really useful for debugging new sampling configs in particular.

> Add the policy to just the root span: this would mean less work for the processor to do, but wouldn't be great if the root span were somehow dropped or lost.

I don't think the risk of losing the root span is very high here: since we are using the tailsamplingprocessor, we already know we've ingested the root span into our collector pipeline for the policies to run. Given that this would mean adding only a single resource attribute/attribute rather than adding attributes to every span, this feels like a good solution to me.


djluck commented Nov 6, 2024

Just to add that we have a use case where such an attribute could be very helpful: understanding latency distributions. We combine the probabilistic policy with the latency policy, and when attempting to analyze operation latency for our services, it's hard to get an accurate picture because the latency policy, by design, biases towards slow traces.

If the policy was recorded as an attribute, we'd be able to calculate latency over only the traces captured by the probabilistic policy.


djluck commented Nov 12, 2024

@jpkrohling I've made an attempt at resolving this issue in this PR; I'd be very grateful if you could review it at some point.


mrsimo commented Nov 19, 2024

We'd absolutely love this feature; even better if we could enable it conditionally. Every time I need to troubleshoot our tail-sampling configuration, I'd kill for it.


mugli commented Dec 12, 2024

This would make our lives much easier. I had a very similar request and created an issue, but closed that one after finding this.


This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


@github-actions github-actions bot added the Stale label Feb 11, 2025
jpkrohling pushed a commit that referenced this issue Feb 14, 2025
… for the decision (#37797)

Re-opening a stale PR:
#36312
Resolves
#35180.

We were close to it being merged. @jpkrohling, do you mind finalizing the review when you get a chance?