
Fix data race in ESv1 WriteSpan by using distinct inputs #7196


Open · wants to merge 6 commits into main from fix/esv1-data-race

Conversation

@sonalgaud12 commented Jun 2, 2025

Which problem is this PR solving?

Description of the changes

  • Sent distinct span instances to the two writer calls to stop the data race (see the sketch below).
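
For illustration, a minimal sketch of what "distinct inputs" means here, using toy stand-ins rather than the actual ESv1 writers and Jaeger's model.Span (the names below are not the PR's real test code):

```go
// Sketch only, with toy stand-ins: the real test uses Jaeger's model.Span and
// the ESv1 span writers. The point is that each concurrent WriteSpan call gets
// its own span value, so the writers no longer share mutable state.
package spanstore_test

import (
	"sync"
	"testing"
)

type toySpan struct{ tags []string }

// toyWrite mutates the span it receives, mimicking the write path that raced
// when two writers were handed the same input.
func toyWrite(s *toySpan) { s.tags = append(s.tags, "written") }

func TestWriteSpan_DistinctInputs(t *testing.T) {
	spanA := &toySpan{tags: []string{"op"}} // input for the primary writer
	spanB := &toySpan{tags: []string{"op"}} // independent input for the second writer

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); toyWrite(spanA) }()
	go func() { defer wg.Done(); toyWrite(spanB) }()
	wg.Wait() // with distinct inputs, `go test -race` no longer reports a race
}
```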

How was this change tested?

(screenshot attached)

Checklist

@sonalgaud12 requested a review from a team as a code owner June 2, 2025 13:56
@sonalgaud12 requested a review from albertteoh June 2, 2025 13:56
@sonalgaud12 (Author)

@yurishkuro I would like your review. Also, please let me know if I need to write any tests.

@yurishkuro (Member) left a comment


Under what conditions is the same span being modified? Storage should not be in the business of modifying input data; it has an internal db model it can use for any modifications.
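
As an illustration of that pattern, a sketch with hypothetical, simplified types standing in for Jaeger's domain and db models (the real ESv1 converter and writer APIs differ): the writer builds a storage-owned db value from the input and mutates only that copy.

```go
// Sketch with hypothetical, simplified types; the real ESv1 code uses
// Jaeger's model.Span and its dbmodel converter instead of these stand-ins.
package spanstore

import "context"

type Span struct { // stands in for the shared domain model span
	OperationName string
	Tags          []string
}

type DBSpan struct { // storage-internal representation
	Operation string
	Tags      []string
}

// toDBSpan builds a fresh, storage-owned value; nothing from the input is
// retained as mutable shared state.
func toDBSpan(s *Span) *DBSpan {
	return &DBSpan{
		Operation: s.OperationName,
		Tags:      append([]string(nil), s.Tags...),
	}
}

type spanWriter struct{} // ES client omitted in this sketch

// WriteSpan converts the input into the internal db model and mutates only
// that private copy, so the caller's span is never written to.
func (w *spanWriter) WriteSpan(ctx context.Context, span *Span) error {
	dbSpan := toDBSpan(span)
	dbSpan.Tags = append(dbSpan.Tags, "normalized-by-storage") // safe: private copy
	return nil // indexing call omitted
}
```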

@sonalgaud12 force-pushed the fix/esv1-data-race branch from c084740 to d6e1546 on June 2, 2025 15:47
@sonalgaud12 requested a review from yurishkuro June 2, 2025 15:48
@yurishkuro (Member)

Under what conditions is the same span being modified? Storage should not be in the business of modifying input data; it has an internal db model it can use for any modifications.

@sonalgaud12 (Author)

To fix this, I now create a deep copy before applying any changes, so the input span remains untouched.
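
Roughly, a sketch of that deep-copy guard, again with hypothetical simplified types (not the actual change); as the next comment points out, it avoids the race only by paying for a full copy of every span.

```go
// Sketch of the deep-copy approach with hypothetical simplified types; the
// real span model has many more fields to duplicate.
package spanstore

type Span struct {
	OperationName string
	Tags          []string
}

// deepCopy duplicates the span, including its slices, so later mutation
// cannot touch the caller's value.
func deepCopy(s *Span) *Span {
	return &Span{
		OperationName: s.OperationName,
		Tags:          append([]string(nil), s.Tags...),
	}
}

func prepareForWrite(span *Span) *Span {
	local := deepCopy(span)
	local.Tags = append(local.Tags, "sanitized") // input span remains untouched
	return local
}
```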

@yurishkuro (Member)

You are not answering my question. The fix should be to not mutate the data at all, not to make a copy of the data and then mutate it. Copying is expensive at scale.

@sonalgaud12 (Author) commented Jun 2, 2025

I realize that the real issue is mutating the input span, which ideally should be treated as immutable by the storage layer. Allow me some time, and I will fix it.

@sonalgaud12 (Author)

@yurishkuro Can you give me some direction on how to apply the transformation without mutating the input?

@yurishkuro (Member)

@sonalgaud12 I don't think we should be fixing the problem without understanding the root cause. First, let's find the code in storage that mutates input data. Then decide if that mutation is actually necessary or if the same capability can be achieved without it.

@sonalgaud12 force-pushed the fix/esv1-data-race branch 2 times, most recently from ffb212b to ff9e69b on June 4, 2025 15:47
@Manik2708 (Contributor)

@sonalgaud12 Please see the comment at #7168 (comment); it will most probably fix the issue.

sonalgaud12 and others added 6 commits June 5, 2025 20:47
Signed-off-by: Sonal Gaud <[email protected]>
Signed-off-by: sonalgaud12 <[email protected]>
Signed-off-by: sonalgaud12 <[email protected]>
Signed-off-by: sonalgaud12 <[email protected]>
Signed-off-by: sonalgaud12 <[email protected]>
@sonalgaud12 force-pushed the fix/esv1-data-race branch from d7ff8a6 to b46a236 on June 5, 2025 15:17
@yurishkuro (Member) left a comment


This fixes the symptom, not the root cause. The storage MUST NOT mutate input data. In the context of the OTEL Collector framework, it is perfectly valid to configure multiple exporters for the same data, and if one of the exporters mutates that data it would still cause race conditions.
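
A toy illustration of this failure mode (not actual OTEL Collector code): two "exporters" receive the same object, one mutates it while the other reads it, and the race detector flags the pair even though each exporter is correct in isolation.

```go
// Toy illustration only, not OTEL Collector code: two consumers of the same
// span object, one of which writes to it. Running this under `go run -race`
// (or an equivalent test under `go test -race`) reports a data race.
package main

import "sync"

type span struct{ name string }

func mutatingExporter(s *span) { s.name += "-normalized" } // writes shared data

func readOnlyExporter(s *span) int { return len(s.name) } // reads shared data

func main() {
	shared := &span{name: "op"} // same pointer handed to both exporters
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); mutatingExporter(shared) }()
	go func() { defer wg.Done(); _ = readOnlyExporter(shared) }()
	wg.Wait()
}
```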

@Manik2708 (Contributor)

This fixes the symptom, not the root cause. The storage MUST NOT mutate input data. In the context of the OTEL Collector framework, it is perfectly valid to configure multiple exporters for the same data, and if one of the exporters mutates that data it would still cause race conditions.

But it isn't the model or OTEL span that is being mutated, it's the db span. Shouldn't that work?

@yurishkuro (Member)

@Manik2708 ah, good point

@yurishkuro added the changelog:test label (Change that's adding missing tests or correcting existing tests) Jun 5, 2025
@yurishkuro changed the title from "Fix data race in ESv1 WriteSpan" to "Fix data race in ESv1 WriteSpan by using distinct inputs" Jun 5, 2025
@yurishkuro enabled auto-merge June 5, 2025 19:20
@Manik2708 (Contributor) commented Jun 5, 2025

Also, another point: this is not related to v1 specifically, but to the base factory. I think we should now work on moving the shared code from v1 to v2, so that in the future no v1-dependent code is added to v2 and the v1 code can easily be removed later.


codecov bot commented Jun 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.17%. Comparing base (bafa507) to head (b46a236).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7196   +/-   ##
=======================================
  Coverage   96.17%   96.17%           
=======================================
  Files         366      366           
  Lines       21947    21947           
=======================================
  Hits        21107    21107           
  Misses        627      627           
  Partials      213      213           
Flag Coverage Δ
badger_v1 9.85% <ø> (ø)
badger_v2 1.90% <ø> (ø)
cassandra-4.x-v1-manual 14.82% <ø> (ø)
cassandra-4.x-v2-auto 1.89% <ø> (ø)
cassandra-4.x-v2-manual 1.89% <ø> (ø)
cassandra-5.x-v1-manual 14.82% <ø> (ø)
cassandra-5.x-v2-auto 1.89% <ø> (ø)
cassandra-5.x-v2-manual 1.89% <ø> (ø)
elasticsearch-6.x-v1 20.65% <ø> (ø)
elasticsearch-7.x-v1 20.73% <ø> (ø)
elasticsearch-8.x-v1 20.90% <ø> (ø)
elasticsearch-8.x-v2 1.90% <ø> (ø)
grpc_v1 11.39% <ø> (ø)
grpc_v2 1.90% <ø> (ø)
kafka-3.x-v1 10.20% <ø> (ø)
kafka-3.x-v2 1.90% <ø> (ø)
memory_v2 1.90% <ø> (ø)
opensearch-1.x-v1 20.78% <ø> (ø)
opensearch-2.x-v1 20.78% <ø> (ø)
opensearch-2.x-v2 1.90% <ø> (ø)
query 1.90% <ø> (ø)
tailsampling-processor 0.51% <ø> (ø)
unittests 94.99% <ø> (ø)

Flags with carried forward coverage won't be shown.


@yurishkuro added this pull request to the merge queue Jun 5, 2025
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025
@sonalgaud12 (Author)

@yurishkuro Hi, all checks have passed. Could you please re-add this to the merge queue or merge it? Thanks!

Labels: changelog:test, storage/elasticsearch
Development

Successfully merging this pull request may close these issues.

[Bug]: ESv1 tests fail with a data race
3 participants