Fix TSan warning in sub-interpreter test #5729
```cpp
        last_istate_.reset();
        internals_tls_p_.reset();
        return;
    }
#endif
    internals_singleton_pp_ = nullptr;
```
After looking around for a few minutes, I'm thinking this will only ever be reached from unsafe_reset_internals_for_single_interpreter() in tests/test_embed/test_subinterpreter.cpp. Is that correct (and intentional)?
Would it make sense to leave a short comment explaining it?
It's also reached when there's a single interpreter, from finalize_interpreter.
@b-pass Could you please take a look here?
It's a short conversation. It ends with this question:
Did you perhaps mean to write:
```cpp
last_istate_.reset();
internals_tls_p_.reset();
if (get_num_interpreters_seen() == 1) {
    internals_singleton_pp_ = nullptr;
}
```
get_num_interpreters_seen is not supposed to go down (but it does in the tests, to keep that state from bleeding between tests). Since it never goes down normally, once it increases we basically stop using the singleton_pp and start using the two thread locals instead.
So resetting the singleton_pp pointer shouldn't be necessary once the count has increased. (It was being changed before, for the tests, but that created this data race in the tests.)
The structure of the code (#if, code, return, #endif) mirrors the function above it, and is written that way to avoid having an #else with duplicate code. But it could be changed to read a little more linearly if you want...
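To make the structural point concrete, here is a minimal, self-contained C++ sketch of the two shapes. Everything in it is hypothetical: MODEL_HAS_SUBINTERPRETER_SUPPORT stands in for PYBIND11_HAS_SUBINTERPRETER_SUPPORT, the bit flags only record which teardown steps would run, and the count > 1 condition is inferred from this thread rather than copied from the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for PYBIND11_HAS_SUBINTERPRETER_SUPPORT; delete this
// line to model a build without sub-interpreter support.
#define MODEL_HAS_SUBINTERPRETER_SUPPORT 1

// Bit flags recording which teardown steps ran (illustration only).
enum Step : unsigned {
    kResetThreadLocals = 1u << 0, // models last_istate_.reset(); internals_tls_p_.reset();
    kClearSingleton = 1u << 1,    // models internals_singleton_pp_ = nullptr;
};

// Shape used in the PR: #if, code, return, #endif.
// The single-interpreter step is written once, after the #endif.
unsigned teardown_early_return(int num_interpreters_seen) {
    unsigned steps = 0;
#ifdef MODEL_HAS_SUBINTERPRETER_SUPPORT
    if (num_interpreters_seen > 1) {
        steps |= kResetThreadLocals;
        return steps; // never touch the shared singleton here
    }
#endif
    (void) num_interpreters_seen; // silences "unused" when support is compiled out
    steps |= kClearSingleton;
    return steps;
}

// Equivalent shape with #else: the single-interpreter step appears twice.
unsigned teardown_with_else(int num_interpreters_seen) {
    unsigned steps = 0;
#ifdef MODEL_HAS_SUBINTERPRETER_SUPPORT
    if (num_interpreters_seen > 1) {
        steps |= kResetThreadLocals;
    } else {
        steps |= kClearSingleton; // duplicate #1
    }
#else
    (void) num_interpreters_seen;
    steps |= kClearSingleton;     // duplicate #2
#endif
    return steps;
}

// Usage: both shapes produce the same steps for any interpreter count.
void check_forms_agree() {
    for (int n = 1; n <= 3; ++n) {
        assert(teardown_early_return(n) == teardown_with_else(n));
    }
}
```

The early-return form states the single-interpreter step once; the #else form must repeat it in both preprocessor branches, which is the duplication the comment above refers to.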
Sorry, I'm still unclear, TBH.
Considering only this specific case:
- PYBIND11_HAS_SUBINTERPRETER_SUPPORT is true
- get_num_interpreters_seen() == 1

I believe that before this PR, this will run:
```cpp
last_istate_.reset();
internals_tls_p_.reset();
internals_singleton_pp_ = nullptr;
```
With this PR, only this:
```cpp
internals_singleton_pp_ = nullptr;
```
I.e. the two .reset() calls are skipped with this PR. Is that a correct understanding?
Could you please confirm that skipping the two .reset() calls is intentional?
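The case analysis above can be checked mechanically. The sketch below is a hypothetical model, not pybind11 code: each function returns the statements that would run for a given interpreter count, assuming sub-interpreter support is compiled in. The pre-PR variant is an assumption reconstructed from the comments in this thread (all three statements always run), and the third variant is the alternative suggested earlier in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

const std::string kResetIstate = "last_istate_.reset()";
const std::string kResetTls = "internals_tls_p_.reset()";
const std::string kClearSingleton = "internals_singleton_pp_ = nullptr";

// Assumed pre-PR shape (per this thread): all three steps always run.
Trace before_pr(int /*num_interpreters_seen*/) {
    return {kResetIstate, kResetTls, kClearSingleton};
}

// Shape after this PR: resets plus early return once more than one
// interpreter has been seen; otherwise only the singleton is cleared.
Trace with_pr(int num_interpreters_seen) {
    if (num_interpreters_seen > 1) {
        return {kResetIstate, kResetTls};
    }
    return {kClearSingleton};
}

// Reviewer's suggested alternative: always reset, and clear the singleton
// only while a single interpreter has been seen.
Trace reviewer_suggestion(int num_interpreters_seen) {
    Trace steps{kResetIstate, kResetTls};
    if (num_interpreters_seen == 1) {
        steps.push_back(kClearSingleton);
    }
    return steps;
}

bool touches_singleton(const Trace &steps) {
    return std::find(steps.begin(), steps.end(), kClearSingleton) != steps.end();
}

// Usage: neither the PR nor the suggestion writes the shared pointer once the
// count exceeds 1; they differ only in whether the resets run at count == 1.
void check_variants() {
    assert(touches_singleton(before_pr(2)));
    assert(!touches_singleton(with_pr(2)));
    assert(!touches_singleton(reviewer_suggestion(2)));
    assert(with_pr(1) == Trace{kClearSingleton});
    assert(reviewer_suggestion(1) == before_pr(1));
}
```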
Skipping the resets was intentional, as those are not used or touched until the count is greater than 1. But you're also correct: the real purpose of the PR was to avoid changing internals_singleton_pp_ when the count is above 1, so skipping the resets was not necessary.
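A minimal sketch of why leaving the shared pointer alone removes the race. It assumes C++ thread_local storage as a stand-in for pybind11's per-thread slots and a plain int* as a stand-in for internals_singleton_pp_; all names are hypothetical. Several threads tear down their own state concurrently, and because the count is above 1, each thread resets only its own slots and none writes the shared pointer. In the pre-patch shape, every worker would also execute singleton = nullptr, an unsynchronized concurrent write of the kind ThreadSanitizer reports.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical model, not pybind11 code.
namespace model {

std::atomic<int> num_interpreters_seen{1}; // only grows outside of tests
int main_internals = 0;
int *singleton = &main_internals;          // shared by every thread

// Per-thread slots standing in for last_istate_ / internals_tls_p_.
thread_local std::unique_ptr<int> last_istate;
thread_local std::unique_ptr<int> internals_tls;

// Patched teardown shape: with more than one interpreter seen, only this
// thread's own slots are touched, so concurrent calls never write `singleton`.
void destroy() {
    if (num_interpreters_seen.load() > 1) {
        last_istate.reset();
        internals_tls.reset();
        return;
    }
    singleton = nullptr;
}

// Simulates n sub-interpreters being torn down on n threads at once.
// Returns true if the shared pointer is still intact afterwards.
bool teardown_on_threads(int n) {
    num_interpreters_seen.store(n + 1); // main interpreter + n sub-interpreters
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([i] {
            last_istate = std::make_unique<int>(i);
            internals_tls = std::make_unique<int>(i);
            destroy();
            assert(!last_istate && !internals_tls); // this thread's slots were reset
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    return singleton == &main_internals;
}

} // namespace model
```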
Got it, thanks!
Description
I ran test_embed (including the sub-interpreter tests) with -fsanitize=thread. ThreadSanitizer complains about the internals singleton pointer being changed (nulled) from multiple different threads during sub-interpreter destruction.
I was hoping to find a cause for the sporadic failures of the sub-interpreter test in ubuntu-latest, 3.12, -DPYBIND11_TEST_SMART_HOLDER=ON -DPYBIND11_SIMPLE_GIL_MANAGEMENT=ON. I am not sure whether this is the issue; I was unable to reproduce the test failure locally.
The TSan output before this patch: