Fix TSan warning in sub-interpreter test #5729

b-pass · 2025-06-16T02:51:51Z

Description

I ran test_embed (including the sub-interpreter tests) with -fsanitize=thread. ThreadSanitizer complains about the internals singleton pointer being changed (null'd) from multiple different threads during sub-interpreter destruction.

I was hoping to find a cause for sporadic failures of the sub-interpreter test in ubuntu-latest, 3.12, -DPYBIND11_TEST_SMART_HOLDER=ON -DPYBIND11_SIMPLE_GIL_MANAGEMENT=ON.

I am not sure this is the cause of those failures; I was unable to reproduce the test failure locally.

The TSan output before this patch:

==================
WARNING: ThreadSanitizer: data race (pid=331420)
  Write of size 8 at 0x7fc9bb875e00 by thread T14:
    #0 pybind11::detail::internals_pp_manager<pybind11::detail::internals>::unref() /home/user/pybind11/include/pybind11/detail/internals.h:519 (external_module.cpython-312-x86_64-linux-gnu.so+0x563c0) (BuildId: 696a38b51e55c8831621b2d0f13d7773a1506a70)
    #1 PyInit_external_module /home/user/pybind11/tests/test_embed/external_module.cpp:9 (external_module.cpython-312-x86_64-linux-gnu.so+0x25592) (BuildId: 696a38b51e55c8831621b2d0f13d7773a1506a70)
    #2 _PyImport_LoadDynamicModuleWithSpec ../Python/importdl.c:169 (libpython3.12.so.1.0+0x2d9a9e) (BuildId: 5c546cb03f97d86afd10e4288ac3b79cdeba1951)
    #3 operator() /home/user/pybind11/tests/test_embed/test_subinterpreter.cpp:358 (test_embed+0x1d46be) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #4 __invoke_impl<void, C_A_T_C_H_T_E_S_T_6()::<lambda(int)>, int> /usr/include/c++/13/bits/invoke.h:61 (test_embed+0x1d7bab) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #5 __invoke<C_A_T_C_H_T_E_S_T_6()::<lambda(int)>, int> /usr/include/c++/13/bits/invoke.h:96 (test_embed+0x1d79b2) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #6 _M_invoke<0, 1> /usr/include/c++/13/bits/std_thread.h:292 (test_embed+0x1d780a) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #7 operator() /usr/include/c++/13/bits/std_thread.h:299 (test_embed+0x1d7702) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #8 _M_run /usr/include/c++/13/bits/std_thread.h:244 (test_embed+0x1d7614) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #9 <null> <null> (libstdc++.so.6+0xecdb3) (BuildId: ca77dae775ec87540acd7218fa990c40d1c94ab1)

  Previous write of size 8 at 0x7fc9bb875e00 by thread T13:
    #0 pybind11::detail::internals_pp_manager<pybind11::detail::internals>::unref() /home/user/pybind11/include/pybind11/detail/internals.h:519 (external_module.cpython-312-x86_64-linux-gnu.so+0x563c0) (BuildId: 696a38b51e55c8831621b2d0f13d7773a1506a70)
    #1 PyInit_external_module /home/user/pybind11/tests/test_embed/external_module.cpp:9 (external_module.cpython-312-x86_64-linux-gnu.so+0x25592) (BuildId: 696a38b51e55c8831621b2d0f13d7773a1506a70)
    #2 _PyImport_LoadDynamicModuleWithSpec ../Python/importdl.c:169 (libpython3.12.so.1.0+0x2d9a9e) (BuildId: 5c546cb03f97d86afd10e4288ac3b79cdeba1951)
    #3 operator() /home/user/pybind11/tests/test_embed/test_subinterpreter.cpp:358 (test_embed+0x1d46be) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #4 __invoke_impl<void, C_A_T_C_H_T_E_S_T_6()::<lambda(int)>, int> /usr/include/c++/13/bits/invoke.h:61 (test_embed+0x1d7bab) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #5 __invoke<C_A_T_C_H_T_E_S_T_6()::<lambda(int)>, int> /usr/include/c++/13/bits/invoke.h:96 (test_embed+0x1d79b2) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #6 _M_invoke<0, 1> /usr/include/c++/13/bits/std_thread.h:292 (test_embed+0x1d780a) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #7 operator() /usr/include/c++/13/bits/std_thread.h:299 (test_embed+0x1d7702) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #8 _M_run /usr/include/c++/13/bits/std_thread.h:244 (test_embed+0x1d7614) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #9 <null> <null> (libstdc++.so.6+0xecdb3) (BuildId: ca77dae775ec87540acd7218fa990c40d1c94ab1)

  Location is global 'pybind11::detail::get_internals_pp_manager()::internals_pp_manager' of size 40 at 0x7fc9bb875de0 (external_module.cpython-312-x86_64-linux-gnu.so+0xade00)

  Thread T14 (tid=331436, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1022 (libtsan.so.2+0x5ac1a) (BuildId: 38097064631f7912bd33117a9c83d08b42e15571)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xeceb0) (BuildId: ca77dae775ec87540acd7218fa990c40d1c94ab1)
    #2 C_A_T_C_H_T_E_S_T_6 /home/user/pybind11/tests/test_embed/test_subinterpreter.cpp:382 (test_embed+0x1d50db) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #3 Catch::TestInvokerAsFunction::invoke() const /home/user/pybind11/build12/tests/catch/catch.hpp:14330 (test_embed+0x3cdde) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #4 Catch::TestCase::invoke() const /home/user/pybind11/build12/tests/catch/catch.hpp:14169 (test_embed+0x3bb96) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #5 Catch::RunContext::invokeActiveTestCase() /home/user/pybind11/build12/tests/catch/catch.hpp:13025 (test_embed+0x3423b) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #6 Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /home/user/pybind11/build12/tests/catch/catch.hpp:12998 (test_embed+0x33e68) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #7 Catch::RunContext::runTest(Catch::TestCase const&) /home/user/pybind11/build12/tests/catch/catch.hpp:12759 (test_embed+0x31ee5) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #8 execute /home/user/pybind11/build12/tests/catch/catch.hpp:13352 (test_embed+0x363da) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #9 Catch::Session::runInternal() /home/user/pybind11/build12/tests/catch/catch.hpp:13562 (test_embed+0x37d59) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #10 Catch::Session::run() /home/user/pybind11/build12/tests/catch/catch.hpp:13518 (test_embed+0x378ee) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #11 int Catch::Session::run<char>(int, char const* const*) <null> (test_embed+0x9ecd1) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #12 main /home/user/pybind11/tests/test_embed/catch.cpp:40 (test_embed+0x55b35) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)

  Thread T13 (tid=331435, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1022 (libtsan.so.2+0x5ac1a) (BuildId: 38097064631f7912bd33117a9c83d08b42e15571)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xeceb0) (BuildId: ca77dae775ec87540acd7218fa990c40d1c94ab1)
    #2 C_A_T_C_H_T_E_S_T_6 /home/user/pybind11/tests/test_embed/test_subinterpreter.cpp:381 (test_embed+0x1d50a2) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #3 Catch::TestInvokerAsFunction::invoke() const /home/user/pybind11/build12/tests/catch/catch.hpp:14330 (test_embed+0x3cdde) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #4 Catch::TestCase::invoke() const /home/user/pybind11/build12/tests/catch/catch.hpp:14169 (test_embed+0x3bb96) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #5 Catch::RunContext::invokeActiveTestCase() /home/user/pybind11/build12/tests/catch/catch.hpp:13025 (test_embed+0x3423b) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #6 Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /home/user/pybind11/build12/tests/catch/catch.hpp:12998 (test_embed+0x33e68) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #7 Catch::RunContext::runTest(Catch::TestCase const&) /home/user/pybind11/build12/tests/catch/catch.hpp:12759 (test_embed+0x31ee5) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #8 execute /home/user/pybind11/build12/tests/catch/catch.hpp:13352 (test_embed+0x363da) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #9 Catch::Session::runInternal() /home/user/pybind11/build12/tests/catch/catch.hpp:13562 (test_embed+0x37d59) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #10 Catch::Session::run() /home/user/pybind11/build12/tests/catch/catch.hpp:13518 (test_embed+0x378ee) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #11 int Catch::Session::run<char>(int, char const* const*) <null> (test_embed+0x9ecd1) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)
    #12 main /home/user/pybind11/tests/test_embed/catch.cpp:40 (test_embed+0x55b35) (BuildId: 000fdc14d7f2b4671dae95570541b4d13f7c5059)

SUMMARY: ThreadSanitizer: data race /home/user/pybind11/include/pybind11/detail/internals.h:519 in pybind11::detail::internals_pp_manager<pybind11::detail::internals>::unref()
==================

rwgk · 2025-06-16T04:21:44Z

include/pybind11/detail/internals.h

+            last_istate_.reset();
+            internals_tls_p_.reset();
+            return;
+        }
 #endif
        internals_singleton_pp_ = nullptr;


After looking around for a few minutes I'm thinking: This will only ever be reached from unsafe_reset_internals_for_single_interpreter() in tests/test_embed/test_subinterpreter.cpp. Is that correct (and intentional)?

Would it make sense to leave a small comment to explain?

It's also reached when there's a single interpreter, from finalize_interpreter.

@b-pass Could you please take a look here?

https://chatgpt.com/share/6850e8ca-bfc8-8008-87f4-252326333317

It's a short conversation. It ends with this question:

Did you perhaps mean to write:

    last_istate_.reset();
    internals_tls_p_.reset();
    if (get_num_interpreters_seen() == 1) {
        internals_singleton_pp_ = nullptr;
    }

get_num_interpreters_seen() is not supposed to go down (it does in the tests, to keep that state from bleeding between tests). Because it never goes down in normal use, once it rises above 1 we stop using the singleton_pp and use the two thread locals instead.

So resetting the singleton_pp pointer shouldn't be necessary once the count has increased. It was being reset before, for the sake of the tests, and that is what created this data race in the tests.

The structure of the code (#if, code, return, #endif) mirrors the function above it, and is that way to avoid having an #else with duplicate code. It could be changed to read a little more linearly if you want.
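To make the reasoning above concrete, here is a minimal sketch of the guard, not the actual pybind11 code. `PpManagerModel`, `tls_pp`, `singleton_writes`, and `count_shared_writes` are hypothetical names that only mirror the roles discussed in this thread: a shared singleton pointer, per-thread state, and a count of interpreters seen. Once the count is above 1, `unref()` touches only thread-local state, so concurrent calls no longer write the shared pointer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical model; these names only mirror the roles discussed above.
struct Internals {};

struct PpManagerModel {
    std::atomic<int> interpreters_seen{1}; // stands in for get_num_interpreters_seen()
    Internals **singleton_pp = nullptr;    // shared by all threads
    int singleton_writes = 0;              // instrumentation only, not part of the real code

    // Stands in for the two thread-local members.
    static Internals **&tls_pp() {
        thread_local Internals **p = nullptr;
        return p;
    }

    void unref() {
        if (interpreters_seen.load() > 1) {
            tls_pp() = nullptr; // per-thread state only; nothing shared is written
            return;
        }
        singleton_pp = nullptr; // single-interpreter path
        ++singleton_writes;
    }
};

// Calls unref() once while only one interpreter has been seen, then from
// several threads after the count has risen. Returns how many times the
// shared pointer was written.
int count_shared_writes(int num_threads) {
    PpManagerModel m;
    m.unref();                    // count == 1: writes the shared pointer
    m.interpreters_seen.store(2); // from here on, only thread-local state is touched
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&m] { m.unref(); });
    }
    for (auto &t : workers) {
        t.join();
    }
    return m.singleton_writes;
}
```

In this model the worker threads only read the atomic count and write their own thread-local slot, which is the property the patch is after.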

Sorry I'm still unclear TBH.

Considering only this specific case:

PYBIND11_HAS_SUBINTERPRETER_SUPPORT is true

get_num_interpreters_seen() == 1

I believe before this PR this will run:

    last_istate_.reset();
    internals_tls_p_.reset();
    internals_singleton_pp_ = nullptr;

With this PR, only this:

internals_singleton_pp_ = nullptr;

I.e. the two .reset() are skipped with this PR. Is that a correct understanding?

Could you please confirm that skipping the two .reset() is intentional?

Skipping the resets was intentional, as those are not used or touched until the count is greater than 1. But you're also correct: the real purpose of the PR was to avoid changing internals_singleton_pp_ when the count is above 1, so skipping the resets was not necessary.

Got it, thanks!
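The three variants discussed in this thread can be compared side by side. The sketch below is illustrative only: each function returns the list of statements that would run for a given interpreter count, with PYBIND11_HAS_SUBINTERPRETER_SUPPORT assumed to be enabled. The function names are hypothetical, and the "before" behaviour for counts above 1 is an assumption inferred from the comments above, not taken from the old source.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Steps = std::vector<std::string>;

// Assumed pre-PR behaviour: all three statements run regardless of the count.
Steps unref_before(int /*num_seen*/) {
    return {"last_istate_.reset()",
            "internals_tls_p_.reset()",
            "internals_singleton_pp_ = nullptr"};
}

// Behaviour with this PR, as described in the review discussion.
Steps unref_with_pr(int num_seen) {
    if (num_seen > 1) {
        return {"last_istate_.reset()", "internals_tls_p_.reset()"};
    }
    return {"internals_singleton_pp_ = nullptr"};
}

// Alternative raised during review: always reset the thread locals and
// only clear the singleton pointer when a single interpreter was seen.
Steps unref_alternative(int num_seen) {
    Steps steps{"last_istate_.reset()", "internals_tls_p_.reset()"};
    if (num_seen == 1) {
        steps.push_back("internals_singleton_pp_ = nullptr");
    }
    return steps;
}
```

For a count above 1, the PR and the alternative run the same statements; they differ only at a count of 1, where the PR skips the two resets.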

Fix TSan warning in sub-interpreter test

3b9ffd9

b-pass mentioned this pull request Jun 16, 2025

feat: support for sub-interpreters #5564

Merged

rwgk reviewed Jun 16, 2025

henryiii approved these changes Jun 17, 2025

rwgk approved these changes Jun 18, 2025

rwgk merged commit f2c0ab8 into pybind:master Jun 18, 2025
82 checks passed

github-actions bot added the "needs changelog" label Jun 18, 2025

henryiii removed the "needs changelog" label Jun 18, 2025


b-pass commented Jun 16, 2025

rwgk Jun 16, 2025

b-pass Jun 16, 2025

rwgk Jun 17, 2025

b-pass Jun 17, 2025

rwgk Jun 17, 2025

b-pass Jun 18, 2025 (edited)

rwgk Jun 18, 2025
