Description
Version
v22.16.0
Platform
Linux ubuntu-22.04 6.13.7-orbstack-00283-g9d1400e7e9c6 #104 SMP Mon Mar 17 06:15:48 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
Steps
- Clone https://github.com/napi-rs/napi-rs
- Check out the `05-25-test_stress_test_on_aarch64_linux_gnu_platform` branch
- Install the latest Node.js and Rust
- `yarn install`
- `yarn build:test`
- `yarn workspace @examples/napi test tests/worker-thread.spec.ts --match '*worker_threads'`
Summary
Because NAPI-RS abstracts a lot away under the hood, I'll describe the scenario where I hit the segfault as briefly as possible.
Here is a simple async function in Rust:
#[napi]
pub async fn buffer_pass_through(buffer: Buffer) -> Buffer {
  buffer
}
Under the hood, NAPI-RS does the following (a rough sketch of this sequence follows the list):
- call `napi_create_promise` and get a `deferred` and a `promise`; the `promise` is returned to JavaScript directly
- call `napi_create_threadsafe_function` and pass the `deferred` as the ThreadsafeFunction `context`
- use `tokio::spawn` to run the async function, and call the `ThreadsafeFunction` once the async function produces a value
- call `napi_release_threadsafe_function` after the `ThreadsafeFunction` has been called
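For reference, here is a rough, hand-written sketch of that sequence against the documented N-API C functions. It is not the actual napi-rs implementation: the `extern` declarations below are simplified stand-ins for the `napi-sys` bindings, error handling and value conversion are omitted, and the names `DeferredCtx`, `SendPtr`, and `async_entry` are made up for illustration.

```rust
// Simplified stand-ins for the napi-sys bindings; opaque handles become raw pointers.
use std::ffi::{c_char, c_void};
use std::ptr;

type napi_env = *mut c_void;
type napi_value = *mut c_void;
type napi_deferred = *mut c_void;
type napi_threadsafe_function = *mut c_void;
type napi_status = i32;
type napi_finalize = Option<unsafe extern "C" fn(napi_env, *mut c_void, *mut c_void)>;
type napi_tsfn_call_js = Option<unsafe extern "C" fn(napi_env, napi_value, *mut c_void, *mut c_void)>;

const NAPI_TSFN_NONBLOCKING: i32 = 0; // napi_threadsafe_function_call_mode
const NAPI_TSFN_RELEASE: i32 = 0;     // napi_threadsafe_function_release_mode

extern "C" {
    fn napi_create_promise(env: napi_env, deferred: *mut napi_deferred, promise: *mut napi_value) -> napi_status;
    fn napi_create_string_utf8(env: napi_env, s: *const c_char, len: usize, out: *mut napi_value) -> napi_status;
    fn napi_create_threadsafe_function(
        env: napi_env, func: napi_value, async_resource: napi_value, async_resource_name: napi_value,
        max_queue_size: usize, initial_thread_count: usize, thread_finalize_data: *mut c_void,
        thread_finalize_cb: napi_finalize, context: *mut c_void, call_js_cb: napi_tsfn_call_js,
        result: *mut napi_threadsafe_function,
    ) -> napi_status;
    fn napi_call_threadsafe_function(func: napi_threadsafe_function, data: *mut c_void, is_blocking: i32) -> napi_status;
    fn napi_release_threadsafe_function(func: napi_threadsafe_function, mode: i32) -> napi_status;
    fn napi_get_undefined(env: napi_env, result: *mut napi_value) -> napi_status;
    fn napi_resolve_deferred(env: napi_env, deferred: napi_deferred, resolution: napi_value) -> napi_status;
}

// ThreadsafeFunction context: carries the deferred back to the JS/worker thread.
struct DeferredCtx {
    deferred: napi_deferred,
}

// Runs on the calling thread's event loop when the ThreadsafeFunction fires.
unsafe extern "C" fn call_js_cb(env: napi_env, _cb: napi_value, context: *mut c_void, _data: *mut c_void) {
    let ctx = Box::from_raw(context as *mut DeferredCtx); // allocated in async_entry, used once
    let mut value: napi_value = ptr::null_mut();
    napi_get_undefined(env, &mut value);
    // Resolve the promise that was handed back to JS (with `undefined` in this sketch).
    napi_resolve_deferred(env, ctx.deferred, value);
}

// Raw handles are not `Send`; a wrapper is needed to move one into tokio::spawn.
struct SendPtr(*mut c_void);
unsafe impl Send for SendPtr {}

unsafe fn async_entry(env: napi_env) -> napi_value {
    // 1. Create the promise; `promise` goes straight back to JavaScript.
    let mut deferred: napi_deferred = ptr::null_mut();
    let mut promise: napi_value = ptr::null_mut();
    napi_create_promise(env, &mut deferred, &mut promise);

    // 2. Create a ThreadsafeFunction with the deferred as its context.
    let ctx = Box::into_raw(Box::new(DeferredCtx { deferred }));
    let mut name: napi_value = ptr::null_mut();
    napi_create_string_utf8(env, "sketch".as_ptr().cast(), 6, &mut name);
    let mut tsfn: napi_threadsafe_function = ptr::null_mut();
    napi_create_threadsafe_function(
        env, ptr::null_mut(), ptr::null_mut(), name, 0, 1,
        ptr::null_mut(), None, ctx as *mut c_void, Some(call_js_cb), &mut tsfn,
    );

    // 3. Run the future on tokio (napi-rs has its own runtime; see tokio_runtime.rs
    //    in the backtrace below), then hand the result back through the tsfn.
    let tsfn = SendPtr(tsfn);
    tokio::spawn(async move {
        // ... await the user's async fn here ...
        unsafe {
            napi_call_threadsafe_function(tsfn.0, ptr::null_mut(), NAPI_TSFN_NONBLOCKING);
            // 4. Release from the tokio worker thread once the call has been queued;
            //    this is the call that aborts under worker_threads.
            napi_release_threadsafe_function(tsfn.0, NAPI_TSFN_RELEASE);
        }
    });

    promise
}
```

In the backtrace below, frames #8-#6 show exactly that final step: `call_tsfn` releasing the ThreadsafeFunction via `napi_release_threadsafe_function` from a tokio worker thread (`tokio-runtime-w`), which then aborts inside `uv_mutex_lock` (frames #3-#0).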
If this function is not called from worker_threads, there is no problem: I wrote a loop that calls it hundreds of thousands of times and found no issues.
However, when the function is called from worker_threads, segfaults occasionally occur; I have observed this both in CI and in feedback from my users.
Backtrace from lldb:
* thread #26, name = 'tokio-runtime-w', stop reason = signal SIGABRT
* frame #0: 0x0000fffff7b07608 libc.so.6`__pthread_kill_implementation(threadid=281472292351904, signo=6, no_tid=<unavailable>) at pthread_kill.c:44:76
frame #1: 0x0000fffff7abcb3c libc.so.6`__GI_raise(sig=6) at raise.c:26:13
frame #2: 0x0000fffff7aa7e00 libc.so.6`__GI_abort at abort.c:79:7
frame #3: 0x0000aaaaad7dfb38 node`uv_mutex_lock(mutex=0x0000fffdf816b038) at thread.c:345:5
frame #4: 0x0000aaaaabeab7c4 node`node::LibuvMutexTraits::mutex_lock(mutex=0x0000fffdf816b038) at node_mutex.h:183:18
frame #5: 0x0000aaaaabead48c node`node::MutexBase<node::LibuvMutexTraits>::ScopedLock::ScopedLock(this=0x0000ffff5fffc370, mutex=0x0000fffdf816b038) at node_mutex.h:285:21
frame #6: 0x0000aaaaac03f4d4 node`v8impl::(anonymous namespace)::ThreadSafeFunction::Release(this=0x0000fffdf816b010, mode=napi_tsfn_release) const at node_api.cc:276:45
frame #7: 0x0000aaaaac043160 node`napi_release_threadsafe_function(func=0x0000fffdf816b010, mode=napi_tsfn_release) at node_api.cc:1411:70
frame #8: 0x0000ffffddc45c78 example.linux-arm64-gnu.node`napi::js_values::deferred::JsDeferred$LT$Data$C$Resolver$GT$::call_tsfn::h488d0dbaa4a26cb5(self=JsDeferred<napi::js_values::unknown::Unknown, napi::tokio_runtime::execute_tokio_future::{async_block#0}::{closure_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>> @ 0x0000ffff5fffc490, result=<unavailable>) at deferred.rs:183:7
frame #9: 0x0000ffffddc44d40 example.linux-arm64-gnu.node`napi::js_values::deferred::JsDeferred$LT$Data$C$Resolver$GT$::resolve::h3d2884560cd31212(self=JsDeferred<napi::js_values::unknown::Unknown, napi::tokio_runtime::execute_tokio_future::{async_block#0}::{closure_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>> @ 0x0000ffff5fffc520, resolver=<unavailable>) at deferred.rs:154:5
frame #10: 0x0000ffffddb57d00 example.linux-arm64-gnu.node`napi::tokio_runtime::execute_tokio_future::_$u7b$$u7b$closure$u7d$$u7d$::ha7f1ee1bf2582723((null)=0x0000ffff5fffc9b0) at tokio_runtime.rs:233:16
frame #11: 0x0000ffffdd9158fc example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::_$u7b$$u7b$closure$u7d$$u7d$::hdd5c00f1ec17be65(ptr=0x0000fffdf815abb0) at core.rs:331:17
frame #12: 0x0000ffffdd8fed1c example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::h2a9ed49810ab1294 [inlined] tokio::loom::std::unsafe_cell::UnsafeCell$LT$T$GT$::with_mut::h21552376c10d6f31(self=0x0000fffdf815abb0, f={closure_env#0}<napi::tokio_runtime::execute_tokio_future::{async_block_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> @ 0x0000ffff5fffc968) at unsafe_cell.rs:16:9
frame #13: 0x0000ffffdd8fed00 example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::h2a9ed49810ab1294(self=0x0000fffdf815aba0, cx=<unavailable>) at core.rs:320:13
How often does it reproduce? Is there a required condition?
Repeating the test 3-5 times is usually enough; the segfault appears randomly.
What is the expected behavior? Why is that the expected behavior?
No segfault
What do you see instead?
Segfault
Additional information
Maybe related: #55706