ThreadsafeFunction in worker_threads causes segfaults randomly #58484

Version

v22.16.0

Platform

Linux ubuntu-22.04 6.13.7-orbstack-00283-g9d1400e7e9c6 #104 SMP Mon Mar 17 06:15:48 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

Steps

  • clone https://github.com/napi-rs/napi-rs
  • checkout 05-25-test_stress_test_on_aarch64_linux_gnu_platform branch
  • Install latest Node.js and Rust
  • yarn install
  • yarn build:test
  • yarn workspace @examples/napi test tests/worker-thread.spec.ts --match '*worker_threads'

Summary

Because NAPI-RS encapsulates a lot under the hood, I'll describe the scenario in which I encountered the segfault as briefly as possible.

Here is a simple async function in Rust:

#[napi]
pub async fn buffer_pass_through(buffer: Buffer) -> Buffer {
  buffer
}

For an async function like this, NAPI-RS does the following under the hood (a minimal Node-API C sketch of the sequence follows this list):

  • call napi_create_promise to get a deferred and a promise; the promise is returned to JavaScript immediately
  • call napi_create_threadsafe_function, passing the deferred as the ThreadsafeFunction context
  • use tokio::spawn to run the async function, and call the ThreadsafeFunction once the future produces a value
  • call napi_release_threadsafe_function after the ThreadsafeFunction has been called
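
Here is that sequence sketched in plain Node-API C. This is not the actual generated NAPI-RS code: promise_ctx, call_js, on_future_done, and the wrapper function are made-up names for illustration, and error handling and cleanup are omitted.

#include <node_api.h>
#include <stdlib.h>

typedef struct {
  napi_deferred deferred;
  napi_threadsafe_function tsfn;
} promise_ctx;

/* Runs on the JS thread that owns the env once the TSFN call is dispatched. */
static void call_js(napi_env env, napi_value js_cb, void* context, void* data) {
  promise_ctx* ctx = (promise_ctx*)context;
  napi_value result;
  (void)js_cb;
  (void)data;
  napi_get_undefined(env, &result); /* stand-in for the real result value */
  napi_resolve_deferred(env, ctx->deferred, result);
}

/* Runs on a tokio worker thread once the future completes. */
static void on_future_done(promise_ctx* ctx) {
  /* step 3: hand the result over to the JS thread */
  napi_call_threadsafe_function(ctx->tsfn, NULL, napi_tsfn_nonblocking);
  /* step 4: release the TSFN; this is the call that aborts in the backtrace below */
  napi_release_threadsafe_function(ctx->tsfn, napi_tsfn_release);
}

static napi_value buffer_pass_through(napi_env env, napi_callback_info info) {
  promise_ctx* ctx = malloc(sizeof(*ctx)); /* leaked here for brevity */
  napi_value promise, name;
  (void)info;

  /* step 1: create the promise; it is returned to JavaScript immediately */
  napi_create_promise(env, &ctx->deferred, &promise);

  /* step 2: create the TSFN with the deferred (via ctx) as its context */
  napi_create_string_utf8(env, "tsfn", NAPI_AUTO_LENGTH, &name);
  napi_create_threadsafe_function(env, NULL, NULL, name, 0, 1,
                                  NULL, NULL, ctx, call_js, &ctx->tsfn);

  /* step 3 is kicked off here: the future is spawned onto the tokio
     runtime and eventually invokes on_future_done() from a non-JS thread */
  return promise;
}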

If this function is not called inside worker_threads, there's no problem: I wrote a loop that calls it hundreds of thousands of times on the main thread and didn't find any issues.

However, when this function is called in worker_threads, segfaults occasionally occur; this has been observed both in CI and in feedback from my users.

Backtrace from lldb:

* thread #26, name = 'tokio-runtime-w', stop reason = signal SIGABRT
  * frame #0: 0x0000fffff7b07608 libc.so.6`__pthread_kill_implementation(threadid=281472292351904, signo=6, no_tid=<unavailable>) at pthread_kill.c:44:76
    frame #1: 0x0000fffff7abcb3c libc.so.6`__GI_raise(sig=6) at raise.c:26:13
    frame #2: 0x0000fffff7aa7e00 libc.so.6`__GI_abort at abort.c:79:7
    frame #3: 0x0000aaaaad7dfb38 node`uv_mutex_lock(mutex=0x0000fffdf816b038) at thread.c:345:5
    frame #4: 0x0000aaaaabeab7c4 node`node::LibuvMutexTraits::mutex_lock(mutex=0x0000fffdf816b038) at node_mutex.h:183:18
    frame #5: 0x0000aaaaabead48c node`node::MutexBase<node::LibuvMutexTraits>::ScopedLock::ScopedLock(this=0x0000ffff5fffc370, mutex=0x0000fffdf816b038) at node_mutex.h:285:21
    frame #6: 0x0000aaaaac03f4d4 node`v8impl::(anonymous namespace)::ThreadSafeFunction::Release(this=0x0000fffdf816b010, mode=napi_tsfn_release) const at node_api.cc:276:45
    frame #7: 0x0000aaaaac043160 node`napi_release_threadsafe_function(func=0x0000fffdf816b010, mode=napi_tsfn_release) at node_api.cc:1411:70
    frame #8: 0x0000ffffddc45c78 example.linux-arm64-gnu.node`napi::js_values::deferred::JsDeferred$LT$Data$C$Resolver$GT$::call_tsfn::h488d0dbaa4a26cb5(self=JsDeferred<napi::js_values::unknown::Unknown, napi::tokio_runtime::execute_tokio_future::{async_block#0}::{closure_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>> @ 0x0000ffff5fffc490, result=<unavailable>) at deferred.rs:183:7
    frame #9: 0x0000ffffddc44d40 example.linux-arm64-gnu.node`napi::js_values::deferred::JsDeferred$LT$Data$C$Resolver$GT$::resolve::h3d2884560cd31212(self=JsDeferred<napi::js_values::unknown::Unknown, napi::tokio_runtime::execute_tokio_future::{async_block#0}::{closure_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>> @ 0x0000ffff5fffc520, resolver=<unavailable>) at deferred.rs:154:5
    frame #10: 0x0000ffffddb57d00 example.linux-arm64-gnu.node`napi::tokio_runtime::execute_tokio_future::_$u7b$$u7b$closure$u7d$$u7d$::ha7f1ee1bf2582723((null)=0x0000ffff5fffc9b0) at tokio_runtime.rs:233:16
    frame #11: 0x0000ffffdd9158fc example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::_$u7b$$u7b$closure$u7d$$u7d$::hdd5c00f1ec17be65(ptr=0x0000fffdf815abb0) at core.rs:331:17
    frame #12: 0x0000ffffdd8fed1c example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::h2a9ed49810ab1294 [inlined] tokio::loom::std::unsafe_cell::UnsafeCell$LT$T$GT$::with_mut::h21552376c10d6f31(self=0x0000fffdf815abb0, f={closure_env#0}<napi::tokio_runtime::execute_tokio_future::{async_block_env#0}<napi::bindgen_runtime::js_values::arraybuffer::Uint8Array, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{async_block_env#1}, napi_examples::typed_array::_napi_internal_register_array_buffer_pass_through::{closure#0}::{closure_env#2}, napi::error::Error<napi::status::Status>>, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> @ 0x0000ffff5fffc968) at unsafe_cell.rs:16:9
    frame #13: 0x0000ffffdd8fed00 example.linux-arm64-gnu.node`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::h2a9ed49810ab1294(self=0x0000fffdf815aba0, cx=<unavailable>) at core.rs:320:13

How often does it reproduce? Is there a required condition?

Repeat the steps above 3-5 times; the crash appears randomly.

What is the expected behavior? Why is that the expected behavior?

No segfault. Calling an async Node-API function from a worker thread is supposed to be supported, so it should behave exactly as it does on the main thread.

What do you see instead?

A crash: SIGABRT raised from uv_mutex_lock inside napi_release_threadsafe_function, as shown in the backtrace above.

Additional information

Maybe related: #55706. For what it's worth, frame #3 shows uv_mutex_lock calling abort(), which libuv does when locking fails; my guess is that the ThreadsafeFunction (and its mutex) had already been destroyed along with the worker's environment by the time napi_release_threadsafe_function ran on the tokio thread.
