Description
Version
v18.13.0
Platform
Linux devloop-Surface-Laptop-4 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
This was originally encountered when trying to capture a trace of the TypeScript compiler running; however, I can also reliably reproduce the same crash with a much simpler invocation that doesn't require setting up/installing a complex project.
$ node --trace-events-enabled /usr/lib/node_modules/npm/bin/npm-cli.js set init-license MIT
Aborted (core dumped)
How often does it reproduce? Is there a required condition?
See "additional information" below for a possible condition. In my experience, the commands that cause a crash - like the one above - reproduce it 100% of the time. That said, not all invocations with tracing enabled reproduce the issue.
What is the expected behavior?
The node process does not crash and the trace file is written to disk in its entirety
What do you see instead?
Aborted (core dumped)
On top of that, the trace file contains incomplete JSON
Additional information
Running under the debugger, I see a crash (abort) with this stack:
Thread 2 (Thread 0x7ffff7a42640 (LWP 2148334) "node"):
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348118080) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737348118080) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737348118080, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff7a88476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff7a6e7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x0000000000a9d38a in uv_mutex_lock (mutex=mutex@entry=0x5278de0 <mutex>) at ../deps/uv/src/unix/thread.c:352
#6 0x000000000164dfe8 in post (kind=UV__WORK_FAST_IO, q=0x5328190) at ../deps/uv/src/threadpool.c:143
#7 uv__work_submit (loop=loop@entry=0x5415bd8, w=w@entry=0x5328178, kind=kind@entry=UV__WORK_FAST_IO, work=work@entry=0x1656270 <uv__fs_work>, done=done@entry=0x1655b60 <uv__fs_done>) at ../deps/uv/src/threadpool.c:270
#8 0x000000000165953d in uv_fs_write (loop=0x5415bd8, req=0x5328028, file=<optimized out>, bufs=0x7ffff7a3e720, nbufs=<optimized out>, off=-1, cb=0xc685f0 <node::tracing::NodeTraceWriter::StartWrite(uv_buf_t)::{lambda(uv_fs_s*)#1}::_FUN(uv_fs_s*)>) at ../deps/uv/src/unix/fs.c:2191
#9 0x0000000000c687ce in node::tracing::NodeTraceWriter::WriteToFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, int) [clone .part.70] ()
#10 0x0000000000c68e2a in node::tracing::NodeTraceWriter::FlushPrivate() ()
#11 0x0000000001652906 in uv__async_io (loop=0x5415bd8, w=<optimized out>, events=<optimized out>) at ../deps/uv/src/unix/async.c:163
#12 0x0000000001664e44 in uv__io_poll (loop=loop@entry=0x5415bd8, timeout=<optimized out>) at ../deps/uv/src/unix/epoll.c:374
#13 0x000000000165326e in uv_run (loop=0x5415bd8, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:406
#14 0x00007ffff7adab43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007ffff7b6ca00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
The code is calling abort because acquiring a mutex fails:
void uv_mutex_lock(uv_mutex_t* mutex) {
  if (pthread_mutex_lock(mutex))
    abort();
}
If I look at other threads, the main thread is executing here:
Thread 1 (Thread 0x7ffff7a437c0 (LWP 2148331) "node"):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5327fe8) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x5327fe8) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5327fe8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007ffff7ad9ac1 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5327f98, cond=0x5327fc0) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x5327fc0, mutex=0x5327f98) at ./nptl/pthread_cond_wait.c:627
#5 0x0000000001661879 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:800
#6 0x0000000000c67993 in node::tracing::NodeTraceWriter::Flush(bool) ()
#7 0x0000000000c6422d in node::tracing::Agent::Flush(bool) ()
#8 0x0000000000c66c2b in node::tracing::NodeTraceBuffer::Flush() ()
#9 0x0000000001ad3222 in v8::platform::tracing::TracingController::StopTracing() ()
#10 0x0000000000c6504a in node::tracing::Agent::Disconnect(int) ()
#11 0x0000000000ab39f3 in node::DefaultProcessExitHandler(node::Environment*, int) ()
#12 0x0000000000af5083 in node::Environment::Exit(int) ()
#13 0x0000000000db0230 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#14 0x0000000000db176f in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#15 0x00000000016ef579 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
#16 0x00000000016734d0 in Builtins_InterpreterEntryTrampoline ()
#17 0x00002ec5934c15b9 in ?? ()
#18 0x000006fbf30c60e9 in ?? ()
#19 0x0000000600000000 in ?? ()
#20 0x00002ec5934c1689 in ?? ()
#21 0x000006fbf30c27d9 in ?? ()
#22 0x0000000000000000 in ?? ()
My non-expert guess is that the JS code calls process.exit, and the tear-down of state leads to a race condition where the mutex is used after it has been destroyed. That said, trying to repro this with a simple example that just calls process.exit(0) does not produce the same crash, so I'm guessing there's at least some amount of unpredictability in the repro (or my assessment could be wrong).
I suppose it's also worth mentioning that I haven't observed this failure on Windows, but that's perhaps because EnterCriticalSection has a void return type, and because calling EnterCriticalSection after DeleteCriticalSection doesn't necessarily cause problems, since much of the necessary state "lives" inside the CRITICAL_SECTION object itself.