Segmentation Fault immediately on require inside Worker threads on Linux
mikkopiu opened this issue · 6 comments
Describe the bug
When using Node.js Worker threads, a `Segmentation fault (core dumped)` / SIGSEGV is triggered when `aws-crt` is imported/loaded, or more specifically: when the native binary is loaded.
For me, this first appeared after upgrading a project to AWS SDK JS v3: a test case run via `ava` (using Worker threads) started segfaulting immediately when a module that invoked `new FirehoseClient({})` was imported (which in turn imports/uses `aws-crt`).
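For illustration, here's a minimal sketch of that kind of setup (file names and contents are hypothetical, not the actual project code; it assumes `@aws-sdk/client-firehose` and `ava` are installed, with `aws-crt` in the dependency tree):

```js
// firehoseService.js – hypothetical module; per the description above,
// importing it (and constructing the client) ends up loading aws-crt
import { FirehoseClient } from "@aws-sdk/client-firehose";

export const firehose = new FirehoseClient({});
```

```js
// firehoseService.test.js – ava executes test files in Worker threads by
// default, so merely importing the module above loads the native aws-crt
// binary inside a Worker and segfaults before any test runs
import test from "ava";
import { firehose } from "./firehoseService.js";

test("client is constructed", (t) => {
  t.truthy(firehose);
});
```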
Expected Behavior
Expected `aws-crt` to either throw the exception implemented in #290 or just work when using Worker threads (based on #451, but I might be misunderstanding).
Ideally, I'd be able to run tests using `aws-crt` concurrently with `ava` (using Worker threads).
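Until that works, one stopgap sketch (my assumption, not something suggested in this issue; it relies on AVA 4's `workerThreads` option) is to make `ava` run test files in child processes instead of Worker threads, so the native binary never loads inside a Worker:

```js
// ava.config.js – assumes AVA 4+, which exposes the `workerThreads` option;
// `false` falls back to child processes, sidestepping the Worker-thread crash
// at the cost of some startup performance
export default {
  workerThreads: false,
};
```

If I recall correctly, AVA also accepts a `--no-worker-threads` CLI flag for the same effect on a single run.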
Current Behavior
Immediate `Segmentation fault (core dumped)` upon `require('aws-crt')` (or equivalent).
As I'm not too familiar with debugging C(++), my debugging attempts probably contain a lot of red herrings but here are some of my attempts/findings so far:
- Using `llnode` (an `lldb` plugin), the backtrace of the minimal repro at least looks weird:

  ```
  $ llnode /usr/bin/node -c /tmp/core.123
  (llnode) v8 bt
  * thread #1: tid = 487, 0x00007f4b814c1450, name = 'node', stop reason = signal SIGSEGV
    * frame #0: 0x00007f4b814c1450
      frame #1: 0x00007f4b8d256df0 libc.so.6`__restore_rt
      frame #2: 0x00007f4b814c1450
      frame #3: 0x00007f4b8d256df0 libc.so.6`__restore_rt
      ... Repeated >5600 times
      frame #5691: 0x00007f4b8d256df0 libc.so.6`__restore_rt
      frame #5692: 0x00007f4b815b7510
      frame #5693: 0x00007f4b8d29e931 libc.so.6`__GI___nptl_deallocate_tsd + 161
      frame #5694: 0x00007f4b8d2a16d6 libc.so.6`start_thread + 422
      frame #5695: 0x00007f4b8d241450 libc.so.6`__clone3 + 48
  ```
- Trying to run the binary directly with `lldb` crashes with `SIGSEGV: address access protected`:

  ```
  $ chmod +x dist/bin/linux-x64/aws-crt-nodejs.node
  $ lldb dist/bin/linux-x64/aws-crt-nodejs.node
  (lldb) run
  Process 4291 launched: '/aws-crt-nodejs/dist/bin/linux-x64/aws-crt-nodejs.node' (x86_64)
  Process 4291 stopped
  * thread #1, name = 'aws-crt-nodejs.', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7a8a000)
      frame #0: 0x00007ffff7a8a000 aws-crt-nodejs.node
  ->  0x7ffff7a8a000: jg 0x7ffff7a8a047
  (lldb) memory read 0x7ffff7a8a000
  0x7ffff7a8a000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  .ELF............
  0x7ffff7a8a010: 03 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00  ..>.............
  ```
Reproduction Steps
I've been trying to identify the meaningful variables, but the most reliably reproducible example I have is the following (based on #286 (comment)):
- Start an EC2 instance with AMI `al2023-ami-2023.0.20230503.0-kernel-6.1-x86_64` (the latest Amazon Linux 2023 HVM AMI at the time of writing), or an equivalent Linux host; the exact flavour and kernel version don't seem to matter too much (or I might just be really unlucky)
- On the host, install Node.js: `yum install nodejs` (from the built-in repos, it's `18.12.1` at the time of writing)
- Enable core dumps: `ulimit -c unlimited`
- Create the repro files and run:

  ```
  cd $(mktemp -d)
  echo '{"name": "repro","type": "module","dependencies": {"aws-crt": "1.15.16"}}' > package.json
  npm install
  echo 'import { Worker } from "worker_threads"; const worker = new Worker("./reproWorker.js");' > index.js
  echo 'import "aws-crt";' > reproWorker.js
  node index.js
  # -> Segmentation fault (core dumped)
  ```
- In my attempts, this also reproduces with all the versions listed below, and if I build `aws-crt` from source and require `aws-crt-nodejs/dist/index.js` (or the `linux-x64` binary directly in CommonJS; see the sketch after this list)
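For reference, a minimal sketch of the CommonJS variant mentioned in the last step (file names are illustrative; it assumes the same `package.json`/`npm install` as above):

```js
// index.cjs – spawn a Worker thread that loads aws-crt
const { Worker } = require("worker_threads");

new Worker("./reproWorker.cjs");
```

```js
// reproWorker.cjs – the require alone (i.e. loading the native linux-x64
// binary) is enough to trigger the segfault described above
require("aws-crt");
```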
Possible Solution
No response
Additional Information/Context
If I'm not mistaken about `aws-crt` being supposed to work under Worker threads, I guess this is actually an upstream Node.js issue, but as mentioned, I'm not familiar enough with C(++) and Worker threads to have been able to confirm that.
Here are all the setups I've been able to reproduce this with:
Versions of `aws-crt`:

- 1.15.9
- 1.15.16
- Local version built from source at commit aafdfee
Node.js:

- 16.19.1
- 18.16.0
- 18.12.1
Operating systems:

- First saw this in a Docker container based on `amazonlinux:2`, running on an Ubuntu-based host:
  `Linux hostname 5.15.0-1033-aws #37~20.04.1-Ubuntu SMP Fri Mar 17 11:39:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
- Reproduced in a Debian Bullseye container on an Alpine Linux-based host:
  `Linux hostname 5.15.82-0-virt #1-Alpine SMP Mon, 12 Dec 2022 09:15:17 +0000 x86_64 x86_64 x86_64 GNU/Linux`
- Reproduced in an Amazon Linux 2023 container on a Fedora-based host:
  `Linux hostname 6.2.13-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 27 01:33:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
- and a matrix of all of the above (images/hosts/kernels)
- Reproduced in an Amazon Linux 2023 VM, to rule out the effects of Docker:
  `Linux hostname 6.1.25-37.47.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Apr 24 23:20:16 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
- Does NOT reproduce on macOS 13.3.1 (Ventura), on either Intel or M1 machines
  - and weirdly, the original `ava` setup works with Worker threads enabled if I just use the `darwin-x64` binary on Linux (`cp -a node_modules/aws-crt/dist/bin/darwin-x64/aws-crt-nodejs.node node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node`)
Memory:
- Tested on Docker containers with 4 & 8 GB memory limits
- Tested on VMs with 16 and 32 GB of RAM
Other:
- Not sure of all the `glibc` etc. versions for all the cases (especially as I'm unfamiliar with C(++) tooling and what exactly would be relevant), but at least for the minimal repro case below, the version is `2.34` (from the Amazon Linux 2023 repos); one way to check this from Node.js is sketched below
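For reference, a small sketch of one way to print the runtime glibc version from Node.js (my addition, not part of the original report; `getconf GNU_LIBC_VERSION` is a standard glibc utility, and Linux builds of Node.js also record the version in the diagnostic report):

```js
// glibcVersion.js – print the glibc version the Node.js process runs against
const { execSync } = require("child_process");

// Standard glibc utility; prints e.g. "glibc 2.34"
console.log(execSync("getconf GNU_LIBC_VERSION").toString().trim());

// The diagnostic report header also exposes it on Linux builds of Node.js
console.log(process.report?.getReport()?.header?.glibcVersionRuntime);
```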
aws-crt-nodejs version used
1.15.16
nodejs version used
18.12.1
Operating System and version
Amazon Linux 2023, AMI: `al2023-ami-2023.0.20230503.0-kernel-6.1-x86_64`, `uname -a`: `Linux hostname 6.1.25-37.47.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Apr 24 23:20:16 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
This is the thread local storage crash issue mentioned near the bottom of this: aws/aws-iot-device-sdk-js-v2#360
The current plan is to push through the linked s2n patches and switch from aws-lc to OpenSSL's libcrypto, which doesn't have a thread-local storage destruction problem. I don't have an ETA at the moment.
We currently have exactly the same problem, which is blocking us from upgrading from aws-sdk v2 to v3. Hopefully we get a fix soon.
Same issue reproduced when running Node 17.7 on ARM64 using the `aws-sdk/client-cognito-identity-provider` package, which indeed calls into `aws-crt` and causes a SIGSEGV (I specifically run this on Docker Alpine; error: `EXITED(139)`).
+1
https://github.com/awslabs/aws-crt-nodejs/releases/tag/v1.15.19 should fix this crash.
We will update the v2 IoT SDK for JavaScript shortly. For other dependency updates, please contact the maintainer of the package directly.
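For consumers hitting this through a transitive dependency, one possible way to pick up the fixed release before downstream packages update is an npm override; this is just a sketch, not guidance from the maintainers, and it assumes npm 8.3+ (which introduced the `overrides` field in `package.json`):

```json
{
  "overrides": {
    "aws-crt": "^1.15.19"
  }
}
```

After reinstalling, `npm ls aws-crt` should show the overridden version.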