Segfault when using pthread_cancel
SeanTAllen opened this issue · 4 comments
When running the following test:
/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#define RUNS 10000
void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}
int main(void)
{
    int i;
    pthread_t thread1;
    int ret;
    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);
        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }
        sleep(1);
        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);
    }
    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_join) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }
    return 0;
}
Both @vtikoo and I get a segfault fairly early on.
Backtrace follows:
Thread 6 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe03dadfb10 (LWP 17645)]
0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
895 {
(gdb) bt
#0 0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
#1 0x00007fe00009a2eb in __send_signal (force=<optimized out>, type=<optimized out>, t=<optimized out>, info=<optimized out>,
sig=<optimized out>) at kernel/signal.c:1076
#2 send_signal (sig=33, info=0x7fe03dabf0d0, t=0x0, type=PIDTYPE_PID) at kernel/signal.c:1236
#3 0x00007fe00009b6cd in do_send_sig_info (sig=1072106240, info=0x21, p=0x7fe03dabf0d0, type=PIDTYPE_PID) at kernel/signal.c:1285
#4 0x00007fe00009b763 in do_send_specific (tgid=33, pid=<optimized out>, sig=1072106240, info=0x0) at kernel/signal.c:3772
#5 0x00007fe00009b816 in do_tkill (tgid=33, pid=0, sig=1034678480) at kernel/signal.c:3798
#6 0x00007fe00009c504 in __do_sys_tkill (sig=<optimized out>, pid=<optimized out>) at kernel/signal.c:3833
#7 __se_sys_tkill (pid=<optimized out>, sig=<optimized out>) at kernel/signal.c:3827
#8 0x00007fe03dabf180 in ?? ()
#9 0x00007fe00008b6cf in run_syscall (params=<optimized out>, no=<optimized out>) at arch/lkl/kernel/syscalls.c:44
#10 lkl_syscall (no=0, params=0x7fe03dabf0d0) at arch/lkl/kernel/syscalls.c:192
Given the backtrace, this might be connected to our various signal problems. However, that isn't known yet, so I wanted to open this issue to keep track of the problem.
A slight variation: with the following version, the application eventually hangs (for me, always after the creation of 255 threads):
/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#define RUNS 10000
void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}
int main(void)
{
    int i;
    pthread_t thread1;
    int ret;
    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);
        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }
        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);
    }
    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_join) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }
    return 0;
}
Note that the only difference from the first version is the lack of the sleep(1) call in main.
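Neither version above joins the cancelled thread, and threads created with default attributes are joinable, so their resources are not released until pthread_join is called. Whether that contributes to the hang is only an assumption, not something confirmed in this issue. A minimal sketch that reaps each thread after cancelling it, which may help separate resource exhaustion from the signal problem (the helper name cancel_and_join is hypothetical):

```c
#include <pthread.h>
#include <unistd.h>

/* Worker loops forever; sleep() is a cancellation point, so a pending
 * cancellation request is acted on there. */
static void *thread_worker(void *arg)
{
    (void)arg;
    while (1) {
        sleep(1);
    }
    return NULL; /* not reached */
}

/* Hypothetical helper: create, cancel, and join `runs` threads one after
 * another. Returns how many of them reported PTHREAD_CANCELED as their
 * exit status. Stops early if any pthread call fails. */
int cancel_and_join(int runs)
{
    int cancelled = 0;
    for (int i = 0; i < runs; i++) {
        pthread_t t;
        void *status = NULL;

        if (pthread_create(&t, NULL, thread_worker, NULL) != 0)
            break;
        if (pthread_cancel(t) != 0)
            break;
        /* Joining waits for the cancellation to complete and releases
         * the thread's resources. */
        if (pthread_join(t, &status) != 0)
            break;
        if (status == PTHREAD_CANCELED)
            cancelled++;
    }
    return cancelled;
}
```

Calling cancel_and_join(RUNS) from main and comparing the result against RUNS gives a pass/fail check similar to the original test. If this variant still crashes or hangs, thread exhaustion is unlikely to be the cause.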
This will be broken because we are not delivering signals to the correct thread (see #644).
@vtikoo, I think the syscall_cp assembly was not yet ported over to LKL, which may be related here?
@davidchisnall I tried adding a breakpoint at the entry of __syscall_cp.s, and it doesn't look like syscall_cp is called during this test.
The tkill syscall mentioned in the stacktrace is most probably from cancel_handler: https://github.com/lsds/sgx-lkl-musl/blob/oe_port/src/thread/pthread_cancel.c#L67
@KenGordon is working on fixing signal delivery to the correct thread.