lsds/sgx-lkl

Segfault when using pthread_cancel

SeanTAllen opened this issue · 4 comments

When running the following test:

/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define RUNS 10000

void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}

int main(void)
{
    int i;
    pthread_t thread1;
    int ret;

    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);

        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }

        sleep(1);
        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);

    }

    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_cancel) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }

    return 0;
}

Both @vtikoo and I get a segfault fairly early on.

Backtrace follows:

Thread 6 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe03dadfb10 (LWP 17645)]
0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
895     {
(gdb) bt
#0  0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
#1  0x00007fe00009a2eb in __send_signal (force=<optimized out>, type=<optimized out>, t=<optimized out>, info=<optimized out>,
    sig=<optimized out>) at kernel/signal.c:1076
#2  send_signal (sig=33, info=0x7fe03dabf0d0, t=0x0, type=PIDTYPE_PID) at kernel/signal.c:1236
#3  0x00007fe00009b6cd in do_send_sig_info (sig=1072106240, info=0x21, p=0x7fe03dabf0d0, type=PIDTYPE_PID) at kernel/signal.c:1285
#4  0x00007fe00009b763 in do_send_specific (tgid=33, pid=<optimized out>, sig=1072106240, info=0x0) at kernel/signal.c:3772
#5  0x00007fe00009b816 in do_tkill (tgid=33, pid=0, sig=1034678480) at kernel/signal.c:3798
#6  0x00007fe00009c504 in __do_sys_tkill (sig=<optimized out>, pid=<optimized out>) at kernel/signal.c:3833
#7  __se_sys_tkill (pid=<optimized out>, sig=<optimized out>) at kernel/signal.c:3827
#8  0x00007fe03dabf180 in ?? ()
#9  0x00007fe00008b6cf in run_syscall (params=<optimized out>, no=<optimized out>) at arch/lkl/kernel/syscalls.c:44
#10 lkl_syscall (no=0, params=0x7fe03dabf0d0) at arch/lkl/kernel/syscalls.c:192

Given the backtrace, this might be connected to our various signal problems. That hasn't been confirmed yet, so I'm opening this issue to keep track of the problem.


A slight variation: with the following version, the application eventually hangs (for me, always after creating 255 threads):

/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define RUNS 10000

void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}

int main(void)
{
    int i;
    pthread_t thread1;
    int ret;

    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);

        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }

        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);

    }

    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_cancel) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }

    return 0;
}

Note that the only difference from the first version is the missing sleep(1) call in main.
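
Neither version joins or detaches the threads it cancels. Under POSIX, a joinable thread keeps its resources until it is joined, so this may or may not contribute to the hang; that is unconfirmed. A minimal sketch of a variant that joins each cancelled thread and checks for PTHREAD_CANCELED could help separate the two effects. The helper name cancel_and_join_runs is hypothetical and not part of the original test, and the worker drops the printf so that sleep(1) is its only cancellation point:

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Worker that blocks in sleep(), which is a POSIX cancellation point. */
static void *sleeping_worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL; /* not reached */
}

/*
 * Hypothetical helper (not in the original test): creates, cancels and
 * joins one thread per run. Returns the number of runs in which the
 * join reported PTHREAD_CANCELED, stopping early on the first error.
 */
int cancel_and_join_runs(int runs)
{
    int ok = 0;

    for (int i = 0; i < runs; i++) {
        pthread_t thread;
        void *result = NULL;

        if (pthread_create(&thread, NULL, sleeping_worker, NULL) != 0) {
            printf("Failed to create thread (run=%d)\n", i);
            break;
        }
        if (pthread_cancel(thread) != 0) {
            printf("Failed to cancel thread (run=%d)\n", i);
            break;
        }
        /* Joining releases the thread's resources before the next run. */
        if (pthread_join(thread, &result) != 0) {
            printf("Failed to join thread (run=%d)\n", i);
            break;
        }
        if (result == PTHREAD_CANCELED)
            ok++;
    }
    return ok;
}
```

Calling cancel_and_join_runs(RUNS) from main and comparing the return value with RUNS gives the pass/fail check. If this variant also fails under SGX-LKL, thread resource exhaustion is less likely to be the only cause.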

prp commented

This will be broken because we are not delivering signals to the correct thread (see #644).

@vtikoo, I think the syscall_cp assembly was not yet ported over to LKL, which may be related here?

@davidchisnall I tried adding a breakpoint at the entry of __syscall_cp.s; it doesn't look like syscall_cp is called during this test.
The tkill syscall in the backtrace most probably comes from cancel_handler - https://github.com/lsds/sgx-lkl-musl/blob/oe_port/src/thread/pthread_cancel.c#L67

@KenGordon is working on fixing signal delivery to the correct thread.