lsds/sgx-lkl

Buffer copying in linux mount code makes assumptions about heap layout

vtikoo opened this issue · 9 comments

In copying mount options here, irrespective of the size of the source buffer, page size bytes are being copied. While this probably silently passes, the code is not accounting for the source buffer being < page size away from the end of enclave mmap region.

One instance of this overflow happening is in mounting overlayfs here. The source buffer here is opts which is 200 bytes long and resides on the stack of the startmain lthread(which is in the enclave mmap region).

The first request to enclave_mmap happens here in the init sequence. The auxiliary vector setup part of __init_libc calls malloc and triggers a enclave_mmap call. The next call is from lthread_create here, for creating the stack of the first startmain lthread. The page assignments are backwards from the end of the enclave mmap region.

This assumption breaks with the relayering work where __init_libc needs to be moved within the startmain thread. A workaround is to always explicity request a buffer page early in the init sequence, to maintain the current assumption.

Repro steps

Replace init_aux function in sgx-lkl-musl/src/env/__libc_start_main.c with the following and run /tests/basic/overlay or /tests/basic/overlay_encrypted. The following code removes malloc usage from init_aux.

char rbuf_static[16];
size_t auxv_static[24] = {0}; 
static size_t *
init_aux(size_t *auxv_base, char *pn)
{
    size_t i, aux_base[AUX_CNT] = {0};
    for (i = 0; auxv_base[i]; i += 2)
        if (auxv_base[i] < AUX_CNT)
            aux_base[auxv_base[i]] = auxv_base[i + 1];

    // By default auxv[AT_RANDOM] points to a buffer with 16 random bytes.
    uint64_t *rbuf;
    size_t *auxv;
    int static_init = 1;
    if (static_init) {
        rbuf = &rbuf_static;
        auxv = &auxv_static;
    }
    else {
        rbuf = malloc(16);
        auxv = malloc(24 * sizeof(*auxv));
        memset(auxv, 0, 24 * sizeof(*auxv));
    }
    // TODO Use intrinsics
    // if (!_rdrand64_step(&rbuf[0]))
    //    goto err;
    register uint64_t rd;
    __asm__ volatile("rdrand %0;"
                     : "=r"(rd));
    rbuf[0] = rd;
    __asm__ volatile("rdrand %0;"
                     : "=r"(rd));
    rbuf[1] = rd;

    auxv[0] = AT_CLKTCK;
    auxv[1] = 100;
    auxv[2] = AT_EXECFN;
    auxv[3] = (size_t) pn;
    auxv[4] = AT_HWCAP;
    auxv[5] = aux_base[AT_HWCAP];
    auxv[6] = AT_EGID;
    auxv[7] = 0;
    auxv[8] = AT_EUID;
    auxv[9] = 0;
    auxv[10] = AT_GID;
    auxv[11] = 0;
    auxv[12] = AT_PAGESZ;
    auxv[13] = aux_base[AT_PAGESZ];
    auxv[14] = AT_PLATFORM;
    auxv[15] = (size_t) "x86_64";
    auxv[16] = AT_SECURE;
    auxv[17] = 0;
    auxv[18] = AT_UID;
    auxv[19] = 0;
    auxv[20] = AT_RANDOM;
    auxv[21] = (size_t)rbuf;
    auxv[22] = AT_NULL;
    auxv[23] = 0;

    return auxv;
}

Error logs:

enclave mmap init, base: 7fe002b13000 end: 7fe040b03000 <----- end of heap
[Switching to Thread 0x7fffee474700 (LWP 28352)]
#1  0x00007fe0005bdc9e in lthread_create (new_lt=0x7fe040f037c8, attrp=0x0, fun=0x7fe0005ab93b <startmain>, arg=0x0)
    at sched/lthread.c:726
726         if ((!lt->attr.stack) && ((intptr_t)(lt->attr.stack = enclave_mmap(
assigned memory: 7fe040a83000 <--- last 512 kbs of enclave heap assigned for startmain stack

[ ... more logs ... ]

[[  SGX-LKL ]] lkl_mount_root_disk(): Creating writable in-memory overlay for rootfs.

Thread 7 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee474700 (LWP 28192)]
exact_copy_from_user (n=<optimized out>, from=<optimized out>, to=<optimized out>) at fs/namespace.c:2972
2972                    *t++ = c;
(gdb) p f <--- f is the pointer iterating over the source buffer. Its gone beyond the end of heap here.
$2 = **0x7fe040b03000** <error: Cannot access memory at address 0x7fe040b03000>
(gdb) bt
#0  exact_copy_from_user (n=<optimized out>, from=<optimized out>, to=<optimized out>) at fs/namespace.c:2972
#1  copy_mount_options (data=0x7fe040b02c70) at fs/namespace.c:3001
#2  0x00007fe00012b4a4 in ksys_mount (
    dev_name=0x7fe03fdf7000 "lowerdir=/mnt/vda,upperdir=/mnt/oda-upper/upper,workdir=/mnt/oda-upper/work", 
    dir_name=0x7fe040b02c70 "lowerdir=/mnt/vda,upperdir=/mnt/oda-upper/upper,workdir=/mnt/oda-upper/work", 
    type=<optimized out>, flags=0, data=<optimized out>) at fs/namespace.c:3315
#3  0x00007fe00012b539 in __do_sys_mount (data=<optimized out>, flags=<optimized out>, type=<optimized out>, 
    dir_name=<optimized out>, dev_name=<optimized out>) at fs/namespace.c:3334
#4  __se_sys_mount (dev_name=<optimized out>, dir_name=<optimized out>, type=<optimized out>, flags=<optimized out>, 
    data=<optimized out>) at fs/namespace.c:3331
#5  0x00007fe00008b85f in run_syscall (params=<optimized out>, no=<optimized out>) at arch/lkl/kernel/syscalls.c:44
#6  lkl_syscall (no=140601101457784, params=0x7fe040b02c70) at arch/lkl/kernel/syscalls.c:192
#7  0x00007fe0005d59d2 in lkl_sys_mount (dev_name=0x7fe0006e0183 "overlay", dir_name=0x7fe040b02df7 "/mnt/oda", 
    type=0x7fe0006e0183 "overlay", flags=0, data=0x7fe040b02c70)
    at /home/vitikoo/sgx-lkl/build_musl/sgx-lkl-musl/include/lkl/asm/syscall_defs.h:545
#8  0x00007fe0005d6433 in lkl_mount_overlayfs (lower_dir=0x7fe040b02e6b "/mnt/vda", 
    upper_dir=0x7fe040b02dd0 "/mnt/oda-upper/upper", work_dir=0x7fe040b02db0 "/mnt/oda-upper/work", 
    mnt_point=0x7fe040b02df7 "/mnt/oda") at lkl/setup.c:345
#9  0x00007fe0005d73e6 in lkl_mount_root_disk (root=0x7fe000b03218, disk_index=0) at lkl/setup.c:804
#10 0x00007fe0005d7636 in lkl_mount_disks (root=0x7fe000b03218, mounts=0x7fe000b03fe0, num_mounts=1, 
    cwd=0x7fe000b03a90 "/") at lkl/setup.c:866
#11 0x00007fe0005ab800 in find_and_mount_disks () at enclave/enclave_init.c:59
#12 0x00007fe0005ab95e in startmain (args=0x0) at enclave/enclave_init.c:108
#13 0x00007fe0005bcadb in _exec (lt_=0x7fe000b032d0) at sched/lthread.c:182
#14 0x0000000000000000 in ?? ()

This sounds like an important issue

prp commented

@vtikoo Good catch! Would pre-allocating a page to maintain the current memory layout be a quick workaround and unblock your work?

@prp Yup that does unblock - vtikoo@1d6f5b8. Are there any other options to fix the root cause?

@bodzhang @paulcallen I'm not sure if this is a p0. To be clear this is not breaking any testcases in oe_port. But there is a hidden assumption in the code that the stack of the first lthread will be placed > 1 page size away from the end of enclave heap. This assumption when not true breaks overlayfs mounting.
This assumption was broken by changes introduced in #403, and there is a workaround for this problem.

There is however the question of whether the code with the buffer overflow can trigger failures in other cases as well. Or if there is a better way to address the root cause of the issue.

prp commented

@prp Yup that does unblock - vtikoo@1d6f5b8. Are there any other options to fix the root cause?

Hmm, isn't the problem here how we set TASK_SIZE? We should ensure that userspace cannot spill out of a safe enclave range. TASK_SIZE is a static setting though: https://github.com/lsds/lkl/blob/9cccce7f63fb1272949707c011e8419b8e0a96af/arch/lkl/include/asm/processor.h#L52

Perhaps we need an invariant where all of our userspace allocations are at least one page away form an unsafe boundary.

Would simply removing a page from the set enclave mmap can return work?

David, am I understanding correctly that your idea is to set mmap_end not to:

mmap_end = (char*)mmap_base + (mmap_num_pages - 1) * PAGE_SIZE

but to

mmap_end = (char*)mmap_base + (mmap_num_pages - 2) * PAGE_SIZE;

is that wat you mean by removing a page from the set?

Yes, if that's the end that's the problem (otherwise, move mmap_base up by one page).

#769 opened.