mit-pdos/ward

Try to get Git to run on Ward

celskeggs opened this issue · 6 comments

The goal is to have Git available as a realistic macrobenchmark.

  • Do we have enough binary compatibility to run a statically-linked Git executable directly?
  • If not, can we get it by rebuilding Git with a simpler libc?
  • How much of the libc does Git actually need? Will it compile with what we have?
  • If it won't, what would it take to port a small libc like uClibc?

Unsurprisingly, statically-linked Git compiled with glibc on the host crashes while trying to access the auxiliary vectors during process startup, in this code from csu/libc-start.c:154...168:

# ifdef HAVE_AUX_VECTOR
  /* First process the auxiliary vector since we need to find the
     program header to locate an eventually present PT_TLS entry.  */
#  ifndef LIBC_START_MAIN_AUXVEC_ARG
  ElfW(auxv_t) *auxvec;
  {
    char **evp = ev;
    while (*evp++ != NULL)
      ;
    auxvec = (ElfW(auxv_t) *) evp;
  }
#  endif
  _dl_aux_init (auxvec);
  if (GL(dl_phdr) == NULL)
# endif

My next step will be to see if we can stub out the auxiliary vectors and make this not crash.
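For reference, here is a minimal sketch of what a "null" auxiliary vector looks like from glibc's side. It mirrors the loop quoted above: walk past the environment pointers and their NULL terminator, and whatever follows is treated as the auxiliary vector. The names `auxv_entry`, `AUXV_AT_NULL`, `stack_tail`, `find_auxv`, and `auxv_count` are made up for illustration; the entry layout assumes x86-64, where each entry is a 64-bit type followed by a 64-bit value and type 0 (AT_NULL) ends the vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as an x86-64 auxiliary vector entry: a type tag and a value. */
typedef struct {
    uint64_t a_type;
    uint64_t a_val;
} auxv_entry;

#define AUXV_AT_NULL 0 /* type 0 (AT_NULL) terminates the vector */

/* Hypothetical image of the top of a fresh process stack: one environment
   string, the envp terminator, then a vector holding only the terminator. */
struct stack_tail {
    char *envp[2];
    auxv_entry auxv[1];
};

/* Mirror of the glibc loop: skip the environment pointers and their NULL
   terminator; the auxiliary vector is assumed to start right after. */
static const auxv_entry *find_auxv(char **envp)
{
    assert(envp != NULL);
    char **evp = envp;
    while (*evp++ != NULL)
        ;
    return (const auxv_entry *)evp;
}

/* Count the entries that come before the AT_NULL terminator. */
static size_t auxv_count(const auxv_entry *auxv)
{
    size_t n = 0;
    while (auxv[n].a_type != AUXV_AT_NULL)
        n++;
    return n;
}
```

If the kernel leaves nothing after the envp terminator, this walk reads whatever happens to be there, which would be consistent with the startup crash. With a single AT_NULL entry in place, `auxv_count` returns 0 and the walk stops cleanly.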

Added null auxiliary vectors (1e1968a), and got this:

$ git
37 git: unknown sys call 107
37 git: unknown sys call 102
37 git: unknown sys call 108
37 git: unknown sys call 104
37 git: unknown sys call 12
pagefault from user for 0x4000000001c0 err 6
pid 37 git: trap 14 err 6 on cpu 0 rip 0x78269c rsp 0x7ffffffffe38 addr 0xffffffffc24046a8--kill proc

The pagefault is in __brk:

0000000000782660 <__brk>:
  782660:       b9 0c 00 00 00          mov    $0xc,%ecx
  782665:       89 c8                   mov    %ecx,%eax
  782667:       0f 05                   syscall 
  782669:       48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
  78266f:       48 89 c2                mov    %rax,%rdx
  782672:       77 14                   ja     782688 <__brk+0x28>
  782674:       48 89 05 d5 67 36 00    mov    %rax,0x3667d5(%rip)        # ae8e50 <__curbrk>
  78267b:       31 c0                   xor    %eax,%eax
  78267d:       48 39 d7                cmp    %rdx,%rdi
  782680:       77 26                   ja     7826a8 <__brk+0x48>
  782682:       f3 c3                   repz retq 
  782684:       0f 1f 40 00             nopl   0x0(%rax)
  782688:       48 c7 c0 b0 ff ff ff    mov    $0xffffffffffffffb0,%rax
  78268f:       f7 da                   neg    %edx
  782691:       48 c7 05 b4 67 36 00    movq   $0xffffffffffffffff,0x3667b4(%rip)        # ae8e50 <__curbrk>
  782698:       ff ff ff ff 
  78269c:       64 89 10                mov    %edx,%fs:(%rax)  <---------------
  78269f:       31 c0                   xor    %eax,%eax
  7826a1:       c3                      retq
  7826a2:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  7826a8:       48 c7 c0 b0 ff ff ff    mov    $0xffffffffffffffb0,%rax
  7826af:       64 c7 00 0c 00 00 00    movl   $0xc,%fs:(%rax)
  7826b6:       b8 ff ff ff ff          mov    $0xffffffff,%eax
  7826bb:       c3                      retq
  7826bc:       0f 1f 40 00             nopl   0x0(%rax)

And the syscalls that failed to be run were, in order:

  • geteuid
  • getuid
  • getegid
  • getgid
  • brk

So it seems likely that the missing brk syscall is the source of the problem.

We do actually implement the sbrk syscall, but to quote the manpage for sbrk:

"On Linux, sbrk() is implemented as a library function that uses the brk() system call, and does some internal bookkeeping so that it can return the old break value."

So I'll look into whether we can do the same thing and implement sbrk as a library function over a brk syscall, or just add a second syscall.
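A rough sketch of the first option, sbrk as a library function over a brk-style primitive. This is not the Ward implementation: `sys_brk` is a stand-in that models the raw Linux call (it returns the resulting break, and leaves the break unchanged on failure), and `HEAP_BASE`, `HEAP_LIMIT`, and `sbrk_via_brk` are made-up names and values.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up heap range standing in for the kernel's view of the process. */
#define HEAP_BASE  0x10000000UL
#define HEAP_LIMIT 0x10100000UL
static uintptr_t kernel_break = HEAP_BASE;

/* Models the raw brk system call: try to move the break to addr and return
   the resulting break. On failure (including addr == 0) the break is left
   unchanged, so the caller detects failure by comparing. */
static uintptr_t sys_brk(uintptr_t addr)
{
    assert(kernel_break >= HEAP_BASE && kernel_break <= HEAP_LIMIT);
    if (addr >= HEAP_BASE && addr <= HEAP_LIMIT)
        kernel_break = addr;
    return kernel_break;
}

/* sbrk as a library function: query the current break, request the new one,
   and return the old break, or (void *)-1 if the request was refused. */
static void *sbrk_via_brk(intptr_t increment)
{
    uintptr_t old_break = sys_brk(0);
    if (increment == 0)
        return (void *)old_break;
    uintptr_t wanted = old_break + (uintptr_t)increment;
    if (sys_brk(wanted) != wanted)
        return (void *)-1;
    return (void *)old_break;
}
```

The comparison against the requested address matches what the `__brk` disassembly above does after the syscall returns: if the returned break is below the requested one, it reports ENOMEM (0xc).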

brk support in 97ba5e6 gives us:

36 git: unknown sys call 107
36 git: unknown sys call 102
36 git: unknown sys call 108
36 git: unknown sys call 104
pagefault from user for 0xaaef80 err 7
pid 36 git: trap 14 err 7 on cpu 0 rip 0x6be64b rsp 0x7ffffffffe60 addr 0xffffffffc24046a8--kill proc

Better. Looks like the new crash is in __libc_setup_tls, so I'll look into that later.

It might be worth using the table from tools/syscall.py to print the missing syscalls by name instead of number. That script already generates an array of names for the implemented Ward syscalls, so it shouldn't be a big change.
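Something along these lines, as a sketch only. The table here is hand-written and covers just the five numbers seen in the logs so far (x86-64 Linux numbering); a real one would presumably be generated by tools/syscall.py. `syscall_names`, `syscall_name`, and `format_unknown_syscall` are hypothetical names, not existing Ward functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sparse name table indexed by syscall number. Unlisted slots are NULL. */
static const char *const syscall_names[] = {
    [12]  = "brk",
    [102] = "getuid",
    [104] = "getgid",
    [107] = "geteuid",
    [108] = "getegid",
};
#define NSYSCALL_NAMES (sizeof(syscall_names) / sizeof(syscall_names[0]))

/* Return the name for a syscall number, or NULL if it is not in the table. */
static const char *syscall_name(long num)
{
    if (num < 0 || (size_t)num >= NSYSCALL_NAMES)
        return NULL;
    return syscall_names[num];
}

/* Format "<pid> <proc>: unknown sys call <name> [<num>]", falling back to
   the bare number when the name is not known. Returns snprintf's result. */
static int format_unknown_syscall(char *buf, size_t len, int pid,
                                  const char *proc, long num)
{
    assert(buf != NULL && proc != NULL);
    const char *name = syscall_name(num);
    if (name != NULL)
        return snprintf(buf, len, "%d %s: unknown sys call %s [%ld]",
                        pid, proc, name, num);
    return snprintf(buf, len, "%d %s: unknown sys call %ld", pid, proc, num);
}
```

Keeping the number in brackets next to the name means the output can still be matched against the older numeric logs.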

Syscalls are now printed by name as of 856ed6b:

36 git: unknown sys call geteuid [107]
36 git: unknown sys call getuid [102]
36 git: unknown sys call getegid [108]
36 git: unknown sys call getgid [104]
pagefault from user for 0xaaef80 err 7
pid 36 git: trap 14 err 7 on cpu 0 rip 0x6be64b rsp 0x7ffffffffe60 addr 0xffffffffc24046a8--kill proc

Fault seems to occur in __libc_setup_tls:

struct link_map *main_map = GL(dl_ns)[LM_ID_BASE]._ns_loaded;
[...]
main_map->l_tls_offset = roundup (memsz, align ?: 1);

The instruction that stores l_tls_offset faults:

6be64b:       49 89 af 40 04 00 00    mov    %rbp,0x440(%r15)

This is because _dl_main_map is mapped read-only, so the attempt by __libc_setup_tls to modify it fails. I'm not yet sure why it is mapped read-only here, since it is writable when the same binary runs under Linux. It's stored in the .data section, which is marked writable and loaded via a program header mapped RW. So perhaps our ELF-loading code is broken?

EDIT: ELF-loading code looks okay; need to check whether page tables are correct at _start, and if so, where it gets changed.
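As a cross-check on the read-only theory, the err value in the fault lines can be decoded. This sketch assumes the kernel prints the raw x86 page-fault error code in decimal, which the log doesn't state. Under that assumption, err 7 is a user-mode write to a page that is present, which fits a read-only mapping, while the earlier err 6 is a user-mode write to a page that is not mapped at all. The `PFERR_*` names and helper functions are made up; the bit positions are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PFERR_PRESENT 0x01u /* 1: protection violation, 0: page not present */
#define PFERR_WRITE   0x02u /* 1: write access, 0: read access */
#define PFERR_USER    0x04u /* 1: fault happened in user mode */
#define PFERR_RSVD    0x08u /* 1: reserved bit set in a paging entry */
#define PFERR_FETCH   0x10u /* 1: instruction fetch */

typedef struct {
    bool present, write, user, reserved, fetch;
} pf_info;

/* Split an error code into its individual flags. */
static pf_info decode_pf_err(uint32_t err)
{
    pf_info info = {
        .present  = (err & PFERR_PRESENT) != 0,
        .write    = (err & PFERR_WRITE) != 0,
        .user     = (err & PFERR_USER) != 0,
        .reserved = (err & PFERR_RSVD) != 0,
        .fetch    = (err & PFERR_FETCH) != 0,
    };
    return info;
}

/* True for a user-mode write to a page that is mapped but not writable. */
static bool is_user_write_to_readonly(uint32_t err)
{
    pf_info i = decode_pf_err(err);
    return i.present && i.write && i.user;
}

/* Usage: the two codes from the git faults logged so far. */
static void check_logged_codes(void)
{
    assert(is_user_write_to_readonly(7));  /* __libc_setup_tls fault */
    assert(!is_user_write_to_readonly(6)); /* __brk fault: page not present */
}
```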

Something deeper is going on here with the page-mapping troubles:

(log edited to remove irrelevant debug messages for clarity)

$ git
36 git: unknown sys call geteuid [107]
36 git: unknown sys call getuid [102]
36 git: unknown sys call getegid [108]
36 git: unknown sys call getgid [104]
pagefault from user for 0xaaef80 err 7
pid 36 git: trap 14 err 7 on cpu 0 rip 0x6be64b rsp 0x7ffffffffe60 addr 0xffffffffc04046a8--kill proc
$ git
exec failed
exec git failed
$ git
pagefault from user for 0 err 20
pid 38 sh: trap 14 err 20 on cpu 0 rip 0 rsp 0x7ffffffffe00 addr 0xffffffffc04046a8--kill proc
$ git
exec failed
exec git failed
$ git
exec failed
exec git failed
pagefault from user for 0x10 err 4
pid 35 sh: trap 14 err 4 on cpu 0 rip 0x404296 rsp 0x7ffffffffe80 addr 0xffffffffc04046a8--kill proc
$ git
exec failed
exec git failed
$

Notably, no matter how strangely glibc behaves, running git should not cause sh to crash.

EDIT:

$ git
36 git: unknown sys call geteuid [107]
36 git: unknown sys call getuid [102]
36 git: unknown sys call getegid [108]
36 git: unknown sys call getgid [104]
pagefault from user for 0xaaef80 err 7
pid 36 git: trap 14 err 7 on cpu 0 rip 0x6be64b rsp 0x7ffffffffe60 addr 0xffffffffc04046a8--kill proc
$
$ git
kernel: load_image: ELF magic number mismatch
kernel: exec failed; could not load image
sh: exec git failed
pagefault from user for 0 err 20
pid 35 sh: trap 14 err 20 on cpu 0 rip 0 rsp 0x7ffffffffe00 addr 0xffffffffc04046a8--kill proc
$ git
kernel: load_image: ELF magic number mismatch
kernel: exec failed; could not load image
sh: exec git failed
$ git
pagefault from user for 0x7ffffffffdf8 err 4
pid 40 sh: trap 14 err 4 on cpu 0 rip 0x40e3c1 rsp 0x7ffffffffdf8 addr 0xffffffffc04046a8--kill proc
$ git

[note: sv6 hung there at the end]

In other words: oh no, how is it possible that the git binary suddenly stops having the right magic number?

List of strange behaviors so far:

  • git faults because at least one particular data page is mapped r/o instead of r/w, despite being correct in the PHDRs.
  • when I tell gdb to break on git's _start, it complains that the memory address doesn't exist, and neither it nor QEMU can display the memory of the _start symbol (and all of this does appear to match the page table in CR3, whose zeroth entry is zero.)
  • when I try to relaunch git after an initial failure, I get a magic number mismatch, with either zero or junk data like 07f4e098 showing up instead of 464c457f
  • sometimes the 'sh' process dies due to a pagefault for zero and has to be re-executed by init
  • sometimes the 'init' process dies due to a pagefault on a weird address like 0x7fffffffff58 (but the machine keeps going?)
  • sometimes the machine hangs
  • sometimes the machine kernel panics, sometimes during vmap allocation and sometimes during vmap freeing
  • when trying to continue from a breakpoint in gdb, I sometimes find myself right back where I left off, and sometimes suddenly find myself in a long sequence of zeroes at address 0x0000000000000524 or similar

EDIT 2: looks like the ELF header as stored in mfs was getting randomly zeroed; I found the following traceback of the memset call responsible:

Hardware watchpoint 3: *(uint32_t*)0xffffff0005cc6000

Old value = 1179403647
New value = 1179403520
0xffffffffc023a92d in stosb (cnt=cnt@entry=4096, data=data@entry=0, addr=0xffffff0005cc6001,
    addr@entry=0xffffff0005cc6000) at libutil/include/amd64.h:91
91        __asm volatile("cld; rep stosb" :
(gdb) bt
#0  0xffffffffc023a92d in stosb (cnt=cnt@entry=4096, data=data@entry=0, addr=0xffffff0005cc6001,
    addr@entry=0xffffff0005cc6000) at libutil/include/amd64.h:91
#1  memset (dst=0xffffff0005cc6001, dst@entry=0xffffff0005cc6000, c=c@entry=0, n=n@entry=4096)
    at lib/string.c:10
#2  0xffffffffc022a4ff in zalloc (name=name@entry=0xffffffffc02e1138 "qalloc") at kernel/kalloc.cc:1247
#3  0xffffffffc024389c in vmap::qalloc (this=0xffffffffc2583020 <__cpu_mem_key>, name=<optimized out>,
    cached_only=<optimized out>) at kernel/vm.cc:859
#4  0xffffff001c4d8308 in ?? ()
#5  0xffffff001c3ffc48 in ?? ()
#6  0xffffff001c4e0000 in ?? ()
#7  0xffffff001c491000 in ?? ()
#8  0x800000001c4dd003 in ?? ()
#9  0x0000010000000000 in ?? ()
#10 0x0000000000000000 in ?? ()

It looks to me like the page is somehow getting double-freed. Which is pretty bad.
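To connect the watchpoint values to the magic number mismatch: 1179403647 is 0x464c457f, the ELF magic read as a little-endian word, and 1179403520 is 0x464c4500, the same word after the first byte has been zeroed. The sketch below reproduces that arithmetic. `read_magic`, `elf_magic_ok`, and `zero_partial` are illustrative helpers, not the kernel's load_image code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464c457fu /* "\x7fELF" as a little-endian 32-bit word */

/* Read the first four bytes of an image as a little-endian word. */
static uint32_t read_magic(const unsigned char *image)
{
    return (uint32_t)image[0]
         | (uint32_t)image[1] << 8
         | (uint32_t)image[2] << 16
         | (uint32_t)image[3] << 24;
}

/* The check that fails with "ELF magic number mismatch". */
static bool elf_magic_ok(const unsigned char *image)
{
    assert(image != NULL);
    return read_magic(image) == ELF_MAGIC;
}

/* Model a byte-wise zeroing (like rep stosb) that has finished `done` bytes. */
static void zero_partial(unsigned char *page, size_t done)
{
    memset(page, 0, done);
}
```

Starting from a page that begins with 0x7f 'E' 'L' 'F', `zero_partial(page, 1)` gives the "New value" the watchpoint reported, and zeroing the whole page gives the all-zero magic seen on later exec attempts.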

EDIT 3: Yes, absolutely getting double-freed, but the tracebacks are incomplete and I don't quite understand the code well enough to figure out what's going on:

zalloc((vmap::pagelookup)) -> 0xffffff001c4dd000


Breakpoint 2, kfree (v=0xffffff001c4dd000, size=size@entry=4096) at kernel/kalloc.cc:1145
1145    {
(gdb) bt
#0  kfree (v=0xffffff001c4dd000, size=size@entry=4096) at kernel/kalloc.cc:1145
#1  0xffffffffc024d161 in page_info_ref::destroy (this=0xffffff001c4fb348) at include/page_info.hh:140
#2  page_info_ref::operator= (o=..., this=0xffffff001c4fb348) at include/page_info.hh:178
#3  vmdesc::operator= (this=<optimized out>) at include/vm.hh:34
#4  radix_array<vmdesc, 34359738368ul, 4096ul, qalloc_allocator<vmdesc>, scoped_no_sched>::iterator::set_recursive (unset=<optimized out>, x=<optimized out>, len=<optimized out>, idx=<optimized out>,
    level=<optimized out>, node=...) at include/radix_array.hh:481
#5  radix_array<vmdesc, 34359738368ul, 4096ul, qalloc_allocator<vmdesc>, scoped_no_sched>::iterator::set_at_level (x=..., level=<optimized out>, this=<optimized out>) at include/radix_array.hh:456
#6  radix_array<vmdesc, 34359738368ul, 4096ul, qalloc_allocator<vmdesc>, scoped_no_sched>::fill (
    this=<optimized out>, low=..., high=..., x=..., must_be_unset=<optimized out>)
    at include/radix_array.hh:936
#7  0x0000000000000000 in ?? ()


Breakpoint 2, kfree (v=0xffffff001c4dd000, size=size@entry=4096) at kernel/kalloc.cc:1145
1145    {
(gdb) bt
#0  kfree (v=0xffffff001c4dd000, size=size@entry=4096) at kernel/kalloc.cc:1145
#1  0xffffffffc024c974 in page_info_ref::destroy (this=0xffffff001c4fb348) at include/page_info.hh:140
#2  page_info_ref::~page_info_ref (this=0xffffff001c4fb348, __in_chrg=<optimized out>)
    at include/page_info.hh:197
#3  vmdesc::~vmdesc (this=0xffffff001c4fb340, __in_chrg=<optimized out>) at include/vm.hh:34
#4  radix_array<vmdesc, 34359738368ul, 4096ul, qalloc_allocator<vmdesc>, scoped_no_sched>::leaf_node::~leaf_node (this=0xffffff001c4fb000, __in_chrg=<optimized out>) at include/radix_array.hh:1406
#5  radix_array<vmdesc, 34359738368ul, 4096ul, qalloc_allocator<vmdesc>, scoped_no_sched>::leaf_node::free (r=<optimized out>, this=0xffffff001c4fb000) at include/radix_array.hh:1401
#6  radix_array<vmdesc, 34359738368ul, 4096ul, qalloc_allocator<vmdesc>, scoped_no_sched>::node_ptr::free (r=<optimized out>, this=<optimized out>) at include/radix_array.hh:1210
#7  radix_array<vmdesc, 34359738368ul, 4096ul, qalloc_allocator<vmdesc>, scoped_no_sched>::upper_node::free (this=<optimized out>, r=<optimized out>) at include/radix_array.hh:1314
#8  0xffffff001c4fd000 in ?? ()
#9  0x000000000000d1a0 in ?? ()
#10 0xffffff001c4481e0 in ?? ()
#11 0xffffffffc258a240 in __gc_states_key ()
#12 0xffffff001c4fe000 in ?? ()
#13 0x000000000000000a in ?? ()
#14 0x00000001c02de98c in ?? ()
#15 0x0000000000000000 in ?? ()

Notably, both destroy() invocations are on line 140, which means they follow the is_unique()-gated code path. That raises the question: why are there two unique references to the same page?
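To make the question concrete, here is a toy model of one way two references could each believe they are unique: a reference gets duplicated without being counted. This is purely hypothetical and not a diagnosis; the `toy_page` and `toy_ref` types below are stand-ins and do not reflect how page_info_ref is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page: a reference count plus a counter of how many times the page
   has been handed back to the allocator. */
typedef struct {
    int refcount;
    int times_freed;
} toy_page;

typedef struct {
    toy_page *page;
} toy_ref;

/* Create the first (and so far only) reference to a page. */
static toy_ref ref_new(toy_page *p)
{
    p->refcount = 1;
    p->times_freed = 0;
    return (toy_ref){ .page = p };
}

/* Correct copy: the new reference is counted. */
static toy_ref ref_copy(toy_ref r)
{
    assert(r.page != NULL);
    r.page->refcount++;
    return r;
}

/* Buggy copy: duplicates the pointer without counting it. */
static toy_ref ref_copy_uncounted(toy_ref r)
{
    return r;
}

/* Drop a reference. One that sees itself as unique frees the page directly
   (standing in for kfree); otherwise it only decrements the count. */
static void ref_destroy(toy_ref *r)
{
    if (r->page == NULL)
        return;
    if (r->page->refcount == 1)
        r->page->times_freed++;
    else
        r->page->refcount--;
    r->page = NULL;
}
```

With `ref_copy`, destroying both references frees the page once. With `ref_copy_uncounted`, both destroys take the unique path and the page is freed twice, which is the pattern the two kfree breakpoints show.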