memory fault in emmalloc prev_region

In Fedora, we are building wasi-libc with MALLOC_IMPL=emmalloc, because dlmalloc's CC0 is problematic (#319). However, in further testing I have run into a null pointer crash in this emmalloc port. My reproducer is:

Build current wasi-libc with MALLOC_IMPL=emmalloc
Point to that build in the rust toolchain config [target.wasm32-wasi] wasi-root = "path/to/sysroot", and ./x build library --target wasm32-wasi.
Start a new project, cargo new --lib foo; cd foo.
Build the test, cargo +stage1 test --no-run --target wasm32-wasi.
Run wasmtime -- ./target/wasm32-wasi/debug/deps/foo-*.wasm --help.

Error: failed to run main module `./target/wasm32-wasi/debug/deps/foo-80180d1f382acf95.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0x455cd - prev_region
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:309:27
                      - attempt_allocate
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:681:26
[...]
    2: memory fault at wasm address 0xfffffffc in linear memory of size 0x120000
    3: wasm trap: out of bounds memory access

I get similar crash running with wasmer too. I only see this when wasi-libc is built with -O2 -DNDEBUG (and I added -g for backtraces), but it doesn't crash or hit any assertions without -DNDEBUG. I also tried with (rebased) #378 and it was no better.

In that line of code, it seems clear that 0xfffffffc must be a null pointer wrapping around by [-1].

wasi-libc/emmalloc/emmalloc.c

Line 309 in aecd368

size_t prevRegionSize = ((size_t*)region)[-1];

clang-analyzer also finds a null pointer error on that line: report-3472e3.html.gz. That error path includes some of the initialization in claim_more_memory that's different than the original emscripten code, but it's not the same path as what I actually hit at runtime.

Full wasmtime backtrace:

Error: failed to run main module `./target/wasm32-wasi/debug/deps/foo-80180d1f382acf95.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0x455cd - prev_region
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:309:27
                      - attempt_allocate
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:681:26
           1: 0x45156 - allocate_memory
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:800:19
                      - emmalloc_memalign
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:893:15
           2: 0x4573c - emmalloc_malloc
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:915:10
                      - malloc
                           at ~/src/wasi-libc/emmalloc/emmalloc.c:920:10
           3: 0x435fc - std::sys::wasi::alloc::<impl core::alloc::global::GlobalAlloc for std::alloc::System>::alloc::h30bf2b5c8e81bf65
                           at ~/rust/library/std/src/sys/wasi/../unix/alloc.rs:14:13
                      - __rdl_alloc
                           at ~/rust/library/std/src/alloc.rs:381:13
           4: 0x24bb - <unknown>!__rust_alloc
           5: 0x38755 - alloc::alloc::alloc::h7d5c328179f6fa67
                           at ~/rust/library/alloc/src/alloc.rs:100:9
                      - alloc::alloc::Global::alloc_impl::h1bd1691f491977af
                           at ~/rust/library/alloc/src/alloc.rs:183:73
                      - <alloc::alloc::Global as core::alloc::Allocator>::allocate::h12ef56f8ce7e59a1
                           at ~/rust/library/alloc/src/alloc.rs:243:9
                      - alloc::alloc::exchange_malloc::h609177d2ac860183
                           at ~/rust/library/alloc/src/alloc.rs:332:18
                      - alloc::boxed::Box<T>::new::hb82ce197ac45e458
                           at ~/rust/library/alloc/src/boxed.rs:217:9
                      - getopts::Options::usage_items::hc5b79851b290aebc
                           at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/getopts-0.2.21/src/lib.rs:592:9
           6: 0x38333 - getopts::Options::usage_with_format::hdda5b0b199d95f00
                           at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/getopts-0.2.21/src/lib.rs:513:24
                      - getopts::Options::usage::h1649511b98ad5ea4
                           at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/getopts-0.2.21/src/lib.rs:498:9
           7: 0x10d5c - test::cli::usage::h13bd525bb03cc06d
                           at ~/rust/library/test/src/cli.rs:192:17
                      - test::cli::parse_opts::h4ef2547817c55788
                           at ~/rust/library/test/src/cli.rs:213:9
           8: 0x33a31 - test::test_main::h9b57329eed8a804d
                           at ~/rust/library/test/src/lib.rs:99:26
           9: 0x3475a - test::test_main_static::h8c233200492b5e32
                           at ~/rust/library/test/src/lib.rs:158:5
          10: 0x21c2 - foo::main::h27f72609c412c5ae
                           at /tmp/tmp.6bCstl/tmp/foo/src/lib.rs:1:1
          11: 0x1bf2 - core::ops::function::FnOnce::call_once::hfe53c2cf666d96fa
                           at ~/rust/library/core/src/ops/function.rs:250:5
          12: 0x1d84 - std::sys_common::backtrace::__rust_begin_short_backtrace::h994502d568a74c3c
                           at ~/rust/library/std/src/sys_common/backtrace.rs:135:18
          13:  0xaf4 - std::rt::lang_start::{{closure}}::hcd76d0dca9e3dad5
                           at ~/rust/library/std/src/rt.rs:166:18
          14: 0x3cbc2 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hffbdde799ca12b5f
                           at ~/rust/library/core/src/ops/function.rs:284:13
                      - std::panicking::try::do_call::h4b5d5542f1c86349
                           at ~/rust/library/std/src/panicking.rs:500:40
                      - std::panicking::try::h51bfb7e0e56693e6
                           at ~/rust/library/std/src/panicking.rs:464:19
                      - std::panic::catch_unwind::h11d480fe44ac6ca4
                           at ~/rust/library/std/src/panic.rs:142:14
                      - std::rt::lang_start_internal::{{closure}}::h6e0ffd5421ca9da0
                           at ~/rust/library/std/src/rt.rs:148:48
                      - std::panicking::try::do_call::hfa71db6696b40a2e
                           at ~/rust/library/std/src/panicking.rs:500:40
                      - std::panicking::try::h63d1edb0b6dbf2fd
                           at ~/rust/library/std/src/panicking.rs:464:19
                      - std::panic::catch_unwind::h9e006dc595210a8f
                           at ~/rust/library/std/src/panic.rs:142:14
                      - std::rt::lang_start_internal::h2c53e8a6f6be7467
                           at ~/rust/library/std/src/rt.rs:148:20
          15:  0xa91 - std::rt::lang_start::h8dc3ccc2d1a087a6
                           at ~/rust/library/std/src/rt.rs:165:17
          16: 0x21e6 - <unknown>!__main_void
          17:  0xa0c - _start
                           at ~/src/wasi-libc/libc-bottom-half/crt/crt1-command.c:43:13
    2: memory fault at wasm address 0xfffffffc in linear memory of size 0x120000
    3: wasm trap: out of bounds memory access

clang-analyzer [...] not the same path as what I actually hit at runtime.

Wild guess: maybe UB-related optimization nerfed the code in the clang-analyzer reported path, leading to bad state in the backtrace I hit at runtime.

A couple of questions:

looks like this is a "wasi-libc in Rust" scenario--the issue does not happen with dlmalloc in the same scenario? (I'm guessing it doesn't which is why we haven't seen this yet?)
also, does this happen in "wasi-libc in C" scenarios?

I have not seen any issues like this with dlmalloc.

I haven't tried with a C program -- I imagine it would have the same issue, because Rust is just calling malloc/realloc/free, but I can try to figure out the allocator pattern to reproduce it in C.

I can reproduce it with this C program:

#include <stdlib.h>
int main() {
    char *p = malloc(1);
    for (unsigned i = 1; i < 20; ++i) {
        p = realloc(p, 1 << i);
    }
}

It crashes at i = 16, although I removed my prints to minimize it here. :)

Error: failed to run main module `crash.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0:  0x726 - prev_region
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:309:27
                     - attempt_allocate
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:681:26
           1:  0x3a2 - allocate_memory
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:854:19
                     - emmalloc_memalign
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:893:15
           2:  0xceb - emmalloc_aligned_realloc
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:1142:18
                     - emmalloc_realloc
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:1243:10
                     - realloc
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:1248:10
           3:  0x16c - <unknown>!__original_main
           4:   0xb5 - _start
                           at /home/jistone/src/wasi-libc/libc-bottom-half/crt/crt1-command.c:43:13
    2: memory fault at wasm address 0xfffffffc in linear memory of size 0x20000
    3: wasm trap: out of bounds memory access

This one is a little different:

#include <stdlib.h>
int main() {
    char *p = malloc(1);
    p = realloc(p, 12);
    p = malloc(1);
}

Error: failed to run main module `crash-c.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0:  0x73b - create_free_region
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:376:43
                     - attempt_allocate
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:703:5
           1:  0x252 - allocate_memory
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:800:19
                     - emmalloc_memalign
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:893:15
           2:  0x838 - emmalloc_malloc
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:915:10
                     - malloc
                           at /home/jistone/src/wasi-libc/emmalloc/emmalloc.c:920:10
           3:  0x12f - main
                           at /home/jistone/bugs/wasi-libc-421/crash.c:5:9
           4:   0xb5 - _start
                           at /home/jistone/src/wasi-libc/libc-bottom-half/crt/crt1-command.c:43:13
    2: memory fault at wasm address 0x2109c in linear memory of size 0x20000
    3: wasm trap: out of bounds memory access

Ouch, yes, both test programs also crash when compiled with zig cc, which also links emmalloc.

As a point of comparison both those programs run fine when build with emscripten + run under node. Perhaps something how about sbrk() differs? Or local diffs with emmalloc itself?

One thing I found is that this calls clz(0), which is undefined:

wasi-libc/emmalloc/emmalloc.c

Line 843 in 7018e24

    
           int largestBucketIndex = NUM_FREE_BUCKETS - 1 - __builtin_clzll(freeRegionBucketsUsed);

In emscripten, the constructor adds at least one region, but that's commented out here in wasi:

wasi-libc/emmalloc/emmalloc.c

Lines 642 to 643 in 7018e24

    
           // Start with a tiny dynamic region. 
        
           claim_more_memory(3*sizeof(Region));

I tried adding if (freeRegionBucketsUsed == 0) claim_more_memory(3*sizeof(Region)); near the top of that function that had clz(0), but it didn't change the crashes overall.

Just adding a couple printf() makes the crash go away. So, likely a UB or memory corruption.

By the way, you should declare p as char * volatile p, or else it may work just because the compiler optimizes out all the code since allocations are not actually used.

I'm not optimizing the main code, but ok, I went ahead and added volatile in my copies.

I have found that -fno-strict-aliasing (while compiling wasi-libc, including emmalloc) also makes the crash go away. I'm playing with a few asm-barriers to see if I can nail down where aliasing may be misbehaving.

In the Region struct, defining size as a uint32_t instead of size_t also makes everything work 🤷

Oh, one other difference is that alignof(max_align_t) differs between wasm32-emscripten and wasm32-wasi. (8 in former 16 in the latter).

This barrier makes it work:

@@ -1046,10 +1046,11 @@ static int attempt_region_resize(Region *region, size_t size)
     assert(HAS_ALIGNMENT(newNextRegionStartPtr, sizeof(size_t)));
     // Next region does not shrink to too small size?
     if (newNextRegionStartPtr + sizeof(Region) <= nextRegionEndPtr)
     {
       unlink_from_free_list(nextRegion);
+      __asm__ __volatile__("":::"memory"); // barrier
       create_free_region(newNextRegionStartPtr, nextRegionEndPtr - newNextRegionStartPtr);
       link_to_free_list((Region*)newNextRegionStartPtr);
       create_used_region(region, newNextRegionStartPtr - (uint8_t*)region);
       return 1;
     }

There's quite a bit of type punning throughout -- maybe we should just add -fno-strict-aliasing to emmalloc's CFLAGS?

__asm__ __volatile__("":::"memory"); // barrier

Shouldn't a barrier be added every time the size of a different region is accessed/modified?

diff --git a/emmalloc/emmalloc.c b/emmalloc/emmalloc.c
index c98e42e..41dd308 100644
--- a/emmalloc/emmalloc.c
+++ b/emmalloc/emmalloc.c
@@ -302,10 +302,13 @@ static int compute_free_list_bucket(size_t allocSize)
   return bucketIndex;
 }
 
+#define MEMORY_BARRIER __asm__ __volatile__("":::"memory")
+
 #define DECODE_CEILING_SIZE(size) ((size_t)((size) & ~FREE_REGION_FLAG))
 
 static Region *prev_region(Region *region)
 {
+  MEMORY_BARRIER;
   size_t prevRegionSize = ((size_t*)region)[-1];
   prevRegionSize = DECODE_CEILING_SIZE(prevRegionSize);
   return (Region*)((uint8_t*)region - prevRegionSize);
@@ -318,6 +321,7 @@ static Region *next_region(Region *region)
 
 static size_t region_ceiling_size(Region *region)
 {
+  MEMORY_BARRIER;
   return ((size_t*)((uint8_t*)region + region->size))[-1];
 }
 
@@ -363,6 +367,7 @@ static void create_used_region(void *ptr, size_t size)
   assert(size >= sizeof(Region));
   *(size_t*)ptr = size;
   ((size_t*)ptr)[(size/sizeof(size_t))-1] = size;
+  MEMORY_BARRIER;
 }
 
 static void create_free_region(void *ptr, size_t size)
@@ -374,6 +379,7 @@ static void create_free_region(void *ptr, size_t size)
   Region *freeRegion = (Region*)ptr;
   freeRegion->size = size;
   ((size_t*)ptr)[(size/sizeof(size_t))-1] = size | FREE_REGION_FLAG;
+  MEMORY_BARRIER;
 }
 
 static void prepend_to_free_list(Region *region, Region *prependTo)
@@ -712,6 +718,7 @@ static void *attempt_allocate(Region *freeRegion, size_t alignment, size_t size)
     // region as used memory, not leaving a free memory region behind.
     // Initialize the free region as used by resetting the ceiling size to the same value as the size at bottom.
     ((size_t*)((uint8_t*)freeRegion + freeRegion->size))[-1] = freeRegion->size;
+    MEMORY_BARRIER;
   }
 
 #ifdef __EMSCRIPTEN_TRACING__
@@ -986,6 +993,7 @@ void emmalloc_free(void *ptr)
 #endif
 
   // Check merging with left side
+  MEMORY_BARRIER;
   size_t prevRegionSizeField = ((size_t*)region)[-1];
   size_t prevRegionSize = prevRegionSizeField & ~FREE_REGION_FLAG;
   if (prevRegionSizeField != prevRegionSize) // Previous region is free?
@@ -1038,6 +1046,7 @@ static int attempt_region_resize(Region *region, size_t size)
   // is a free region.
   Region *nextRegion = next_region(region);
   uint8_t *nextRegionEndPtr = (uint8_t*)nextRegion + nextRegion->size;
+  MEMORY_BARRIER;
   size_t sizeAtCeiling = ((size_t*)nextRegionEndPtr)[-1];
   if (nextRegion->size != sizeAtCeiling) // Next region is free?
   {
@@ -1392,6 +1401,7 @@ static int trim_dynamic_heap_reservation(size_t pad)
     return 0; // emmalloc is not controlling any dynamic memory at all - cannot release memory.
   uint8_t *previousSbrkEndAddress = listOfAllRegions->endPtr;
   assert(sbrk(0) == previousSbrkEndAddress);
+  MEMORY_BARRIER;
   size_t lastMemoryRegionSize = ((size_t*)previousSbrkEndAddress)[-1];
   assert(lastMemoryRegionSize == 16); // // The last memory region should be a sentinel node of exactly 16 bytes in size.
   Region *endSentinelRegion = (Region*)(previousSbrkEndAddress - sizeof(Region));

Shouldn't a barrier be added every time the size of a different region is accessed/modified?

I don't think the barrier is a real solution, more of a hack that I was using to figure out where it was going wrong. It's still UB to violate strict aliasing, but the barrier only works to impede the main optimization opportunity that would take advantage of aliasing. I think -fno-strict-aliasing is a cleaner solution because it removes the UB, without any chance that we missed somewhere that's aliasing.

	// Start with a tiny dynamic region.
	claim_more_memory(3*sizeof(Region));