cornell-zhang/hcl-dialect

[LLVM] Memory Corruption Running `hcl-opt --jit`

Closed this issue · 3 comments

Description

In an FFT example, `hcl-opt` finishes execution but then reports memory corruption:

*** Error in `hcl-opt': corrupted size vs. prev_size: 0x000000000687b9d0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x80c37)[0x7f888e6c1c37]
/lib64/libc.so.6(+0x8120e)[0x7f888e6c220e]
hcl-opt[0x7d1048]
hcl-opt[0x7d25b8]
hcl-opt[0x75b1a6]
hcl-opt[0xac89e0]
hcl-opt[0xb49f88]
hcl-opt[0xacb75f]
hcl-opt[0x58ca9c]
hcl-opt[0x58affb]
hcl-opt[0x586682]
hcl-opt[0x5866b4]
hcl-opt[0x57bf6b]
hcl-opt[0x56f77e]
hcl-opt[0x55d7c4]
hcl-opt[0x55e046]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f888e663555]
hcl-opt[0x55d415]
======= Memory map: ========
00400000-05838000 r-xp 00000000 00:41 2330675                            /work/shared/users/phd/nz264/mlir/hcl-dialect/build/bin/hcl-opt
05a37000-05c1b000 r--p 05437000 00:41 2330675                            /work/shared/users/phd/nz264/mlir/hcl-dialect/build/bin/hcl-opt
05c1b000-05c37000 rw-p 0561b000 00:41 2330675                            /work/shared/users/phd/nz264/mlir/hcl-dialect/build/bin/hcl-opt
05c37000-05c8c000 rw-p 00000000 00:00 0 
066a8000-06984000 rw-p 00000000 00:00 0                                  [heap]
7f8880000000-7f8880021000 rw-p 00000000 00:00 0 
7f8880021000-7f8884000000 ---p 00000000 00:00 0 
7f88877ff000-7f8887800000 ---p 00000000 00:00 0 
7f8887800000-7f8888000000 rw-p 00000000 00:00 0 
7f8888000000-7f8888021000 rw-p 00000000 00:00 0 
7f8888021000-7f888c000000 ---p 00000000 00:00 0 
7f888c63d000-7f888c63e000 ---p 00000000 00:00 0 
7f888c63e000-7f888ce3e000 rw-p 00000000 00:00 0 
7f888ce3e000-7f888ce3f000 ---p 00000000 00:00 0 
7f888ce3f000-7f888d63f000 rw-p 00000000 00:00 0 
7f888d63f000-7f888d640000 ---p 00000000 00:00 0 
7f888d640000-7f888de40000 rw-p 00000000 00:00 0 
7f888de40000-7f888de41000 ---p 00000000 00:00 0 
7f888de41000-7f888e641000 rw-p 00000000 00:00 0 
7f888e641000-7f888e805000 r-xp 00000000 fd:00 203087662                  /usr/lib64/libc-2.17.so
7f888e805000-7f888ea04000 ---p 001c4000 fd:00 203087662                  /usr/lib64/libc-2.17.so
7f888ea04000-7f888ea08000 r--p 001c3000 fd:00 203087662                  /usr/lib64/libc-2.17.so
7f888ea08000-7f888ea0a000 rw-p 001c7000 fd:00 203087662                  /usr/lib64/libc-2.17.so
7f888ea0a000-7f888ea0f000 rw-p 00000000 00:00 0 
7f888ea0f000-7f888ea24000 r-xp 00000000 fd:00 235911793                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f888ea24000-7f888ec23000 ---p 00015000 fd:00 235911793                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f888ec23000-7f888ec24000 r--p 00014000 fd:00 235911793                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f888ec24000-7f888ec25000 rw-p 00015000 fd:00 235911793                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f888ec25000-7f888ed26000 r-xp 00000000 fd:00 201398980                  /usr/lib64/libm-2.17.so
7f888ed26000-7f888ef25000 ---p 00101000 fd:00 201398980                  /usr/lib64/libm-2.17.so
7f888ef25000-7f888ef26000 r--p 00100000 fd:00 201398980                  /usr/lib64/libm-2.17.so
7f888ef26000-7f888ef27000 rw-p 00101000 fd:00 201398980                  /usr/lib64/libm-2.17.so
7f888ef27000-7f888f010000 r-xp 00000000 fd:00 201334681                  /usr/lib64/libstdc++.so.6.0.19
7f888f010000-7f888f210000 ---p 000e9000 fd:00 201334681                  /usr/lib64/libstdc++.so.6.0.19
7f888f210000-7f888f218000 r--p 000e9000 fd:00 201334681                  /usr/lib64/libstdc++.so.6.0.19
7f888f218000-7f888f21a000 rw-p 000f1000 fd:00 201334681                  /usr/lib64/libstdc++.so.6.0.19
7f888f21a000-7f888f22f000 rw-p 00000000 00:00 0 
7f888f22f000-7f888f254000 r-xp 00000000 fd:00 201333404                  /usr/lib64/libtinfo.so.5.9
7f888f254000-7f888f454000 ---p 00025000 fd:00 201333404                  /usr/lib64/libtinfo.so.5.9
7f888f454000-7f888f458000 r--p 00025000 fd:00 201333404                  /usr/lib64/libtinfo.so.5.9
7f888f458000-7f888f459000 rw-p 00029000 fd:00 201333404                  /usr/lib64/libtinfo.so.5.9
7f888f459000-7f888f46e000 r-xp 00000000 fd:00 201399371                  /usr/lib64/libz.so.1.2.7
7f888f46e000-7f888f66d000 ---p 00015000 fd:00 201399371                  /usr/lib64/libz.so.1.2.7
7f888f66d000-7f888f66e000 r--p 00014000 fd:00 201399371                  /usr/lib64/libz.so.1.2.7
7f888f66e000-7f888f66f000 rw-p 00015000 fd:00 201399371                  /usr/lib64/libz.so.1.2.7
7f888f66f000-7f888f671000 r-xp 00000000 fd:00 201398978                  /usr/lib64/libdl-2.17.so
7f888f671000-7f888f871000 ---p 00002000 fd:00 201398978                  /usr/lib64/libdl-2.17.so
7f888f871000-7f888f872000 r--p 00002000 fd:00 201398978                  /usr/lib64/libdl-2.17.so
7f888f872000-7f888f873000 rw-p 00003000 fd:00 201398978                  /usr/lib64/libdl-2.17.so
7f888f873000-7f888f87a000 r-xp 00000000 fd:00 203087669                  /usr/lib64/librt-2.17.so
7f888f87a000-7f888fa79000 ---p 00007000 fd:00 203087669                  /usr/lib64/librt-2.17.so
7f888fa79000-7f888fa7a000 r--p 00006000 fd:00 203087669                  /usr/lib64/librt-2.17.so
7f888fa7a000-7f888fa7b000 rw-p 00007000 fd:00 203087669                  /usr/lib64/librt-2.17.so
7f888fa7b000-7f888fa92000 r-xp 00000000 fd:00 201399326                  /usr/lib64/libpthread-2.17.so
7f888fa92000-7f888fc91000 ---p 00017000 fd:00 201399326                  /usr/lib64/libpthread-2.17.so
7f888fc91000-7f888fc92000 r--p 00016000 fd:00 201399326                  /usr/lib64/libpthread-2.17.so
7f888fc92000-7f888fc93000 rw-p 00017000 fd:00 201399326                  /usr/lib64/libpthread-2.17.so
7f888fc93000-7f888fc97000 rw-p 00000000 00:00 0 
7f888fc97000-7f888fcb9000 r-xp 00000000 fd:00 203087655                  /usr/lib64/ld-2.17.so
7f888fe5b000-7f888fe95000 rw-p 00000000 00:00 0 
7f888feb2000-7f888feb3000 rw-p 00000000 00:00 0 
7f888feb5000-7f888feb8000 rw-p 00000000 00:00 0 
7f888feb8000-7f888feb9000 r--p 00021000 fd:00 203087655                  /usr/lib64/ld-2.17.so
7f888feb9000-7f888feba000 rw-p 00022000 fd:00 203087655                  /usr/lib64/ld-2.17.so
7f888feba000-7f888febb000 rw-p 00000000 00:00 0 
7ffcba69e000-7ffcba6c0000 rw-p 00000000 00:00 0                          [stack]
7ffcba7ab000-7ffcba7ad000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted

This issue does not happen on every run.

Reproducing the issue

I created a self-contained test to reproduce this issue. The failure is intermittent, so you may have to run the example several times before it fails.

$ hcl-opt test.mlir --jit
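
Since the failure is intermittent, a small driver that reruns the command until it fails can save some manual retries. The following is only a sketch: `run_until_failure` and `max_runs` are hypothetical names, and it assumes `hcl-opt` is on `PATH` and `test.mlir` is in the current directory.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=20):
    """Run cmd repeatedly.

    Returns (run_number, returncode) for the first run that exits with a
    non-zero status, or None if all max_runs runs succeed.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Forward whatever the failing run wrote to stderr.
            sys.stderr.write(result.stderr)
            return run, result.returncode
    return None


# Example (assumes hcl-opt and test.mlir are available):
# run_until_failure(["hcl-opt", "test.mlir", "--jit"])
```

On POSIX systems `subprocess` reports a process killed by a signal as a negative return code, so an abort would show up as a negative value rather than an ordinary exit status.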

The IR

module {
  memref.global "private" @A_gv : memref<8xi32> = dense<[0, 1, 2, 3, 4, 5, 6, 7]>
  memref.global "private" @omega_gv : memref<4xi32> = dense<[1, 2, 4, 8]>
  memref.global "private" @M_gv : memref<1xi32> = dense<[17]>
  func @top() { 
    // get inputs
    %arg0 = memref.get_global @A_gv : memref<8xi32>
    %arg1 = memref.get_global @omega_gv : memref<4xi32>
    %arg2 = memref.get_global @M_gv : memref<1xi32>
    
    %0 = memref.alloc() {name = "alloc:A_b", unsigned} : memref<8xi32>
    affine.for %arg3 = 0 to 8 {
      %c0_i32_0 = arith.constant {unsigned} 0 : i32
      affine.store %c0_i32_0, %0[%arg3] {to = "alloc:A_b"} : memref<8xi32>
    } {loop_name = "i0", stage_name = "alloc:A_b"}
    %1 = affine.load %arg0[0] {from = "A", unsigned} : memref<8xi32>
    affine.store %1, %0[0] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %2 = affine.load %arg0[4] {from = "A", unsigned} : memref<8xi32>
    affine.store %2, %0[1] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %3 = affine.load %arg0[2] {from = "A", unsigned} : memref<8xi32>
    affine.store %3, %0[2] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %4 = affine.load %arg0[6] {from = "A", unsigned} : memref<8xi32>
    affine.store %4, %0[3] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %5 = affine.load %arg0[1] {from = "A", unsigned} : memref<8xi32>
    affine.store %5, %0[4] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %6 = affine.load %arg0[5] {from = "A", unsigned} : memref<8xi32>
    affine.store %6, %0[5] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %7 = affine.load %arg0[3] {from = "A", unsigned} : memref<8xi32>
    affine.store %7, %0[6] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %8 = affine.load %arg0[7] {from = "A", unsigned} : memref<8xi32>
    affine.store %8, %0[7] {to = "alloc:A_b", unsigned} : memref<8xi32>
    %9 = memref.alloc() {name = "size", unsigned} : memref<1xi32>
    %c1_i32 = arith.constant {unsigned} 1 : i32
    affine.store %c1_i32, %9[0] {to = "size", unsigned} : memref<1xi32>
    %10 = memref.alloc() {name = "half", unsigned} : memref<1xi32>
    affine.store %c1_i32, %10[0] {to = "half", unsigned} : memref<1xi32>
    %11 = memref.alloc() {name = "step", unsigned} : memref<1xi32>
    %c8_i32 = arith.constant {unsigned} 8 : i32
    affine.store %c8_i32, %11[0] {to = "step", unsigned} : memref<1xi32>
    %12 = memref.alloc() {name = "k", unsigned} : memref<1xi32>
    %c0_i32 = arith.constant {unsigned} 0 : i32
    affine.store %c0_i32, %12[0] {to = "k", unsigned} : memref<1xi32>
    %13 = memref.alloc() {name = "e", unsigned} : memref<1xi32>
    affine.store %c0_i32, %13[0] {to = "e", unsigned} : memref<1xi32>
    %14 = memref.alloc() {name = "l", unsigned} : memref<1xi32>
    affine.store %c0_i32, %14[0] {to = "l", unsigned} : memref<1xi32>
    %15 = memref.alloc() {name = "r", unsigned} : memref<1xi32>
    affine.store %c0_i32, %15[0] {to = "r", unsigned} : memref<1xi32>
    %16 = memref.alloc() {name = "t1", unsigned} : memref<1xi32>
    affine.store %c0_i32, %16[0] {to = "t1", unsigned} : memref<1xi32>
    %17 = memref.alloc() {name = "t2", unsigned} : memref<1xi32>
    affine.store %c0_i32, %17[0] {to = "t2", unsigned} : memref<1xi32>
    affine.for %arg3 = 0 to 3 {
      %18 = affine.load %9[0] {from = "size", unsigned} : memref<1xi32>
      affine.store %18, %10[0] {to = "half", unsigned} : memref<1xi32>
      %19 = affine.load %9[0] {from = "size", unsigned} : memref<1xi32>
      %c1_i32_0 = arith.constant 1 : i32
      %20 = arith.extui %19 : i32 to i64
      %21 = arith.extui %c1_i32_0 : i32 to i64
      %22 = arith.shli %20, %21 : i64
      %23 = arith.trunci %22 {unsigned} : i64 to i32
      affine.store %23, %9[0] {to = "size", unsigned} : memref<1xi32>
      %24 = affine.load %11[0] {from = "step", unsigned} : memref<1xi32>
      %25 = arith.shrui %24, %c1_i32_0 {unsigned} : i32
      affine.store %25, %11[0] {to = "step", unsigned} : memref<1xi32>
      %26 = memref.alloc() {name = "i", unsigned} : memref<1xi32>
      affine.store %c0_i32, %26[0] {to = "i", unsigned} : memref<1xi32>
      scf.while : () -> () {
        %28 = affine.load %26[0] {from = "i", unsigned} : memref<1xi32>
        %29 = affine.load %9[0] {from = "size", unsigned} : memref<1xi32>
        %30 = arith.cmpi ult, %28, %29 : i32
        scf.condition(%30)
      } do {
        %c0_i32_1 = arith.constant 0 : i32
        affine.store %c0_i32_1, %12[0] {to = "k", unsigned} : memref<1xi32>
        %28 = affine.load %26[0] {from = "i", unsigned} : memref<1xi32>
        %29 = affine.load %26[0] {from = "i", unsigned} : memref<1xi32>
        %30 = affine.load %10[0] {from = "half", unsigned} : memref<1xi32>
        %31 = arith.addi %29, %30 {unsigned} : i32
        %32 = arith.index_cast %28 : i32 to index
        %33 = arith.index_cast %31 : i32 to index
        %34 = arith.index_cast %c1_i32_0 : i32 to index
        scf.for %arg4 = %32 to %33 step %34 {
          %37 = affine.load %10[0] {from = "half", unsigned} : memref<1xi32>
          %38 = arith.index_cast %37 : i32 to index
          %39 = arith.addi %arg4, %38 : index
          %40 = arith.index_cast %39 {unsigned} : index to i32
          affine.store %40, %13[0] {to = "e", unsigned} : memref<1xi32>
          %41 = memref.load %0[%arg4] {from = "alloc:A_b", unsigned} : memref<8xi32>
          affine.store %41, %14[0] {to = "l", unsigned} : memref<1xi32>
          %42 = affine.load %13[0] {from = "e", unsigned} : memref<1xi32>
          %43 = arith.index_cast %42 : i32 to index
          %44 = memref.load %0[%43] {from = "alloc:A_b", unsigned} : memref<8xi32>
          %45 = affine.load %12[0] {from = "k", unsigned} : memref<1xi32>
          %46 = arith.index_cast %45 : i32 to index
          %47 = memref.load %arg1[%46] {from = "omega", unsigned} : memref<4xi32>
          %48 = arith.muli %44, %47 {unsigned} : i32
          %49 = affine.load %arg2[0] {from = "M", unsigned} : memref<1xi32>
          %50 = arith.remsi %48, %49 {unsigned} : i32
          affine.store %50, %15[0] {to = "r", unsigned} : memref<1xi32>
          %51 = affine.load %14[0] {from = "l", unsigned} : memref<1xi32>
          %52 = affine.load %15[0] {from = "r", unsigned} : memref<1xi32>
          %53 = arith.addi %51, %52 {unsigned} : i32
          %54 = affine.load %arg2[0] {from = "M", unsigned} : memref<1xi32>
          %55 = arith.remsi %53, %54 {unsigned} : i32
          affine.store %55, %16[0] {to = "t1", unsigned} : memref<1xi32>
          %56 = affine.load %14[0] {from = "l", unsigned} : memref<1xi32>
          %57 = affine.load %arg2[0] {from = "M", unsigned} : memref<1xi32>
          %58 = arith.addi %57, %56 {unsigned} : i32
          %59 = affine.load %15[0] {from = "r", unsigned} : memref<1xi32>
          %60 = arith.subi %58, %59 {unsigned} : i32
          %61 = affine.load %arg2[0] {from = "M", unsigned} : memref<1xi32>
          %62 = arith.remsi %60, %61 {unsigned} : i32
          affine.store %62, %17[0] {to = "t2", unsigned} : memref<1xi32>
          %63 = affine.load %16[0] {from = "t1", unsigned} : memref<1xi32>
          memref.store %63, %0[%arg4] {to = "alloc:A_b", unsigned} : memref<8xi32>
          %64 = affine.load %17[0] {from = "t2", unsigned} : memref<1xi32>
          %65 = affine.load %13[0] {from = "e", unsigned} : memref<1xi32>
          %66 = arith.index_cast %65 : i32 to index
          memref.store %64, %0[%66] {to = "alloc:A_b", unsigned} : memref<8xi32>
          %67 = affine.load %12[0] {from = "k", unsigned} : memref<1xi32>
          %68 = affine.load %11[0] {from = "step", unsigned} : memref<1xi32>
          %69 = arith.addi %67, %68 {unsigned} : i32
          affine.store %69, %12[0] {to = "k", unsigned} : memref<1xi32>
        } {loop_name = "loop_1"}
        %35 = affine.load %26[0] {from = "i", unsigned} : memref<1xi32>
        %36 = arith.addi %35, %c1_i32_0 : i32
        affine.store %36, %26[0] {to = "i", unsigned} : memref<1xi32>
        scf.yield
      }
      %27 = hcl.create_loop_handle "loop_1" : !hcl.LoopHandle
    } {loop_name = "loop_0"}

    hcl.print(%0) {format = "%.0f \n"} : memref<8xi32>

    return
  }
}

@chhzh123 Do you have experience with this kind of memory corruption issue? I remember there were related issues with memory customizations, but I couldn't find them.

In this case the printed result is non-deterministic. At first I thought some memory was uninitialized, but I checked all allocated memrefs and they are all initialized.

Not sure about the exact problem, but most of these cases are caused by out-of-bounds accesses.

This issue is indeed caused by an out-of-bounds memory access. This example implements the Cooley-Tukey algorithm for FFT. However, the index is not generated correctly, so this line has an out-of-bounds memory access:

 %41 = memref.load %0[%arg4] {from = "alloc:A_b", unsigned} : memref<8xi32>
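
One way to check this without running the JIT is to replay only the index arithmetic of the IR above in plain Python. This is a rough sketch that assumes the loop structure is read correctly from the IR: `size` doubles and `step` halves per outer iteration, `i` advances by 1 while `i < size`, and the inner loop runs `%arg4` from `i` to `i + half`. The function name `trace_indices` is hypothetical, and no data values are modeled.

```python
def trace_indices(n=8, stages=3):
    """Replay the index arithmetic of the IR and collect out-of-bounds indices.

    Returns a list of (stage, i, access, index) tuples. Only indices are
    modeled; in the real run, out-of-bounds stores could also overwrite
    the scalars (size, half, ...) that this sketch treats as intact.
    """
    size, step = 1, n
    out_of_bounds = []
    for stage in range(stages):          # affine.for %arg3 = 0 to 3
        half = size                      # half = size
        size <<= 1                       # size = size << 1
        step >>= 1                       # step = step >> 1
        i = 0
        while i < size:                  # scf.while: i < size
            k = 0
            for arg4 in range(i, i + half):   # scf.for %arg4 = i to i + half
                e = arg4 + half
                accesses = (
                    ("A_b[arg4]", arg4, n),
                    ("A_b[e]", e, n),
                    ("omega[k]", k, n // 2),
                )
                for access, index, limit in accesses:
                    if not 0 <= index < limit:
                        out_of_bounds.append((stage, i, access, index))
                k += step
            i += 1                       # the IR increments i by 1
    return out_of_bounds


hits = trace_indices()
print(hits[0])
print(max(index for _, _, _, index in hits))
```

Under these assumptions, the first two outer iterations stay in bounds, and the last one does not: `%arg4` climbs as high as 10 and the derived index `e` as high as 14, both past the end of the 8-element buffer. That suggests the loads and stores indexed by `e` are affected as well as the quoted load.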

The JIT that comes with MLIR has no error handling for out-of-bounds memory accesses. If the accessed heap location happens to be allocated, it reads whatever is there (hence the random result); otherwise it throws a corrupted memory error.

I fixed the bug in the Cooley-Tukey implementation; the error no longer occurs and the results are correct.

It's worth considering how we could detect out-of-bounds memory accesses and raise a proper error for them.
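
As an illustration of the kind of check meant here, the sketch below wraps a buffer so that every access is validated against its size and fails with a named error instead of silently touching neighboring memory. `CheckedBuffer` is a hypothetical class written for this comment; it is not part of MLIR or hcl-dialect, and a real solution would have to insert equivalent checks during lowering or at runtime.

```python
class CheckedBuffer:
    """Fixed-size buffer that rejects any index outside [0, size)."""

    def __init__(self, name, size, fill=0):
        self.name = name
        self.data = [fill] * size

    def _check(self, index):
        # Negative indices are rejected too: Python lists would otherwise
        # wrap them around silently.
        if not 0 <= index < len(self.data):
            raise IndexError(
                f"out-of-bounds access to {self.name!r}: "
                f"index {index}, size {len(self.data)}"
            )
        return index

    def __getitem__(self, index):
        return self.data[self._check(index)]

    def __setitem__(self, index, value):
        self.data[self._check(index)] = value


buf = CheckedBuffer("alloc:A_b", 8)
buf[7] = 1          # in bounds
try:
    buf[8] = 2      # one past the end
except IndexError as err:
    print(err)
```

The trade-off is an extra comparison per access, so a check like this would probably be something to enable for debugging rather than by default.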