StanfordAHA/garnet

coreir core dump during mem tile generation

Closed this issue · 11 comments

FYI, coreir keeps core dumping on me. It's not a new thing, but it doesn't seem to prevent a correct garnet build either, so it hasn't been a top priority yet. Odd, though.

You can see the error in situ here, if you've signed the TSMC NDA.
https://buildkite.com/tapeout-aha/mflowgen/builds/314#1a04838d-afc6-4a41-81e1-a7b9a1839a47
For the rest of you, I will paste it below. As you can see, it happens during memory-core RTL generation...

Genesis Is Starting Work On Your Design ---
  Genesis2::Manager->gen_verilog: Starting code generation from module memory_core
 
/pycoreir/coreir-cpp/src/binary/coreir.cpp:188 Running Runningvpasses
/pycoreir/coreir-cpp/src/passes/transform/rungenerators.cpp:10 In Run Generators
/pycoreir/coreir-cpp/src/passes/transform/rungenerators.cpp:26 Done running generators
/pycoreir/coreir-cpp/src/binary/coreir.cpp:197 Running vpasses
/pycoreir/coreir-cpp/src/binary/coreir.cpp:238 Modified?: Yes
*** Error in `/usr/local/lib/python3.7/site-packages/coreir/coreir': double free or corruption (fasttop): 0x0000000002dc2730 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81679)[0x7f40a0a33679]
/usr/local/lib/python3.7/site-packages/coreir/coreir(_ZN9__gnu_cxx13new_allocatorIcE10deallocateEPcm+0x20)[0x75d512]
/usr/local/lib/python3.7/site-packages/coreir/coreir(_ZNSs4_Rep10_M_destroyERKSaIcE+0x4a)[0x754db6]
/usr/local/lib/python3.7/site-packages/coreir/coreir(_ZNSs4_Rep10_M_disposeERKSaIcE+0x5a)[0x74de64]
/usr/local/lib/python3.7/site-packages/coreir/coreir(_ZNSsD1Ev+0x3e)[0x7464b4]
/lib64/libc.so.6(+0x39c99)[0x7f40a09ebc99]
/lib64/libc.so.6(+0x39ce7)[0x7f40a09ebce7]
/lib64/libc.so.6(__libc_start_main+0xfc)[0x7f40a09d450c]
/usr/local/lib/python3.7/site-packages/coreir/coreir[0x717c8d]
======= Memory map: ========
003fe000-00400000 rw-p 00000000 fd:00 34063556                           /usr/local/lib/python3.7/site-packages/coreir/coreir
00400000-00b18000 r-xp 00002000 fd:00 34063556                           /usr/local/lib/python3.7/site-packages/coreir/coreir
00d17000-00d1c000 r--p 00719000 fd:00 34063556                           /usr/local/lib/python3.7/site-packages/coreir/coreir
00d1c000-00d1d000 rw-p 0071e000 fd:00 34063556                           /usr/local/lib/python3.7/site-packages/coreir/coreir
00d1d000-00d1f000 rw-p 00000000 00:00 0
02c7f000-064af000 rw-p 00000000 00:00 0                                  [heap]
7f4098000000-7f4098021000 rw-p 00000000 00:00 0
7f4098021000-7f409c000000 ---p 00000000 00:00 0
7f409dd6c000-7f409dd82000 r-xp 00000000 00:2f 8777321                    /cad/cadence/INNOVUS19.10.000.lnx86/tools.lnx86/lib/64bit/libgcc_s.so.1
7f409dd82000-7f409df81000 ---p 00016000 00:2f 8777321                    /cad/cadence/INNOVUS19.10.000.lnx86/tools.lnx86/lib/64bit/libgcc_s.so.1
7f409df81000-7f409df82000 r--p 00015000 00:2f 8777321                    /cad/cadence/INNOVUS19.10.000.lnx86/tools.lnx86/lib/64bit/libgcc_s.so.1
7f409df82000-7f409df83000 rw-p 00016000 00:2f 8777321                    /cad/cadence/INNOVUS19.10.000.lnx86/tools.lnx86/lib/64bit/libgcc_s.so.1
7f409df83000-7f409e72b000 r-xp 00000000 fd:00 34056283                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-commonlib.so
7f409e72b000-7f409e92b000 ---p 007a8000 fd:00 34056283                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-commonlib.so
7f409e92b000-7f409e930000 r--p 007a8000 fd:00 34056283                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-commonlib.so
7f409e930000-7f409e955000 rw-p 007ad000 fd:00 34056283                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-commonlib.so
7f409e955000-7f409e958000 rw-p 00000000 00:00 0
7f409e958000-7f409edda000 rw-p 00b25000 fd:00 34056283                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-commonlib.so
7f409edda000-7f409f522000 r-xp 00000000 fd:00 34056286                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float.so
7f409f522000-7f409f721000 ---p 00748000 fd:00 34056286                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float.so
7f409f721000-7f409f726000 r--p 00747000 fd:00 34056286                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float.so
7f409f726000-7f409f74b000 rw-p 0074c000 fd:00 34056286                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float.so
7f409f74b000-7f409f74d000 rw-p 00000000 00:00 0
7f409f74d000-7f409fbb6000 rw-p 00a97000 fd:00 34056286                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float.so
7f409fbb6000-7f40a0313000 r-xp 00000000 fd:00 34063553                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float_DW.so
7f40a0313000-7f40a0512000 ---p 0075d000 fd:00 34063553                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float_DW.so
7f40a0512000-7f40a0517000 r--p 0075c000 fd:00 34063553                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float_DW.so
7f40a0517000-7f40a053c000 rw-p 00761000 fd:00 34063553                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float_DW.so
7f40a053c000-7f40a053e000 rw-p 00000000 00:00 0
7f40a053e000-7f40a09b2000 rw-p 00ab2000 fd:00 34063553                   /usr/local/lib/python3.7/site-packages/coreir/libcoreir-float_DW.so
7f40a09b2000-7f40a0b75000 r-xp 00000000 fd:00 7110                       /usr/lib64/libc-2.17.so
7f40a0b75000-7f40a0d75000 ---p 001c3000 fd:00 7110                       /usr/lib64/libc-2.17.so
7f40a0d75000-7f40a0d79000 r--p 001c3000 fd:00 7110                       /usr/lib64/libc-2.17.so
7f40a0d79000-7f40a0d7b000 rw-p 001c7000 fd:00 7110                       /usr/lib64/libc-2.17.so
7f40a0d7b000-7f40a0d80000 rw-p 00000000 00:00 0
7f40a0d80000-7f40a0e81000 r-xp 00000000 fd:00 7118                       /usr/lib64/libm-2.17.so
7f40a0e81000-7f40a1080000 ---p 00101000 fd:00 7118                       /usr/lib64/libm-2.17.so
7f40a1080000-7f40a1081000 r--p 00100000 fd:00 7118                       /usr/lib64/libm-2.17.so
7f40a1081000-7f40a1082000 rw-p 00101000 fd:00 7118                       /usr/lib64/libm-2.17.so
7f40a1082000-7f40a1394000 r-xp 00000000 fd:00 17254583                   /usr/local/lib/python3.7/site-packages/coreir/.libs/libverilogAST-d0340ace.so
7f40a1394000-7f40a1594000 ---p 00312000 fd:00 17254583                   /usr/local/lib/python3.7/site-packages/coreir/.libs/libverilogAST-d0340ace.so
7f40a1594000-7f40a159e000 r--p 00312000 fd:00 17254583                   /usr/local/lib/python3.7/site-packages/coreir/.libs/libverilogAST-d0340ace.so
7f40a159e000-7f40a15ab000 rw-p 0031c000 fd:00 17254583                   /usr/local/lib/python3.7/site-packages/coreir/.libs/libverilogAST-d0340ace.so
WARNING: 183 shift/reduce conflicts
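
For anyone reading the backtrace above: each frame has the form `object(symbol+0xoffset)[0xaddress]`, and the symbols are still mangled. A minimal Python sketch for pulling the symbol names out of a pasted trace might look like the following. The helper name `parse_frame` and the regular expression are illustrative assumptions, not part of coreir or glibc.

```python
import re

# One glibc backtrace frame looks like
#   object(symbol+0xoff)[0xaddr]   or   object[0xaddr]
FRAME_RE = re.compile(
    r"^(?P<obj>[^(\[]+)"
    r"(?:\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "object": m.group("obj"),
        "symbol": m.group("sym") or None,  # empty symbol means "offset only"
        "offset": int(m.group("off"), 16) if m.group("off") else None,
        "address": int(m.group("addr"), 16),
    }


# A few lines copied from the trace above.
frames = [
    "======= Backtrace: =========",
    "/lib64/libc.so.6(+0x81679)[0x7f40a0a33679]",
    "/usr/local/lib/python3.7/site-packages/coreir/coreir(_ZNSsD1Ev+0x3e)[0x7464b4]",
    "/lib64/libc.so.6(__libc_start_main+0xfc)[0x7f40a09d450c]",
    "/usr/local/lib/python3.7/site-packages/coreir/coreir[0x717c8d]",
]

parsed = [f for f in map(parse_frame, frames) if f is not None]
symbols = [f["symbol"] for f in parsed if f["symbol"]]
print("\n".join(symbols))
```

The printed names can be piped through `c++filt` to demangle them; `_ZNSsD1Ev`, for example, is the `std::string` destructor.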

Hmm, I'll take a look into this to see if I can find anything.

No rush. As mentioned earlier, it's not a high priority bug (yet).

Tracking it down, I figured out that it's localized to the coreir invocation itself (so it's not an issue with the Python bindings). Here's a way to reproduce it using just coreir:

  • SimpleALU.json
{"top":"global.SimpleALU",
"namespaces":{
  "global":{
    "modules":{
      "ConfigReg":{
        "type":["Record",[
          ["D",["Array",2,"BitIn"]],
          ["Q",["Array",2,"Bit"]],
          ["CLK",["Named","coreir.clkIn"]],
          ["CE","BitIn"]
        ]],
        "instances":{
          "conf_reg":{
            "modref":"global.Register_has_ce_True_has_reset_False_has_async_reset_False_has_async_resetn_False_type_Bits_n_2"
          }
        },
        "connections":[
          ["self.CE","conf_reg.CE"],
          ["self.CLK","conf_reg.CLK"],
          ["self.D","conf_reg.I"],
          ["self.Q","conf_reg.O"]
        ]
      },
      "Mux2xOutBits2":{
        "type":["Record",[
          ["I0",["Array",2,"BitIn"]],
          ["I1",["Array",2,"BitIn"]],
          ["S","BitIn"],
          ["O",["Array",2,"Bit"]]
        ]],
        "instances":{
          "coreir_commonlib_mux2x2_inst0":{
            "genref":"commonlib.muxn",
            "genargs":{"N":["Int",2], "width":["Int",2]}
          }
        },
        "connections":[
          ["self.I0","coreir_commonlib_mux2x2_inst0.in.data.0"],
          ["self.I1","coreir_commonlib_mux2x2_inst0.in.data.1"],
          ["self.S","coreir_commonlib_mux2x2_inst0.in.sel.0"],
          ["self.O","coreir_commonlib_mux2x2_inst0.out"]
        ]
      },
      "Mux4xOutUInt16":{
        "type":["Record",[
          ["I0",["Array",16,"BitIn"]],
          ["I1",["Array",16,"BitIn"]],
          ["I2",["Array",16,"BitIn"]],
          ["I3",["Array",16,"BitIn"]],
          ["S",["Array",2,"BitIn"]],
          ["O",["Array",16,"Bit"]]
        ]],
        "instances":{
          "coreir_commonlib_mux4x16_inst0":{
            "genref":"commonlib.muxn",
            "genargs":{"N":["Int",4], "width":["Int",16]}
          }
        },
        "connections":[
          ["self.I0","coreir_commonlib_mux4x16_inst0.in.data.0"],
          ["self.I1","coreir_commonlib_mux4x16_inst0.in.data.1"],
          ["self.I2","coreir_commonlib_mux4x16_inst0.in.data.2"],
          ["self.I3","coreir_commonlib_mux4x16_inst0.in.data.3"],
          ["self.S","coreir_commonlib_mux4x16_inst0.in.sel"],
          ["self.O","coreir_commonlib_mux4x16_inst0.out"]
        ]
      },
      "Register_has_ce_True_has_reset_False_has_async_reset_False_has_async_resetn_False_type_Bits_n_2":{
        "type":["Record",[
          ["I",["Array",2,"BitIn"]],
          ["O",["Array",2,"Bit"]],
          ["CLK",["Named","coreir.clkIn"]],
          ["CE","BitIn"]
        ]],
        "instances":{
          "enable_mux":{
            "modref":"global.Mux2xOutBits2"
          },
          "value":{
            "genref":"coreir.reg",
            "genargs":{"width":["Int",2]},
            "modargs":{"clk_posedge":["Bool",true], "init":[["BitVector",2],"2'h0"]}
          }
        },
        "connections":[
          ["value.out","enable_mux.I0"],
          ["self.I","enable_mux.I1"],
          ["value.in","enable_mux.O"],
          ["self.CE","enable_mux.S"],
          ["value.clk","self.CLK"],
          ["value.out","self.O"]
        ]
      },
      "SimpleALU":{
        "type":["Record",[
          ["a",["Array",16,"BitIn"]],
          ["b",["Array",16,"BitIn"]],
          ["c",["Array",16,"Bit"]],
          ["config_data",["Array",2,"BitIn"]],
          ["config_en","BitIn"],
          ["CLK",["Named","coreir.clkIn"]]
        ]],
        "instances":{
          "Mux4xOutUInt16_inst0":{
            "modref":"global.Mux4xOutUInt16"
          },
          "config_reg":{
            "modref":"global.ConfigReg"
          },
          "magma_Bits_16_add_inst0":{
            "genref":"coreir.add",
            "genargs":{"width":["Int",16]}
          },
          "magma_Bits_16_mul_inst0":{
            "genref":"coreir.mul",
            "genargs":{"width":["Int",16]}
          },
          "magma_Bits_16_sub_inst0":{
            "genref":"coreir.sub",
            "genargs":{"width":["Int",16]}
          },
          "magma_Bits_16_xor_inst0":{
            "genref":"coreir.xor",
            "genargs":{"width":["Int",16]}
          }
        },
        "connections":[
          ["magma_Bits_16_add_inst0.out","Mux4xOutUInt16_inst0.I0"],
          ["magma_Bits_16_sub_inst0.out","Mux4xOutUInt16_inst0.I1"],
          ["magma_Bits_16_mul_inst0.out","Mux4xOutUInt16_inst0.I2"],
          ["magma_Bits_16_xor_inst0.out","Mux4xOutUInt16_inst0.I3"],
          ["self.c","Mux4xOutUInt16_inst0.O"],
          ["config_reg.Q","Mux4xOutUInt16_inst0.S"],
          ["self.config_en","config_reg.CE"],
          ["self.CLK","config_reg.CLK"],
          ["self.config_data","config_reg.D"],
          ["self.a","magma_Bits_16_add_inst0.in0"],
          ["self.b","magma_Bits_16_add_inst0.in1"],
          ["self.a","magma_Bits_16_mul_inst0.in0"],
          ["self.b","magma_Bits_16_mul_inst0.in1"],
          ["self.a","magma_Bits_16_sub_inst0.in0"],
          ["self.b","magma_Bits_16_sub_inst0.in1"],
          ["self.a","magma_Bits_16_xor_inst0.in0"],
          ["self.b","magma_Bits_16_xor_inst0.in1"]
        ]
      }
    }
  }
}
}

Run the above JSON (saved as build/SimpleALU.json) with:

coreir -i build/SimpleALU.json -o build/SimpleALU.v -l commonlib
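
The command above passes `-l commonlib`, presumably because the design instantiates `commonlib.muxn`. As a rough sketch (not part of coreir), the generator references in a CoreIR JSON file can be listed with a few lines of Python. The function names `generator_refs` and `required_libraries` are hypothetical, and the embedded `sample` is a trimmed-down fragment of the JSON above, used only so the snippet runs on its own.

```python
import json


def generator_refs(design):
    """Map 'namespace.module' to the sorted generators that module instantiates."""
    refs = {}
    for ns_name, ns in design.get("namespaces", {}).items():
        for mod_name, mod in ns.get("modules", {}).items():
            gens = sorted(
                {
                    inst["genref"]
                    for inst in mod.get("instances", {}).values()
                    if "genref" in inst  # modref instances are plain modules
                }
            )
            if gens:
                refs[f"{ns_name}.{mod_name}"] = gens
    return refs


def required_libraries(refs):
    """Collect the library prefix (text before the first '.') of every genref."""
    return sorted({g.split(".", 1)[0] for gens in refs.values() for g in gens})


# Trimmed fragment of SimpleALU.json (types and connections omitted).
sample = json.loads("""
{"top": "global.SimpleALU",
 "namespaces": {"global": {"modules": {
   "ConfigReg": {"instances": {"conf_reg": {"modref": "global.Reg"}}},
   "Mux2xOutBits2": {"instances": {"mux": {
       "genref": "commonlib.muxn",
       "genargs": {"N": ["Int", 2], "width": ["Int", 2]}}}},
   "Reg": {"instances": {
       "enable_mux": {"modref": "global.Mux2xOutBits2"},
       "value": {"genref": "coreir.reg", "genargs": {"width": ["Int", 2]}}}}
 }}}}
""")

refs = generator_refs(sample)
libs = required_libraries(refs)
print(refs)
print(libs)
```

To run it on the full file, replace `sample` with `json.load(open("build/SimpleALU.json"))`.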

It seems I'm able to reproduce the issue by running just the flattentypes pass (no Verilog generation), so perhaps it's related to that logic.

coreir -i build/SimpleALU.json -p flattentypes -l commonlib

Hmm, I'm also getting it with rungenerators alone, so maybe it has to do with the pass manager?
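
Trying passes one at a time like this is easy to script. The sketch below is one possible way to do it, assuming `coreir` is on the PATH and accepts the `-i`, `-p`, and `-l` flags exactly as used in the commands above. The helper names `describe_exit` and `run_pass` are made up for illustration. It relies on the fact that Python's `subprocess` reports a process killed by a signal as a negative return code, so an abort from glibc's double-free check should show up as `SIGABRT`.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # Negative values mean the child was terminated by that signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit code {returncode}"


def run_pass(json_path, pass_name, libs="commonlib", coreir="coreir"):
    """Run a single coreir pass on json_path and describe how the process ended."""
    cmd = [coreir, "-i", json_path, "-p", pass_name, "-l", libs]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return describe_exit(proc.returncode)


# Example usage (requires coreir to be installed):
# for name in ["rungenerators", "flattentypes"]:
#     print(name, "->", run_pass("build/SimpleALU.json", name))
```

If every pass dies the same way regardless of which one is selected, that would be consistent with the problem living in shared code rather than in any one pass.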

I was able to fix the issue locally with rdaly525/coreir#841.

We can push a more thorough fix that updates all the passes, but maybe this specific change will fix the issue for you.

@rdaly525, if you want to merge that, I can do a new pycoreir release that @steveri can test.

CC @hofstee: not sure if you can use a coreir branch, but this might help unblock you too (let us know if it still happens; we may need to update the other passes).

Wow, thanks Lenny for getting through this so quickly. I'll watch for the new release to come out...

@steveri new release is available, can you try it out and see if you're still getting a coredump?

Looks good! Thanks very much

% python3 garnet.py -v >& tmp.before
% python3 -m pip install --upgrade coreir
% python3 garnet.py -v >& tmp.after
% diff tmp.{before,after}
> *** Error in `/usr/local/lib/python3.7/site-packages/coreir/coreir': double free or corruption (fasttop): 0x0000000001ff2000 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x81679)[0x7f5da6bb9679]
> /usr/local/lib/python3.7/site-packages/coreir/coreir(_ZN9__gnu_cxx13new_allocatorIcE10deallocateEPcm+0x20)[0x75d512]
> /usr/local/lib/python3.7/site-packages/coreir/coreir(_ZNSs4_Rep10_M_destroyERKSaIcE+0x4a)[0x754db6]
> /usr/local/lib/python3.7/site-packages/coreir/coreir(_ZNSs4_Rep10_M_disposeERKSaIcE+0x5a)[0x74de64]
> /usr/local/lib/python3.7/site-packages/coreir/coreir(_ZNSsD1Ev+0x3e)[0x7464b4]

I will go ahead and close this issue; you guys can take care of coreir#841. Thanks again!