cornell-zhang/heterocl

TVM Argument Binding Failed for 512-bit UInt Datatype

Opened this issue · 8 comments

Problem Description

FlexCNN uses a 512-bit global input/output bus, but implementing it in HeteroCL triggers a TVM error in src/pass/arg_binder.cc.

hcl.UInt(512) seems to generate a uint0 buffer, which causes the TVM argument binding to fail.

Error Message:

$ python samples/flexcnn/flexcnn.py
Using TensorFlow backend.
[16:41:10] Mark stage update_global_cin on FPGA scope...
[16:41:10] Mark stage cin_load on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update0 on FPGA scope...
[16:41:10] Mark stage cin_load_prev on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update4 on FPGA scope...
[16:41:10] Mark stage weight_load on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update5 on FPGA scope...
[16:41:10] Mark stage weight_load on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update21 on FPGA scope...
[16:41:10] Mark stage cout_write on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage layer_config on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] src/schedule/schedule_reorder.cc:558: top_kernel should be set as an endpoint... rolling back
Traceback (most recent call last):

  File "samples/flexcnn/flexcnn.py", line 401, in <module>
    test_flexcnn()

  File "samples/flexcnn/flexcnn.py", line 394, in test_flexcnn
    code = str(hcl.build(s, p, name="main"))

  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/api.py", line 335, in build
    return _build(schedule.sch, new_inputs, target=target, name=name, stmt=stmt, schedule_name=schedule.name)

  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/build_module.py", line 572, in build
    return build_fpga_kernel(sch, args, target, name=name, schedule_name=schedule_name)

  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/build_module.py", line 428, in build_fpga_kernel
    flist = lower(sch, args, kernel_only=True, name=name)

  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/build_module.py", line 350, in lower
    stmt = ir_pass.StorageFlatten(stmt, binds, 64)

  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)

  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/_ffi/_ctypes/function.py", line 183, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))

  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))

heterocl.tvm._ffi.base.TVMError: [16:41:10] src/pass/arg_binder.cc:84: Check failed: arg->dtype == value->dtype (uint0 vs. uint32) Argument global_cin Buffer bind data type mismatch

Stack trace returned 10 entries:
[bt] (0) 0   libhcl.dylib                        0x00000001136facae dmlc::StackTrace() + 254
[bt] (1) 1   libhcl.dylib                        0x00000001136faa5f dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2   libhcl.dylib                        0x00000001138860af TVM::ir::ArgBinder::BindBuffer(TVM::Buffer const&, TVM::Buffer const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) + 575
[bt] (3) 3   libhcl.dylib                        0x00000001139aa6e7 TVM::ir::StorageFlattener::HandleBufferBindScope(Halide::Internal::AttrStmt const*) + 4423
[bt] (4) 4   libhcl.dylib                        0x000000011399dad5 TVM::ir::StorageFlattener::Mutate_(Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&) + 1941
[bt] (5) 5   libhcl.dylib                        0x00000001138f8e05 std::__1::__function::__func<TVM::ir::$_1, std::__1::allocator<TVM::ir::$_1>, Halide::Internal::Stmt (Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::operator()(Halide::Internal::AttrStmt const*&&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*&&) + 21
[bt] (6) 6   libhcl.dylib                        0x00000001138f83d1 std::__1::__function::__func<TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>& TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::set_dispatch<Halide::Internal::AttrStmt>(std::__1::function<Halide::Internal::Stmt (Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>)::'lambda'(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*), std::__1::allocator<TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>& TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::set_dispatch<Halide::Internal::AttrStmt>(std::__1::function<Halide::Internal::Stmt (Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>)::'lambda'(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>, Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::operator()(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*&&) + 49
[bt] (7) 7   libhcl.dylib                        0x000000011374daec TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::operator()(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*) const + 348
[bt] (8) 8   libhcl.dylib                        0x00000001138675fb TVM::ir::IRMutator::Mutate(Halide::Internal::Stmt) + 59
[bt] (9) 9   libhcl.dylib                        0x00000001139aa812 TVM::ir::StorageFlattener::HandleBufferBindScope(Halide::Internal::AttrStmt const*) + 4722

Reproduce the Error

HeteroCL version: Hecmay/heterocl:fix

Code: zzzDavid:heterocl/samples/flexcnn/flexcnn.py
(Needs samples/flexcnn/kernel/*.cpp)

$ python samples/flexcnn/flexcnn.py

Possible Cause

Argument binding supports only up to 128 bits.

We only support up to 255 bits in HeteroCL right now.

@seanlatias what prevents us from allowing wider integers? At the very least, we need to report an error message instead of letting the tool crash.

I think I mentioned this before: we use 8 bits to store the total bitwidth, so we can only support up to 255 bits.
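To illustrate why the error reports uint0 rather than uint512, here is a minimal sketch of what an 8-bit bitwidth field would do to the requested width. It assumes the field simply keeps the low 8 bits of the value (as an unsigned 8-bit integer would); `stored_bits` is a hypothetical helper for illustration, not a HeteroCL or TVM function.

```python
def stored_bits(requested_bits: int) -> int:
    """Model an 8-bit field: keep only the low 8 bits of the requested width.

    Assumption: the bitwidth is truncated the way an unsigned 8-bit
    integer truncates, i.e. taken modulo 256.
    """
    return requested_bits & 0xFF


for width in (32, 255, 256, 512):
    print(f"UInt({width}) -> uint{stored_bits(width)}")
```

Under that assumption, 512 wraps around to 0, which matches the `uint0 vs. uint32` mismatch in the error message above, while any width up to 255 is stored unchanged.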

If you think this one has a higher priority then I'll fix this first.

Yes, we should fix this issue since we already have a relatively simple solution in mind.
I thought this one was different since Niansong is talking about the global input/output.

No, according to his code, he is just generating the code without running the CPU simulation. In this case, we are not limited by NumPy, so we should be able to generate code with larger bitwidths for the global input/output.

Let's be sure to report an error message. We also need to create a test case that checks the message.
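A rough sketch of what such a check and its test could look like is below. This is not HeteroCL's actual code: `check_bitwidth`, `MAX_BITS`, and the message wording are all hypothetical, and the 255-bit limit is simply the figure quoted earlier in this thread.

```python
MAX_BITS = 255  # limit quoted in this thread; assumed, not read from HeteroCL source


def check_bitwidth(bits: int) -> int:
    """Hypothetical guard: reject widths that an 8-bit field cannot represent."""
    if not 1 <= bits <= MAX_BITS:
        raise ValueError(
            f"Unsupported bitwidth {bits}: must be between 1 and {MAX_BITS}"
        )
    return bits


def test_bitwidth_error_message():
    """Check that a 512-bit request fails early with a readable message."""
    try:
        check_bitwidth(512)
    except ValueError as err:
        assert "Unsupported bitwidth 512" in str(err)
    else:
        raise AssertionError("expected ValueError for 512 bits")


test_bitwidth_error_message()
```

The point of failing at type construction is that the user sees the offending width directly, instead of a dtype mismatch surfacing much later inside the argument binder.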

@zzzDavid this should be fixed by #303. Let me know if it isn't.