cornell-zhang/heterocl

Cannot generate fixed point data type for Intel OpenCL


Loading the data.
Files already downloaded and verified
Traceback (most recent call last):
  File "resnet_aocl.py", line 8, in <module>
    resnet20 = build_resnet20_inf(params, target="aocl")
  File "/scratch/users/dp638/Work/hcl_samples/bnn/resnet_main.py", line 172, in build_resnet20_inf
    return hcl.build(s, target=target)
  File "/home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/heterocl/api.py", line 335, in build
    return _build(schedule.sch, new_inputs, target=target, name=name, stmt=stmt, schedule_name=schedule.name)
  File "/home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/heterocl/tvm/build_module.py", line 577, in build
    return build_fpga_kernel(sch, args, target.target_name, name=name, schedule_name=schedule_name)
  File "/home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/heterocl/tvm/build_module.py", line 436, in build_fpga_kernel
    ret = builder(fdevice)
  File "/home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/heterocl/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "/home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/heterocl/tvm/_ffi/_ctypes/function.py", line 181, in __call__
    check_call(_LIB.TVMFuncCall(
  File "/home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/heterocl/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
heterocl.tvm._ffi.base.TVMError: [20:31:12] src/codegen/opencl/codegen_aocl.cc:200: Cannot convert typefixed32_12to AOCL type
Stack trace returned 10 entries:
[bt] (0) /home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/lib/libhcl.so(dmlc::StackTrace()+0x40) [0x7fe9f61d74b0]
[bt] (1) /home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/lib/libhcl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x2b) [0x7fe9f61d7c5b]
[bt] (2) /home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/lib/libhcl.so(TVM::codegen::CodeGenAOCL::PrintType(Halide::Type, std::ostream&)+0xb4) [0x7fe9f6499024]
[bt] (3) /home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/lib/libhcl.so(TVM::codegen::CodeGenAOCL::AddFunction(TVM::LoweredFunc, std::unordered_map<std::string, std::tuple<std::string, Halide::Type>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::tuple<std::string, Halide::Type> > > >)+0xd22) [0x7fe9f649bec2]
[bt] (4) /home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/lib/libhcl.so(std::string TVM::codegen::BuildOpenCL<TVM::codegen::CodeGenAOCL>(TVM::Array<TVM::LoweredFunc, void>, TVM::codegen::OutputMode)+0x286) [0x7fe9f64a6e06]
[bt] (5) /home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/lib/libhcl.so(+0xb7e91a) [0x7fe9f64a591a]
[bt] (6) /home/dp638/.local/lib/python3.8/site-packages/heterocl-0.3-py3.8.egg/lib/libhcl.so(TVMFuncCall+0x52) [0x7fe9f662c652]
[bt] (7) /home/dp638/tools/miniconda3/lib/python3.8/lib-dynload/../../libffi.so.7(+0x69dd) [0x7fea4bbd59dd]
[bt] (8) /home/dp638/tools/miniconda3/lib/python3.8/lib-dynload/../../libffi.so.7(+0x6067) [0x7fea4bbd5067]
[bt] (9) /home/dp638/tools/miniconda3/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x10da8) [0x7fea4bbebda8]

The issue still seems to be there. The fix branch does not install the required tvm; it still installs only the egg file for tvm 1.0.0 in ~/.local/lib/python3.8/site-packages.
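One way to confirm which copy of a package Python actually picks up is to ask the import system where it would load the module from. The snippet below is only a sketch: the helper name `module_origin` is made up for illustration, and the top-level module names `tvm` and `heterocl` are assumptions based on the traceback and the comment above.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for broken parent packages or modules with an unset __spec__.
        return None
    return spec.origin if spec is not None else None


# Module names are assumptions; adjust them to whatever the install provides.
for name in ("tvm", "heterocl"):
    print(name, "->", module_origin(name))
```

If the printed path for tvm points into an egg under ~/.local/lib/python3.8/site-packages, that would match the behavior described above.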

@hecmay
So the recipe is as follows:

  1. Comment out the .frontend import in hlib/__init__.py. This stops hlib from importing tvm.
  2. Then take nn.py from the other student's repo and use it to replace the default hlib/op/nn.py.

With these changes the build runs and generates kernel.cl. But for anyone who needs tvm for any purpose, either the installation script needs to be fixed or the frontend line needs to be commented out. Otherwise it is a blocker.
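Step 1 of the recipe can be scripted. The following is a minimal sketch, not part of HeteroCL: `comment_out_lines` is a hypothetical helper, and the two import lines in `example` are invented stand-ins, since the real contents of hlib/__init__.py are not shown in this thread.

```python
def comment_out_lines(source, needle):
    """Prefix every not-yet-commented line that contains `needle` with '# '."""
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if needle in line and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = indent + "# " + stripped
        out.append(line)
    return "".join(out)


# Hypothetical file contents, for illustration only.
example = "from .op import *\nfrom .frontend import *\n"
patched = comment_out_lines(example, ".frontend")
print(patched)
```

To apply it to the real file, read the text with `pathlib.Path.read_text()`, keep a backup copy, and write the patched text back. Running the helper twice leaves the file unchanged because lines that are already commented are skipped.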

However, the generated kernel code does not compile. I have attached the code and the error report to this message.

BNN_Error.zip

@hecmay I used the automatically generated kernel.cl for BNN, and it did not work. I have attached a tarball with two directories:

  1. device_kernel_1: Partially fixed kernel code
  2. errors: Contains build_1.sh.e<PBS_ID> error log files with increasing PBS ID (higher PBS ID means later error logs).

The device folder contains the AOCL code with my latest fixes. They remove many of these errors, but the code is still not completely error free.

In summary, it seems to me that clang fails to parse the multi-dimensional float array assignments, along with some of the float pointer casts.
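For anyone reading the error logs in the order described above, note that sorting the file names as plain strings can misorder them when the PBS IDs differ in length. A small sketch that sorts by the numeric ID instead (the helper name and the example IDs are made up for illustration):

```python
import re

# Matches the trailing ".e<PBS_ID>" suffix of a PBS error log name.
_LOG_RE = re.compile(r"\.e(\d+)$")


def sort_by_pbs_id(filenames):
    """Return the names ending in .e<digits>, ordered by numeric PBS ID (oldest first)."""
    keyed = []
    for name in filenames:
        match = _LOG_RE.search(name)
        if match:
            keyed.append((int(match.group(1)), name))
    return [name for _, name in sorted(keyed)]


# Hypothetical file names; the real IDs are in the attached tarball.
names = ["build_1.sh.e10023", "build_1.sh.e9987", "notes.txt"]
print(sort_by_pbs_id(names))
```

Files that do not match the `.e<PBS_ID>` pattern are dropped from the result.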

err_package.tar.gz