cornell-zhang/heterocl

Unable to partition tensors inside a function


A minimal example, modified from the tutorial, is shown below:

import heterocl as hcl

def test():
    hcl.init()
    A = hcl.placeholder((10, 10), "A")
    def kernel(A):
        # B is an intermediate tensor defined inside the kernel
        B = hcl.compute(A.shape, lambda x, y: A[x][y] + 1, "B")
        C = hcl.compute(A.shape, lambda x, y: B[x][y] + 1, "C") # add this line
        return C
    s = hcl.create_schedule(A, kernel)
    s.partition(kernel.B)  # partitioning the inner tensor triggers the error
    print(hcl.lower(s))

test()

Running it raises a runtime error:

Traceback (most recent call last):
  File "partition.py", line 16, in <module>
    test()
  File "partition.py", line 13, in test
    print(hcl.lower(s))
  File "/home/chz/heterocl/python/heterocl/api.py", line 276, in lower
    return _lower(schedule.sch, new_inputs, simple_mode=True)
  File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 349, in lower
    stmt = ir_pass.LiftAllocateAttrs(stmt)
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/_ctypes/function.py", line 183, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
heterocl.tvm._ffi.base.TVMError: [12:29:02] src/ir/IR.cpp:445: Check failed: first.defined() Block of undefined

The example runs when partitioning A or C, so only intermediate tensors defined inside the function trigger the problem. I also tried s.partition(kernel.B._op), but that does not work either.

This should be fixed with #206, which we will merge soon. Thanks for filing the issue.

The error is raised when the IR pass lift_allocate_attrs creates an invalid block statement, but the root cause occurs before that pass runs.
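To illustrate the failure mode only, here is a toy Python model of how a block with an undefined first statement trips that kind of check. This is not HeteroCL or TVM code: `Block`, `strip_attrs`, and `make_block_chain` are hypothetical names, and the assumption that a rewrite leaves an undefined statement behind is mine, not something confirmed for the actual pass.

```python
class Block:
    """Toy stand-in for an IR block: run `first`, then `rest`."""

    def __init__(self, first, rest):
        # Mirrors the spirit of the failed check in the traceback:
        # the first statement of a block must be defined.
        if first is None:
            raise ValueError("Check failed: first.defined() Block of undefined")
        self.first = first
        self.rest = rest


def strip_attrs(stmts):
    """Hypothetical rewrite: replaces every 'attr' statement with None."""
    return [None if s.startswith("attr") else s for s in stmts]


def make_block_chain(stmts):
    """Fold a list of statements into nested Blocks, right to left."""
    body = stmts[-1]
    for s in reversed(stmts[:-1]):
        body = Block(s, body)
    return body


def try_build(stmts):
    """Return 'ok' if the chain builds, else the error message."""
    try:
        make_block_chain(stmts)
        return "ok"
    except ValueError as err:
        return str(err)


if __name__ == "__main__":
    print(try_build(["allocate B", "store B"]))
    print(try_build(strip_attrs(["attr B", "store B"])))
```

In this sketch the second call fails because the rewrite leaves an undefined statement in the leading position, which is the shape of problem the error message describes.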

The bug has been fixed in #206, and I am able to run your example. However, the output is not quite what I expected: the lowered IR still contains an allocate statement for the partitioned buffer:

// attr [_top] storage_scope = "global"
allocate _top[int32 * 1]
produce _top {
  // attr [0] extern_scope = 0
  // attr [B] storage_scope = "global"
  allocate B[int32 * 10 * 10]
  array partition variable=B complete factor=0 dim=0
  produce B {
    // attr [0] extern_scope = 0
    // attr [B.partitioned] storage_scope = "global"
    allocate B.partitioned[int32 * 1]
    for (x, 0, 10) {
      for (y, 0, 10) {
        B[(y + (x*10))] = (A[(y + (x*10))] + 1)
      }
    }
  }
  produce C {
    // attr [0] extern_scope = 0
    for (x, 0, 10) {
      for (y, 0, 10) {
        C[(y + (x*10))] = (B[(y + (x*10))] + 1)
      }
    }
  }
}

Here is the test case I added before: https://github.com/Hecmay/heterocl/blob/stream_to/tests/test_schedule_stream.py#L531

Yes, it's weird... I see many xxx_partitioned variables before each loop.
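A quick way to spot these leftover allocations is to scan the text printed by hcl.lower(s). The sketch below is plain Python with no HeteroCL dependency; `find_partitioned_allocs` is a hypothetical helper, and the regular expression assumes the IR is printed in the same format as the dump in this thread.

```python
import re

# Matches lines such as "allocate B.partitioned[int32 * 1]".
ALLOC_RE = re.compile(r"^\s*allocate\s+([\w.]+)\[(.*)\]\s*$")
# Accept both "B.partitioned" and "B_partitioned" style names.
PARTITIONED_RE = re.compile(r"[._]partitioned$")


def find_partitioned_allocs(ir_text):
    """Return (buffer name, type and shape) for each partitioned allocation."""
    hits = []
    for line in ir_text.splitlines():
        match = ALLOC_RE.match(line)
        if match and PARTITIONED_RE.search(match.group(1)):
            hits.append((match.group(1), match.group(2)))
    return hits


# Excerpt copied from the lowered IR shown in this thread.
SAMPLE_IR = """\
allocate _top[int32 * 1]
produce _top {
  allocate B[int32 * 10 * 10]
  array partition variable=B complete factor=0 dim=0
  produce B {
    // attr [B.partitioned] storage_scope = "global"
    allocate B.partitioned[int32 * 1]
  }
}
"""

if __name__ == "__main__":
    for name, shape in find_partitioned_allocs(SAMPLE_IR):
        print(name, shape)
```

To use it on a real schedule, pass str(hcl.lower(s)) instead of SAMPLE_IR, assuming the lowered statement prints in this format.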

I need to look into this pass: https://github.com/cornell-zhang/heterocl/blob/master/tvm/src/pass/lift_allocate_attrs.cc

It seems that some of the assumptions this pass makes are not satisfied...