cornell-zhang/heterocl

Imperative stages cannot be streamed


Streaming a single imperative stage works. However, code that mixes imperative and declarative stages cannot be streamed as usual, as shown below.

import heterocl as hcl
import numpy as np

def test_imperative():
    dtype = hcl.Float()
    A = hcl.placeholder((4, 4), "A", dtype)

    def kernel(A):

        def func(data):
            # "out" is allocated declaratively, then filled by the imperative stage S
            out = hcl.compute((4, 4), lambda x, y: 0, "out", dtype)
            with hcl.Stage("S"):
                with hcl.for_(0, 4, name="i") as i:
                    with hcl.for_(0, 4, name="j") as j:
                        out[i, j] = data[i, j] + 1
            return out

        B = func(A)
        # Declarative stage C consumes the output of the imperative stage
        C = hcl.compute((4, 4), lambda i, j: B[i, j] + 1, "C")
        return C

    s = hcl.create_schedule([A], kernel)

    target = hcl.platform.zc706
    target.config(compile="vivado_hls", mode="csyn")
    # Stream A to the FPGA and the result C back to the host
    s.to(A, target.xcel)
    s.to(kernel.C, target.host)
    f = hcl.build(s, target=target)
    np_A = np.random.randint(0, 10, A.shape)
    hcl_A = hcl.asarray(np_A, dtype)
    hcl_B = hcl.asarray(np.zeros((4, 4), np.float64), dtype)
    f(hcl_A, hcl_B)

Running this test fails with:

heterocl.tvm._ffi.base.TVMError: [00:37:15] src/schedule/schedule_reorder.cc:345: Check failed: input.size() > 0 Cannot found boundary for output [Tensor(shape=[4, 4], op.name=C)]. The compilation flow requires the device scope to form an enclosed subgraph. Make sure the input tensors are moved to FPGA correctly...

It seems that some nodes in the stage graph are not annotated correctly.
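To make the failed check concrete, here is a toy sketch in Python. It assumes, hypothetically, that the restored dataflow graph (DFG) lists the tensors each stage reads and writes, and that the check walks backwards from the output moved to the host until it reaches a tensor moved to the FPGA. Neither device_scope nor this graph format is HeteroCL's API. The missing_edge graph simply models the suspicion above: stage S is not annotated with its read of A, so no boundary is found for C.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model of a restored DFG:
# stage name -> {"reads": [...], "writes": [...]}. Not HeteroCL's real IR.
Graph = Dict[str, Dict[str, List[str]]]


def device_scope(
    stages: Graph, output: str, moved_to_device: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Walk backwards from `output` until reaching tensors moved to the device.

    Returns (stages inside the device scope, boundary input tensors).
    Raises RuntimeError if no boundary is reachable, mirroring the failed check.
    """
    # A tensor may be written by several stages (e.g. "out" and "S").
    producers: Dict[str, List[str]] = {}
    for name, stage in stages.items():
        for tensor in stage["writes"]:
            producers.setdefault(tensor, []).append(name)

    scope: Set[str] = set()
    boundary: Set[str] = set()
    seen: Set[str] = set()
    worklist = [output]
    while worklist:
        tensor = worklist.pop()
        if tensor in seen:
            continue
        seen.add(tensor)
        if tensor in moved_to_device:
            boundary.add(tensor)  # an input entering the device scope
            continue
        for producer in producers.get(tensor, []):
            scope.add(producer)
            worklist.extend(stages[producer]["reads"])

    if not boundary:
        raise RuntimeError(f"Cannot find boundary for output {output}")
    return scope, boundary


# "out" is the hcl.compute that zero-initializes out; "S" is the imperative stage.
complete: Graph = {
    "out": {"reads": [], "writes": ["out"]},
    "S": {"reads": ["A"], "writes": ["out"]},
    "C": {"reads": ["out"], "writes": ["C"]},
}
# Same graph, but the read of A inside stage S was not recorded.
missing_edge: Graph = {**complete, "S": {"reads": [], "writes": ["out"]}}

if __name__ == "__main__":
    print(device_scope(complete, "C", {"A"}))
    try:
        device_scope(missing_edge, "C", {"A"})
    except RuntimeError as err:
        print(err)
```

In the complete graph the walk from C reaches A through S, so the scope is enclosed; with the missing edge it never reaches A and the check fails.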

What happens if you also specify s.to(out, target.xcel)?

@hecmay, do we handle the case where the tensor is declared inside the kernel (i.e., not passed in as an argument, like out in this case)?


With s.to(out, target.xcel) added, the error message is the same, but stage S is now marked twice:

[01:49:01] Mark stage S on FPGA scope...
[01:49:01] Mark stage S on FPGA scope...
[01:49:01] Mark stage C on FPGA scope...

@seanlatias Yes, we can. In that case, we should be able to attach the tensor to the imperative stage automatically, I suppose. I added a pass to fix this issue: we still use dataflow analysis and the restored dataflow graph (DFG) to partition the graph. If anything goes wrong with the partitioning (as in this case, where the restored DFG does not correctly capture the stage hierarchy), we simply offload the whole graph to the FPGA.
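The fallback described in this fix can be sketched as follows. This is a hypothetical illustration, not the code of the actual pass: place_stages, BoundaryError, the graph format and the extra host stage D are invented names, the partitioner is passed in as a function, and the step that attaches out to the imperative stage is not modeled.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy graph: stage name -> {"reads": [...], "writes": [...]}.
Graph = Dict[str, Dict[str, List[str]]]


class BoundaryError(Exception):
    """Raised by a partitioner when the device scope is not an enclosed subgraph."""


def place_stages(
    stages: Graph, partition: Callable[[Graph], Set[str]]
) -> Dict[str, str]:
    """Map each stage to "fpga" or "host".

    `partition` returns the set of stages to offload, or raises
    BoundaryError if the restored DFG cannot be split cleanly.
    """
    try:
        on_device = partition(stages)
    except BoundaryError:
        # Fallback: if partitioning fails, offload the whole graph to the FPGA.
        on_device = set(stages)
    return {name: ("fpga" if name in on_device else "host") for name in stages}


if __name__ == "__main__":
    graph: Graph = {
        "out": {"reads": [], "writes": ["out"]},
        "S": {"reads": ["A"], "writes": ["out"]},
        "C": {"reads": ["out"], "writes": ["C"]},
        "D": {"reads": ["C"], "writes": ["D"]},  # hypothetical host-side consumer
    }

    def good_partition(g: Graph) -> Set[str]:
        return {"out", "S", "C"}

    def failing_partition(g: Graph) -> Set[str]:
        raise BoundaryError("Cannot find boundary for output C")

    print(place_stages(graph, good_partition))     # D stays on the host
    print(place_stages(graph, failing_partition))  # every stage goes to the FPGA
```

Passing the partitioner in keeps the fallback independent of how the boundary search itself is implemented.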