cornell-zhang/heterocl

Weird array indices generation

Opened this issue · 4 comments

I'm not sure why the access pattern in the generated VHLS code of the following example becomes so complex. Seems it only happens when the output array is larger the input array.

def test_complex():
    dtype = hcl.Float()
    A = hcl.placeholder((1,1,8,8),"A",dtype)
    def kernel(A):
        return hcl.compute((1,1,10,10), lambda i, c, x, y: hcl.select(tvm.all(x < 8, y < 8),A[i, c, x, y],0), "B", dtype)
    s = hcl.create_schedule([A], kernel)
    target = hcl.platform.zc706
    target.config(compile="vivado_hls",mode="csim")
    f = hcl.build(s, target=target)
    hcl_A = hcl.asarray(np.zeros((1,1,8,8)))
    hcl_B = hcl.asarray(np.zeros((1,1,10,10)))
    f(hcl_A,hcl_B)
void test(float B[1][1][10][10], float A[1][1][8][8]) {
    B_x: for (bit32 x = 0; x < 10; ++x) {
      B_y: for (bit32 y = 0; y < 10; ++y) {
        B[0][0][x][y] = ((hls::max(x, y) < 8) ? ((float)A[(((y / 8) + x) / 8)][0][(((y / 8) + x) % 8)][(y % 8)]) : (0.000000e+00f));
      }
    }
  }

I think accessing A[0][0][x][y] is okay. Is the simplification logic somewhere not working properly?

I found the problem. The simplification logic only works when the loop range is smaller than the array size.

Expr max = Simplify(Substitute(op->a, range) + 1);
Expr comp = Simplify(max <= op->b);
if (is_one(comp))

I think it's unnecessary? And the correctness of the program should be guaranteed by the users (e.g. avoid accessing invalid memory).
Even this comparison is added, the program can still access memory out of the boundary. For example, change the compute function to hcl.compute((1,1,10,10), lambda i, c, x, y: A[i, c, x, y], "B", dtype), then (y / 8 + x)/8 can be 1 when x=y=9, which is larger than the size of the 1st dimension.
@seanlatias

This is the legacy from Halide. I think maybe we can add a tag or something to turn on/off this feature.

I found the problem. The simplification logic only works when the loop range is smaller than the array size.

Expr max = Simplify(Substitute(op->a, range) + 1);
Expr comp = Simplify(max <= op->b);
if (is_one(comp))

I think it's unnecessary?

These several lines should not be directly removed. Otherwise, the correctness of the program may not be ensured. As an example,

uint32 A[4][4];
uint32 B[16];
for (uint32 i = 0; i < 16; ++i)
    B[i] = A[i / 4][i % 4];

will be simplified to

for (uint32 i = 0; i < 16; ++i)
    B[i] = A[0][i];

which may cause SegFault and is incorrect.

@seanlatias we should first support multi-dimensional array in our internal IR.