cornell-zhang/heterocl

HCL Smith-Waterman Example: HLS code failed synthesis

hecmay opened this issue · 7 comments

I'm trying to reproduce some performance numbers mentioned in the HCL paper on AWS F1.

===>The following messages were generated while  performing high-level synthesis for kernel: default_function Log file: /heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log :
ERROR: [v++ 200-1471] Stop unrolling loop 'VITIS_LOOP_30_7' (/heterocl/samples/smith_waterman/aws/kernel.cpp:11) in function 'default_function' because it may cause large runtime and excessive memory usage due to increase in code size. Please avoid unrolling the loop or form sub-functions for code in the loop body.\

ERROR: [v++ 200-70] Pre-synthesis failed.
ERROR: [v++ 60-300] Failed to build kernel(ip) default_function, see log for details: /heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log
ERROR: [v++ 60-773] In '/heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log', caught Tcl error: ERROR: [HLS 200-1471] Stop unrolling loop 'VITIS_LOOP_30_7' (/heterocl/samples/smith_waterman/aws/kernel.cpp:11) in function 'default_function' because it may cause large runtime and excessive memory usage due to increase in code size. Please avoid unrolling the loop or form sub-functions for code in the loop body.\
ERROR: [v++ 60-773] In '/heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log', caught Tcl error: ERROR: [HLS 200-70] Pre-synthesis failed.
ERROR: [v++ 60-599] Kernel compilation failed to complete
ERROR: [v++ 60-592] Failed to finish compilation
INFO: [v++ 60-1653] Closing dispatch client.
make: *** [_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel.xo] Error 1

Here is part of the HLS code generated by HCL. The root cause, as indicated in the error log, is that the second loop's body is too large to be unrolled. We probably need function outlining for the large loop body to make it synthesizable.

void default_function(ap_uint<3> seqAs[1024][128], ap_uint<3> seqBs[1024][128], ap_uint<3> outAs[1024][256], ap_uint<3> outBs[1024][256]) {
  ap_int<32> B;
  for (ap_int<32> t_outer = 0; t_outer < 32; ++t_outer) {
  #pragma HLS pipeline
    for (ap_int<32> t_inner = 0; t_inner < 32; ++t_inner) {
    #pragma HLS unroll
      ap_int<32> maxtrix_max;
      maxtrix_max = 0;
      ap_int<32> i_max;
      i_max = 0;
      ap_int<32> j_max;
      j_max = 0;
      ap_int<16> matrix[129][129];
      for (ap_int<32> x = 0; x < 129; ++x) {
        for (ap_int<32> y = 0; y < 129; ++y) {
          matrix[x][y] = (ap_int<16>)0;
        }
      }
      // ... other code inside the loop body omitted
      // (the second loop's body contains many other loop nests)
    }
  }
}
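
The function outlining suggested above could look roughly like the sketch below. This is not the code HeteroCL generates: plain `std::uint8_t` stands in for `ap_uint<3>` so that it compiles without the Xilinx headers, `process_pair`, `top`, `kLen` and `kPairs` are hypothetical names, and the body is a toy placeholder rather than the real Smith-Waterman scoring. `#pragma HLS inline off` is the Vitis HLS directive for keeping a function as its own unit; a standard compiler ignores it.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLen = 8;    // hypothetical; the real sequences hold 128 symbols
constexpr int kPairs = 4;  // hypothetical; the real kernel handles 32 x 32 pairs

// Outlined loop body (hypothetical helper). Everything that used to sit inside
// the t_inner loop would move here; this toy version only counts the positions
// where the two sequences agree.
int process_pair(const std::uint8_t a[kLen], const std::uint8_t b[kLen]) {
  // Ask Vitis HLS to keep this function separate instead of inlining it.
#pragma HLS inline off
  int matches = 0;
  for (int i = 0; i < kLen; ++i) {
    assert(a[i] < 8 && b[i] < 8);  // symbols fit in 3 bits, like ap_uint<3>
    if (a[i] == b[i]) ++matches;
  }
  return matches;
}

// The top-level loop now contains only a call, so the code that would be
// replicated by unrolling is one function instance per iteration.
void top(const std::uint8_t seqAs[kPairs][kLen],
         const std::uint8_t seqBs[kPairs][kLen], int out[kPairs]) {
  for (int t = 0; t < kPairs; ++t) {
    out[t] = process_pair(seqAs[t], seqBs[t]);
  }
}
```

Whether this is enough to get past the unroll limit depends on how the tool treats the outlined function, so it would need to be checked against the real kernel.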

We shouldn't unroll the loop. We should just pipeline it. Do we specify that in the HCL code?

We should just move the initialization outside.

It's ok to modify the HCL code as long as it makes sense and is still functional.

> We shouldn't unroll the loop. We should just pipeline it. Do we specify that in the HCL code?

In the HCL code, the outer loop is pipelined and the inner loop is optimized with parallel(). I am using the pre-generated HLS code inside the smith_waterman folder.

So these loops should be moved outside of the top-level loop nests, right?

      ap_int<16> matrix[129][129];
      for (ap_int<32> x = 0; x < 129; ++x) {
        for (ap_int<32> y = 0; y < 129; ++y) {
          matrix[x][y] = (ap_int<16>)0;
        }
      }
      ap_int<16> action[129][129];
      for (ap_int<32> x1 = 0; x1 < 129; ++x1) {
        for (ap_int<32> y1 = 0; y1 < 129; ++y1) {
          action[x1][y1] = (ap_int<16>)3;
        }
      }
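
A minimal sketch of what hoisting the initialization would mean, and of the assumption it relies on. Hoisting is only safe if each iteration leaves the boundary cells (row 0 and column 0) untouched and writes every interior cell before reading it; otherwise the buffer still has to be reset per iteration. The names `kN`, `kIters`, `fill_and_sum`, `run_init_inside` and `run_init_hoisted` are hypothetical, `std::int16_t` stands in for `ap_int<16>`, and the recurrence is a toy, not the Smith-Waterman scoring.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kN = 5;      // hypothetical; the real matrices are 129 x 129
constexpr int kIters = 3;  // hypothetical; the real loop nest runs 32 x 32 times

// Toy loop body: reads the boundary, writes only interior cells, and writes
// each interior cell before any later read of it.
int fill_and_sum(std::int16_t m[kN][kN], int t) {
  for (int k = 0; k < kN; ++k) {
    assert(m[0][k] == 0 && m[k][0] == 0);  // the assumption hoisting depends on
  }
  int sum = 0;
  for (int i = 1; i < kN; ++i) {
    for (int j = 1; j < kN; ++j) {
      m[i][j] = static_cast<std::int16_t>(m[i - 1][j - 1] + t + 1);
      sum += m[i][j];
    }
  }
  return sum;
}

// Shape of the generated code: buffer declared and zeroed in every iteration.
void run_init_inside(int out[kIters]) {
  for (int t = 0; t < kIters; ++t) {
    std::int16_t matrix[kN][kN];
    for (int x = 0; x < kN; ++x)
      for (int y = 0; y < kN; ++y) matrix[x][y] = 0;
    out[t] = fill_and_sum(matrix, t);
  }
}

// Hoisted shape: buffer declared and zeroed once, before the loop.
void run_init_hoisted(int out[kIters]) {
  std::int16_t matrix[kN][kN];
  for (int x = 0; x < kN; ++x)
    for (int y = 0; y < kN; ++y) matrix[x][y] = 0;
  for (int t = 0; t < kIters; ++t) {
    out[t] = fill_and_sum(matrix, t);
  }
}
```

Note that a single hoisted buffer is shared by all iterations, which may conflict with unrolling or parallelizing the inner loop; that trade-off is not settled by this sketch.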

I will modify the HLS code first, and will update the HCL code once I have confirmed that the HLS code actually works.

The log from Vitis HLS 2020 is a bit misleading: it points to a seemingly random line number and complains that "the loop" on that line cannot be unrolled.

After I switched to Vitis 2019, I figured out that Vitis HLS actually has difficulty unrolling one of the loop nests inside the body of the inner loop, which is annotated in the snippet below.

  for (ap_int<32> t_outer = 0; t_outer < 32; ++t_outer) {
    for (ap_int<32> t_inner = 0; t_inner < 32; ++t_inner) {
      #pragma HLS pipeline
      ap_int<32> mutate3;
      
      for (ap_int<32> i = 0; i < 129; ++i) { // THIS CANNOT BE UNROLLED
        for (ap_int<32> j = 0; j < 129; ++j) {
          ap_int<32> trace_back[4];
          for (ap_int<32> x2 = 0; x2 < 4; ++x2) {
            trace_back[x2] = 0;
          }
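
For a rough sense of scale: as the Vitis HLS documentation describes it, pipelining a loop fully unrolls every loop nested inside it, so the pragma on `t_inner` asks the tool to flatten the whole `i`/`j` nest. The back-of-the-envelope count below uses only the trip counts visible in the snippet above; the constant and function names are hypothetical, and the rest of the loop body (not shown in this thread) would add to the total.

```cpp
#include <cstdint>

// Trip counts copied from the snippet above.
constexpr std::int64_t kRows = 129;     // i loop
constexpr std::int64_t kCols = 129;     // j loop
constexpr std::int64_t kTraceBack = 4;  // x2 loop

// Copies of the (i, j) loop body after both loops are fully unrolled.
constexpr std::int64_t unrolled_bodies() { return kRows * kCols; }

// Copies of the single statement `trace_back[x2] = 0` after full unrolling.
constexpr std::int64_t unrolled_trace_back_inits() {
  return unrolled_bodies() * kTraceBack;
}

static_assert(unrolled_bodies() == 16641, "129 * 129 body copies");
static_assert(unrolled_trace_back_inits() == 66564, "4 inits per body copy");
```

With more than sixteen thousand copies of the body, it is plausible that this nest is what trips the code-size check reported in the log.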

Do you have the complete HLS generated code?