cornell-zhang/heterocl

Problem testing GEMM Sample in Vivado HLS

tono12 opened this issue · 5 comments

Hi,

I'd like to ask for some help replicating the reported results for the GEMM samples.
I assume the reported GEMM results come from the code in samples/systolic_array/. I've tried to use it to generate Vivado HLS code, but I haven't been successful.

When using systolic_array_vitis.py or systolic_array_stream.py, Vivado HLS synthesis reports what look like memory access problems, with messages such as:

WARNING: [SCHED 204-68] The II Violation in module 'test': Unable to enforce a carried dependence constraint (II = 1, distance = 1, offset = 1) between 'store' operation (kernel.cpp:56) of variable 'tmp_s', kernel.cpp:56 on array 'output.V', kernel.cpp:17 and 'load' operation ('output_V_load_1', kernel.cpp:56) on array 'output.V', kernel.cpp:17.

WARNING: [SCHED 204-69] Unable to schedule 'load' operation ('output_V_load_8', kernel.cpp:56) on array 'output.V', kernel.cpp:17 due to limited memory ports. Please consider using a memory core with more ports or partitioning the array 'output_V'.

Synthesis then fails. I've tried making the problem even smaller (4x4x4) and using larger parts to see whether it was a size issue, but I always get the same messages.
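The second warning is about memory ports: an on-chip memory typically offers at most two ports, so a loop body that needs many loads from `output.V` in the same cycle cannot be scheduled unless the array is split into banks. Below is a minimal Python sketch of the cyclic partitioning idea, for illustration only. The helpers `bank_of` and `max_accesses_per_bank` and the access pattern of 8 consecutive elements are assumptions of this sketch, not part of HeteroCL or Vivado HLS.

```python
from collections import Counter

def bank_of(index, factor):
    """Cyclic partitioning: element i is stored in bank i % factor."""
    return index % factor

def max_accesses_per_bank(indices, factor):
    """Largest number of same-cycle accesses that land in a single bank."""
    counts = Counter(bank_of(i, factor) for i in indices)
    return max(counts.values())

# Assumption: 8 consecutive elements are accessed in one cycle,
# for example by a fully unrolled inner loop.
accesses = list(range(8))

unpartitioned = max_accesses_per_bank(accesses, factor=1)
partitioned = max_accesses_per_bank(accesses, factor=4)

print("accesses per bank, no partitioning:", unpartitioned)
print("accesses per bank, cyclic factor 4:", partitioned)
```

With a single bank, all accesses compete for the same ports. With a cyclic factor of 4, each bank sees only a quarter of them, which is the effect the warning asks for when it suggests partitioning the array.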

I also tried the code from systolic_array_main.py. I can synthesize and implement the generated VHLS code, but it seems to perform only one multiply-accumulate per clock cycle: resource utilization is extremely low (3 DSPs and 300 LUTs) and the computation takes a long time to complete.

I've also tried the GEMM code from samples/GEMM and got the same behaviour: only 3 DSPs used and very slow. I've tried adding some of the available optimizations, such as parallel and pipeline, to those codes, but the results don't change.
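To put the low utilization in perspective, here is a rough back-of-the-envelope model in Python. It assumes one multiply-accumulate (MAC) per cycle for the sequential design and an idealized array in which every processing element (PE) performs one MAC per cycle. The function names, the 64x64x64 problem size, and the 64-PE array are illustrative assumptions, not values from the HeteroCL samples, and the model ignores pipeline fill and drain as well as memory stalls.

```python
def sequential_cycles(m, n, k):
    """Cycles for an (m x k) * (k x n) GEMM at one MAC per cycle."""
    return m * n * k

def ideal_parallel_cycles(m, n, k, num_pes):
    """Cycles if num_pes PEs each do one MAC per cycle (ceiling division)."""
    total_macs = m * n * k
    return -(-total_macs // num_pes)

# Assumed problem size and PE count, chosen only for illustration.
m = n = k = 64
num_pes = 64  # e.g. an 8x8 array

seq = sequential_cycles(m, n, k)
par = ideal_parallel_cycles(m, n, k, num_pes)
speedup = seq / par

print("sequential cycles:", seq)
print("ideal parallel cycles:", par)
print("ideal speedup:", speedup)
```

Under these assumptions the cycle count scales down with the number of PEs, so a design that uses only a handful of DSPs would be expected to be far slower than a full systolic array.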

I've run out of ideas about what to test in order to replicate the reported GEMM results. Any pointers or ideas would be much appreciated.

P.S. I've run and tested many of the other apps and got the expected results; this is the first time I've seen something like this.

Hi, for our GEMM results, we rely on the PolySA framework (now called AutoSA) to generate high-performance systolic arrays. PolySA was previously not open source, so the GEMM example you see is more of a toy example. Now that AutoSA is open source, we have a separate branch called heteroflow that integrates HeteroCL with AutoSA. We are still testing it. Once it is functional, we'll create a PR and I'll tag you.

@tono12 Since AutoSA has very rigid environment requirements, I recommend installing HeteroFlow+AutoSA in a Docker container. We will release a Docker image for the integrated toolflow this week. Stay tuned.

@tono12 Please use this Dockerfile to build the image: https://github.com/cornell-zhang/heterocl/blob/heteroflow/docker/Dockerfile.autosa. Inside the Docker container, you can generate a high-performance GEMM systolic array with AutoSA and HCL.

First, make sure AutoSA is working as expected. Try the AutoSA example here (inside the Docker container): https://github.com/UCLA-VAST/AutoSA/tree/master/autosa_tests/mm

The HCL-AutoSA integration example is available here (HCL relies on AutoSA to generate the systolic array, so the performance numbers are no different): https://github.com/cornell-zhang/heterocl/blob/heteroflow/samples/systolic_array/systolic_array_autosa.py

Hi, thanks for the help.

I've got two follow-up questions:
1- The HeteroCL version from that Dockerfile seems to lack the platform module. I keep getting the error:

module 'heterocl' has no attribute 'platform'

Any pointers on how to solve this? Code that doesn't use platforms works without problems. Also, AutoSA seems to be properly installed.

2- Is it possible to use the AutoSA backend with targets other than the Vitis/AWS.f1 combination? As I said at the beginning, my goal is to generate Vivado HLS code from it.
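
For the first question, one way to narrow down the `platform` error is to check which installation Python actually imports and whether it exposes the attribute. Here is a minimal sketch using only the standard library. The `describe_module` helper is hypothetical, and the last lines only report something useful when run in the environment where `heterocl` is installed.

```python
import importlib

def describe_module(name, attr):
    """Return (module file path or None, whether the module has attr).

    Returns (None, False) if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None, False
    return getattr(module, "__file__", None), hasattr(module, attr)

# Run inside the container to see which heterocl is picked up.
path, has_platform = describe_module("heterocl", "platform")
print("heterocl loaded from:", path)
print("has 'platform' attribute:", has_platform)
```

If the reported path is not the checkout built from the heteroflow branch, the container may be importing a different HeteroCL installation than intended.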