
[FPGA 2024] Source code and bitstream for LevelST: Stream-based Accelerator for Sparse Triangular Solver

Primary language: Tcl. License: MIT.

LevelST: Stream-based Accelerator for Sparse Triangular Solver with HBM-FPGA


LevelST is an HBM-FPGA-based stream accelerator for sparse triangular solvers. It is designed and tested on the Xilinx Alveo U280 FPGA board.
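For readers new to the problem, a sparse triangular solve finds x in L x = b, where L is lower triangular. The sketch below is a plain CPU reference using textbook forward substitution, which can be handy for sanity-checking results on small matrices. It is not the accelerator's algorithm, and the function name and the (row, col, value) entry layout are illustrative assumptions rather than part of this repository.

```python
def forward_substitution(n, entries, b):
    """Solve L x = b for a lower-triangular L.

    entries: iterable of (row, col, value) with 1-indexed row >= col
             (the indexing convention used by Matrix Market files).
    b:       right-hand side as a list of n floats.
    """
    off_diag = [[] for _ in range(n)]  # per-row list of (col_index, value)
    diag = [None] * n
    for row, col, value in entries:
        if row == col:
            diag[row - 1] = value
        else:
            off_diag[row - 1].append((col - 1, value))

    x = [0.0] * n
    for i in range(n):
        if diag[i] is None or diag[i] == 0:
            raise ValueError(f"zero or missing diagonal in row {i + 1}")
        # Subtract contributions of already-solved unknowns, then divide.
        acc = b[i] - sum(value * x[j] for j, value in off_diag[i])
        x[i] = acc / diag[i]
    return x


if __name__ == "__main__":
    L_entries = [(1, 1, 2.0), (2, 1, 1.0), (2, 2, 1.0), (3, 3, 4.0)]
    print(forward_substitution(3, L_entries, [2.0, 3.0, 8.0]))
```

Row i can only be solved once every x[j] it depends on is known, and that dependency chain is what makes triangular solves hard to parallelize.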

Dependencies:

  • TAPA & Autobridge (follow the TAPA installation instructions to install TAPA and its dependencies)
  • Vitis 2021.2, Vivado 2021.2
  • Xilinx xilinx_u280_xdma_201920_3 platform shell (more recent platforms require modifying the Autobridge source files)

Input matrix format:

The host code takes input in Matrix Market format. We test on triangular matrices obtained by decomposing sparse matrices from the SuiteSparse collection.

Dataset

All test matrices are located here. There are three types of matrix files:

  • *_trig: These are generated by decomposing the matrices in the SuiteSparse collection.
  • *_alt: These are generated by reordering and then decomposing matrices in the SuiteSparse collection to boost performance.
  • *: These are the original matrices in the SuiteSparse collection. For testing, we only use the lower triangular portion.
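As an illustration of what "only the lower triangular portion" means for the original matrices, the sketch below filters a Matrix Market coordinate file down to entries with row >= col. It is a minimal, hypothetical helper, not the host code: it assumes the plain coordinate layout (a size line followed by "row col value" lines, 1-indexed) and treats a missing value, as in pattern files, as 1.0.

```python
def lower_triangular_entries(mtx_text):
    """Return (n_rows, n_cols, entries), keeping only entries with row >= col.

    mtx_text: contents of a Matrix Market file in coordinate format.
    """
    # Drop blank lines and comment/banner lines, which start with '%'.
    lines = [ln for ln in mtx_text.splitlines()
             if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, _nnz = (int(tok) for tok in lines[0].split())

    entries = []
    for ln in lines[1:]:
        parts = ln.split()
        row, col = int(parts[0]), int(parts[1])
        value = float(parts[2]) if len(parts) > 2 else 1.0  # pattern files
        if row >= col:  # on or below the diagonal
            entries.append((row, col, value))
    return n_rows, n_cols, entries


if __name__ == "__main__":
    sample = """%%MatrixMarket matrix coordinate real general
3 3 4
1 1 2.0
1 3 5.0
2 1 1.0
3 3 4.0
"""
    print(lower_triangular_entries(sample))
```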

In the dataset, some matrix files have a corresponding JSON file (same name, but with a .json extension). When testing on these matrices, pass the value of row from the JSON file to the host executable to crop them (see the software simulation section for details). We will later modify the host code to detect the JSON files and crop automatically.
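Until the host code reads the JSON files itself, a small wrapper can do the lookup. The sketch below builds the command line in the form used in the software simulation section. The exact JSON layout is an assumption here (a top-level object with an integer `row` field), and `build_solver_command` is a hypothetical helper name, so check one of the dataset's JSON files before relying on it.

```python
import json


def build_solver_command(matrix_path, json_text=None, solver="./trig-solver"):
    """Build the argument list for the host executable.

    json_text: contents of the matrix's companion .json file, or None if
               the matrix has no companion file (no cropping is applied).
    """
    cmd = [solver, "--file", matrix_path]
    if json_text is not None:
        # Assumed layout: {"row": <number of rows to keep>, ...}
        rows = int(json.loads(json_text)["row"])
        cmd.append(str(rows))
    return cmd


if __name__ == "__main__":
    print(build_solver_command("example.mtx", '{"row": 200000}'))
```

The returned list can be passed directly to `subprocess.run` to launch the solver.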

Build Host and Software Simulation

Compile the host code

make

This will run g++ to compile the host code for you.

Notice: The Makefile is written for a server that uses a package manager such as spack to provide the include files and library binaries. Change the -I and -L flags as needed for your system setup. Also remember to set the environment variables LD_LIBRARY_PATH and CPATH in .bashrc.

Finally, execute the software simulation

./trig-solver

The default matrix is lp1.mtx provided in this repository. To test other matrices, simply pass an argument by

./trig-solver --file <matrix_file.mtx>

This will run LevelST over the whole matrix. To crop the matrix, pass an integer giving the number of rows to keep. For example, to limit the matrix to 200000 rows, run:

./trig-solver --file <matrix_file.mtx> 200000

All arguments in software simulation are also available for cosim and hardware execution.
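To make the cropping argument concrete, the sketch below shows one plausible reading of it: keep the leading block formed by the first N rows and columns. That interpretation is an assumption, since this README does not spell out which rows the host code keeps, and `crop_rows` is a hypothetical name. Entries use the same 1-indexed (row, col, value) convention as Matrix Market files.

```python
def crop_rows(n_rows, entries, max_rows):
    """Keep the leading max_rows x max_rows block of a sparse matrix.

    entries: list of (row, col, value), 1-indexed.
    Returns (kept_rows, kept_entries).
    """
    kept_rows = min(n_rows, max_rows)
    kept = [(row, col, value) for row, col, value in entries
            if row <= kept_rows and col <= kept_rows]
    return kept_rows, kept


if __name__ == "__main__":
    entries = [(1, 1, 2.0), (2, 1, 1.0), (2, 2, 1.0), (3, 3, 4.0)]
    print(crop_rows(3, entries, 2))
```

For a lower-triangular matrix the column check is redundant, because col <= row, but it keeps the helper safe for general input.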

Run TAPA & Autobridge for HLS and Floorplanning Optimization

bash run_tapa.sh

This will generate a folder containing multiple subfolders, where each contains:

  • a TCL file for floorplanning constraint
  • a bash script to run bitstream generation
  • Autobridge log
  • HLS code of each module compiled by TAPA
  • RTL code
  • HLS logs and reports

A rough estimate of area usage is in the Autobridge log. Each subfolder represents one solution generated by Autobridge.

Hardware Emulation (cosim)

Modify the bash script solver.xilinx_u280_xdma_201920_3.hw.xo.tapa/run-n/solver.xilinx_u280_xdma_201920_3.hw_generate_bitstream.sh by uncommenting the second TARGET variable and the DEBUG variable, so that the top of the script looks like this:

#!/bin/bash
# TARGET=hw
TARGET=hw_emu
DEBUG=-g

Run the bitstream generation for hardware emulation

bash solver.xilinx_u280_xdma_201920_3.hw_generate_bitstream.sh

You will get an xclbin file under the vitis_run_hw_emu folder. Run the emulation with

./trig-solver --bitstream path/to/the/xclbin/file

Run on FPGA hardware

Use the same bash script without uncommenting those lines, and run the bitstream generation for the FPGA fabric. This produces an xclbin file under the vitis_run_hw folder. Run on hardware with

./trig-solver --bitstream path/to/the/xclbin/file

We have already generated a bitstream for you in the bitstream folder, so you can simply run

./trig-solver --bitstream bitstream/TrigSolver_xilinx_u280_xdma_201920_3_fwd.xclbin

Other useful reports include

  • TrigSolver_xilinx_u280_xdma_201920_3_fwd.xclbin.info: information about clock speed and HBM/DDR usage
  • solver_final.tcl: the floorplanning constraint we used

Power consumption and on-chip resource utilization are in the vitis_run_hw folder.