TAPA

TAPA is a dataflow HLS framework that features fast compilation and an expressive programming model, and generates high-frequency FPGA accelerators.

High-Frequency

  • TAPA explicitly decouples communication and computation for better quality of results (QoR).

  • TAPA integrates the AutoBridge floorplanner to optimize the RTL generation process.

  • TAPA achieves a higher frequency on average than Vivado. [1]
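To illustrate what decoupling communication from computation means in a dataflow design, the following is a minimal software sketch in plain C++. It does not use the TAPA API: the `Fifo` class and the `RunPipeline` function are hypothetical names introduced only for this example, and the squaring operation is an arbitrary placeholder for real computation. Each "task" touches only its own FIFOs, so the load, compute, and store stages never share state directly.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical bounded FIFO standing in for a hardware channel.
// This is an illustration only, NOT the TAPA stream API.
template <typename T>
class Fifo {
 public:
  explicit Fifo(std::size_t depth) : depth_(depth) { assert(depth > 0); }
  bool full() const { return q_.size() >= depth_; }
  bool try_push(const T& v) {
    if (full()) return false;
    q_.push(v);
    return true;
  }
  bool try_pop(T& v) {
    if (q_.empty()) return false;
    v = q_.front();
    q_.pop();
    return true;
  }

 private:
  std::size_t depth_;
  std::queue<T> q_;
};

// Runs three cooperating stages until every input element has been stored.
// Each loop iteration models one step in which every stage may make progress.
std::vector<int> RunPipeline(const std::vector<int>& input,
                             std::size_t fifo_depth) {
  Fifo<int> load_to_compute(fifo_depth);
  Fifo<int> compute_to_store(fifo_depth);
  std::vector<int> output;
  std::size_t loaded = 0;

  while (output.size() < input.size()) {
    // Load stage: communication only (memory -> FIFO).
    if (loaded < input.size() && load_to_compute.try_push(input[loaded])) {
      ++loaded;
    }
    // Compute stage: computation only (FIFO -> FIFO). It pops an element
    // only when the downstream FIFO has room, so nothing is dropped.
    int v = 0;
    if (!compute_to_store.full() && load_to_compute.try_pop(v)) {
      compute_to_store.try_push(v * v);
    }
    // Store stage: communication only (FIFO -> memory).
    if (compute_to_store.try_pop(v)) {
      output.push_back(v);
    }
  }
  return output;
}
```

Calling `RunPipeline({1, 2, 3}, 1)` yields `{1, 4, 9}`. In hardware the three stages would run concurrently, and the FIFO boundaries are where a floorplanner could insert pipelining; this sequential model only shows the structure.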

Speed

  • TAPA compiles faster than Vitis HLS. [2]

  • TAPA provides faster software simulation than Vitis HLS. [2]

  • TAPA provides faster RTL simulation than Vitis.

  • [in-progress] TAPA is integrating RapidStream, which is up to 10× faster than Vivado. [3]

Expressiveness

  • TAPA extends the Vitis HLS syntax for richer expressiveness at the C++ level.

  • TAPA provides dedicated APIs for arbitrary external memory access patterns.

  • TAPA allows users to explicitly specify parallelism.

  • In addition to static burst analysis, TAPA supports runtime burst detection by transparently merging small memory transactions into large bursts.
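The idea behind merging small memory transactions into large bursts can be sketched as a simple software model. The code below is an assumption-laden illustration, not TAPA's implementation: the `Burst` struct, the `MergeIntoBursts` function, and the `max_burst_len` parameter are hypothetical names, and addresses are counted in elements rather than bytes. A request extends the current burst only when it targets the next consecutive address and the burst has not reached its maximum length.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one merged memory burst:
// `length` consecutive elements starting at element address `start`.
struct Burst {
  std::uint64_t start;
  std::uint32_t length;
};

// Merges a stream of single-element requests into bursts, in arrival order.
// This is a software model of the idea only, not TAPA's hardware logic.
std::vector<Burst> MergeIntoBursts(const std::vector<std::uint64_t>& addrs,
                                   std::uint32_t max_burst_len) {
  assert(max_burst_len > 0);
  std::vector<Burst> bursts;
  for (std::uint64_t addr : addrs) {
    if (!bursts.empty()) {
      Burst& last = bursts.back();
      // Extend the open burst if the request is contiguous and there is room.
      if (addr == last.start + last.length && last.length < max_burst_len) {
        ++last.length;
        continue;
      }
    }
    // Otherwise start a new burst at this address.
    bursts.push_back(Burst{addr, 1});
  }
  return bursts;
}
```

For the request sequence `0, 1, 2, 3, 10, 11, 4` with a maximum burst length of 4, this model produces three bursts: 4 elements from address 0, 2 elements from address 10, and 1 element from address 4. Because the merging depends only on the addresses seen at run time, it also works for access patterns that static analysis cannot prove to be sequential.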

HBM-Specific Optimizations

  • TAPA significantly reduces the area overhead of HBM interface IPs compared to Vitis HLS.

  • TAPA includes an automated design space exploration tool to balance the resource pressure and the wire pressure for HBM FPGAs.

  • TAPA automatically selects the physical channel for each top-level argument of your accelerator.

Successful Cases

  • Serpens, DAC'22, achieves 270 MHz on the Xilinx Alveo U280 HBM board when using 24 HBM channels. The Vitis HLS baseline failed in routing.
  • Sextans, FPGA'22, achieves 260 MHz on the Xilinx Alveo U250 board when using 4 DDR channels. The Vivado baseline achieves only 189 MHz.
  • SPLAG, FPGA'22, achieves up to a 4.9× speedup over state-of-the-art FPGA accelerators, up to a 2.6× speedup over a 32-thread CPU running at 4.4 GHz, and up to a 0.9× speedup over an A100 GPU (which has a 4.1× power budget and 3.4× HBM bandwidth).
  • AutoSA Systolic-Array Compiler, FPGA'21. [Figure: AutoSA frequency]
  • KNN, FPT'20, achieves 252 MHz on the Xilinx Alveo U280 board. The Vivado baseline achieves only 165 MHz.

Getting Started

TAPA Publications