Debugging in the Brave New World of Reconfigurable Hardware (Artifact)

This artifact includes 20 hardware bugs, each of which can be reproduced with Verilator in a push-button manner. It also includes the five tools we designed to help with bug localization (i.e., SignalCat, FSM Monitor, Statistics Monitor, Dependency Monitor, and LossCheck), examples of using these tools, and instructions for reproducing the figures in the paper.

The full list of 68 bugs we studied can be found here.

If you have an interesting bug that you can reproduce, feel free to submit a pull request and we will add it to this repo. If you notice a bug that is not reproducible but still want to share it with others, you may request edit access to the spreadsheet and add it there.

0. Downloading the Repository
1. Reproducible Bugs
- 1.1 Installation
- 1.2 Reproducing Bugs with Verilator
2. Debugging Tools
- 2.1 Installation
- 2.2 SignalCat and the Monitors
- 2.3 LossCheck
3. Licenses and Terms

0. Downloading the Repository

Use the following command to download the artifact repository:

git clone --recursive https://github.com/efeslab/asplos22-hardware-debugging-artifact

After this command, you are expected to see the following directory hierarchy:

asplos22-hardware-debugging-artifact
├── hardware-bugbase
│   ├── c1-dead-lock-sdspi
│   ├── c2-producer-consumer-mismatch-optimus
│   ├── c3-signal-asynchrony-sdspi
│   ├── c4-signal-asynchrony-axi-stream-fifo
│   ├── common
│   ├── d10-failure-to-update-sha512
│   ├── d11-failure-to-update-frame-fifo
│   ├── d12-failure-to-update-frame-fifo
│   ├── d13-failure-to-update-frame-len
│   ├── d1-buffer-overflow-rsd
│   ├── d2-buffer-overflow-grayscale
│   ├── d3-buffer-overflow-optimus
│   ├── d4-buffer-overflow-frame-buffer
│   ├── d5-bit-truncation-sha512
│   ├── d6-bit-truncation-fft
│   ├── d7-misindexing-fadd
│   ├── d8-misindexing-axis-switch
│   ├── d9-endianness-mismatch-sdspi
│   ├── manual_debug_log
│   ├── n1-frame-len-failure-to-update
│   ├── n3-frame-fifo-fail-to-update
│   ├── n8-axis-adapter-incomplete-implementation
│   ├── n9-frame-fifo-failure-to-update
│   ├── s1-protocol-violation-axi-lite
│   ├── s2-protocol-violation-axi-stream
│   ├── s3-incomplete-implementation-axis-adapter
│   └── scripts
└── veripass
    ├── dbgtools
    ├── model
    ├── passes
    ├── Pyverilog
    ├── recording
    ├── utils
    └── verilator

1. Reproducible Bugs

The hardware-bugbase directory contains all the reproducible bugs. Each bug is located in its own directory, together with a simplified code snippet that helps explain it.

1.1 Installation

You will need to compile Verilator to reproduce these bugs. Verilator is located under the veripass directory.

Before compilation, you will need to install a few dependencies:

sudo apt-get install perl python3 make autoconf g++ flex bison ccache
sudo apt-get install libgoogle-perftools-dev numactl perl-doc
sudo apt-get install libfl2 libfl-dev  # Ubuntu only (ignore if gives error)
sudo apt-get install zlibc zlib1g zlib1g-dev  # Ubuntu only (ignore if gives error)

Then compile Verilator:

cd asplos22-hardware-debugging-artifact/veripass/verilator
autoconf
./configure
make -j8

Compilation is enough. You do not need to install it. Scripts in the bug database will find the location of Verilator themselves.

1.2 Reproducing Bugs with Verilator

Bugs are listed in the table below. You may cd into the directory of each bug to reproduce it and read its documentation.

To reproduce a specific bug:

cd asplos22-hardware-debugging-artifact/hardware-bugbase/<bug-dir>
make -j8 # compile the verilog code for simulation
make sim  # run the simulation
make wave # open the generated waveform with GTKWave

You are expected to see an error message after make sim. make wave requires a GUI, or a correctly set DISPLAY environment variable. After simulation, you can also find a .fst file or a .vcd file under the directory. These are the waveforms generated by Verilator. You can copy the file to another computer and open it with GTKWave or other waveform-viewing software.

Bug ID	Bug Name
D1	Buffer Overflow - RSD
D2	Buffer Overflow - Grayscale
D3	Buffer Overflow - Optimus
D4	Buffer Overflow - Frame FIFO
D5	Bit Truncation - SHA512
D6	Bit Truncation - FFT
D7	Misindexing - FADD
D8	Misindexing - AXI-Stream Switch
D9	Endianness Mismatch - SDSPI
D10	Failure-to-Update - SHA512
D11	Failure-to-Update - Frame FIFO
D12	Failure-to-Update - Frame FIFO
D13	Failure-to-Update - Frame Length Measurer
C1	Dead Lock - SDSPI
C2	Producer-Consumer Mismatch - Optimus
C3	Signal Asynchrony - SDSPI
C4	Signal Asynchrony - AXI-Stream FIFO
S1	Protocol Violation - AXI-Lite
S2	Protocol Violation - AXI-Stream
S3	Incomplete Implementation - AXI-Stream Adapter
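
To run through every bug in the table in one pass, the per-bug commands above can be wrapped in a loop. The following is a minimal sketch and not part of the artifact. It assumes that bug directories are the ones whose names start with c, d, or s followed by a digit (matching the directory listing in Section 0). The BUGBASE variable and the repro.log file name are hypothetical. Because make sim is expected to print an error message for each bug, the loop reports the exit status instead of stopping on failure.

```shell
# Hypothetical location of the bug database; adjust to where you cloned the artifact.
BUGBASE="${BUGBASE:-asplos22-hardware-debugging-artifact/hardware-bugbase}"

# Print the names of the bug directories under $1.
# Assumption: bug directories start with c, d, or s followed by a digit
# (e.g., d1-..., c4-..., s3-...); common, scripts, etc. are skipped.
list_bug_dirs() {
    for dir in "$1"/[cds][0-9]*-*/; do
        if [ -d "$dir" ]; then
            basename "$dir"
        fi
    done
}

# Compile and simulate every bug, logging output to repro.log (hypothetical name).
reproduce_all() {
    for bug in $(list_bug_dirs "$1"); do
        (cd "$1/$bug" && make -j8 && make sim) > "$1/$bug/repro.log" 2>&1
        echo "$bug: exit status $?"
    done
}

# Only run when the bug database is actually present.
if [ -d "$BUGBASE" ]; then
    reproduce_all "$BUGBASE"
fi
```

Each bug's simulation output ends up in its own directory, next to the generated waveform file.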

2. Debugging Tools

Our debugging tools are located in the veripass directory. In the hardware-bugbase directory, we provide make scripts to invoke these debugging tools.

Warning: A full evaluation of this part takes days, because FPGA synthesis is slow (e.g., up to several hours per run). We encourage you to evaluate the non-synthesis part (e.g., 2.3.1) first.

2.1 Installation

To run the debugging tools, you will need to compile Verilator and Pyverilog if you have not done so already:

cd asplos22-hardware-debugging-artifact/veripass
make -j8

And install the following python packages:

pip3 install jinja2 sympy ply gephistreamer

And add the following lines to your .bashrc or .zshrc to help the scripts find Vivado, Quartus, and VCS. Vivado must be the Design Suite edition, Quartus must be the Pro edition with version 17.0, and VCS must be the MX edition.

# Quartus Pro
export QUARTUS_HOME=<your-quartus-home>/17.0/quartus
export PATH=$QUARTUS_HOME/bin:$PATH
export LM_LICENSE_FILE=<your-quartus-license>

# Vivado
export XILINX_VIVADO=<your-vivado-home>/Vivado/2020.2
export PATH=$XILINX_VIVADO/bin:$PATH
export XILINXD_LICENSE_FILE=<your-vivado-license>

# VCS MX
export VCS_HOME=<your-vcs-home>
export PATH=$VCS_HOME/bin:$PATH
export SNPSLMD_LICENSE_FILE=<your-vcs-license>
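
After reloading your shell, a quick sanity check can confirm that the variables above are set and that the tools resolve on PATH. This is a minimal sketch and not part of the artifact. The helper names check_var and check_tool are hypothetical, and the executable names quartus_sh, vivado, and vcs are assumed to be the entry points the scripts rely on.

```shell
# Report whether the environment variable named $1 is set and non-empty.
check_var() {
    eval "value=\${$1:-}"
    if [ -n "$value" ]; then
        echo "ok: $1 is set"
    else
        echo "MISSING: $1 is not set"
    fi
}

# Report whether the executable named $1 can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found on PATH"
    else
        echo "MISSING: $1 not found on PATH"
    fi
}

for var in QUARTUS_HOME XILINX_VIVADO VCS_HOME; do
    check_var "$var"
done

# Assumed executable names for Quartus, Vivado, and VCS.
for tool in quartus_sh vivado vcs; do
    check_tool "$tool"
done
```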

In order to synthesize projects for Intel HARP, you will need to download a supported version of Intel FPGA Basic Building Blocks, a set of platform files for HARP, and have the following additional lines in .bashrc or .zshrc. You can ask your Intel contact for BBS_6.4.0. You may want to read this to understand the interface of the HARP platform. It is theoretically possible to compile these HARP projects for the PAC platform (which is more widely available); however, we did not evaluate it.

export OPAE_PLATFORM_ROOT=<your-opae-platform-root-location>/BBS_6.4.0
export PATH=$OPAE_PLATFORM_ROOT/bin:$PATH

The original framework for HARP simulation requires Python 2 as the default python command. As a result, you may need to set up a virtualenv with the following command:

virtualenv --python=/usr/bin/python2 <path-to-virtualenv>

2.2 SignalCat and the Monitors

2.2.1 Debugging Logs with SignalCat and the Monitors

In Section 6.2 of the paper, we demonstrated that a developer can use SignalCat and the Monitors to localize all the 20 bugs in this artifact. We provide the mental debugging logs of a developer localizing these bugs in this sheet. For each bug, the sheet includes the tools the developer would use at each step. The configurations for invoking these tools are located in a .cfg file under each bug's directory; you can invoke the tools using the following commands under each bug's directory:

make withtask.v

After running this command, a file called withtask.v will be generated. This file contains the flattened verilog code with the debugging instrumentations described in the configuration.

2.2.2 Reproducing the Resource Overhead

To synthesize the instrumented circuit, you may run the following command:

source <path-to-virtualenv>/bin/activate # switch to a python virtualenv where python2 is the default
make sweep_depth

This command generates a number of files (e.g., instrumented circuits with different buffer sizes and the TCL scripts that invoke synthesis) and then runs synthesis with each recording buffer size. It will appear to hang for a long time, because each synthesis takes hours.

After the command finishes, you can run the following command to report resource utilization.

make report_depth_sweep

For D4, D6, D7, D8, D9, D11, D12, D13, C1, C3, C4, S1, S2, and S3, you will see something like the following.

log2(Depth),10,11,12,13
Total LUTs,2225,2208,2191,2287
FFs,2870,2881,2892,2905
RAMB36,4,7,15,30
RAMB18,0,1,0,0

build_notask: Total LUTs,FFs,RAMB36,RAMB18
        858;516;0;0

The upper block shows the resource utilization of the instrumented circuit, and the bottom block shows the resource utilization of the uninstrumented circuit. In the paper, we use the word Logic for LUT, Register for FF, and calculate the total number of bits from RAMB36 (36Kbit per instance) and RAMB18 (18Kbit per instance). In the above example, the register overhead of an instrumented circuit with a 1024-depth buffer is 2870-516=2354.
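
The arithmetic for the Xilinx report can be sketched as follows, using the numbers from the example above. The helper names ff_overhead and bram_bits are hypothetical, and the conversion assumes 1 Kbit = 1024 bits.

```shell
# Register overhead: FFs of the instrumented circuit minus FFs of the uninstrumented one.
ff_overhead() {
    echo $(( $1 - $2 ))
}

# Total block RAM bits from the RAMB36 ($1) and RAMB18 ($2) counts.
# Assumption: 1 Kbit = 1024 bits, so RAMB36 = 36864 bits and RAMB18 = 18432 bits.
bram_bits() {
    echo $(( $1 * 36 * 1024 + $2 * 18 * 1024 ))
}

ff_overhead 2870 516   # log2(Depth) = 10; prints 2354
bram_bits 4 0          # log2(Depth) = 10; prints 147456
bram_bits 7 1          # log2(Depth) = 11; prints 276480
```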

For D1, D2, D3, D5, D10, and C2, you will see something like the following. We use Logic for ALM, Register for FF, and the number of BRAM blocks to calculate the BRAM size (each block contains 20Kbits).

log2(Depth),10,11,12,13
ALM,101170,101173,101185,101191
BRAM#B,326,343,376,477
BRAMbit,3989920,4332960,5019040,6391200
FFs,111356,111371,111397,111447

build_notask: ALM BRAM#B BRAMbit FFs
100245;309;3646880;108734
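
The per-depth overhead for the Intel report can be computed by subtracting the build_notask baseline from each column. The following is a minimal sketch over the example numbers above. The function name intel_overhead is hypothetical, and treating the overhead as the difference from the baseline follows the register example given for the Xilinx report. Converting blocks to bits assumes 1 Kbit = 1024 bits.

```shell
report='log2(Depth),10,11,12,13
ALM,101170,101173,101185,101191
BRAM#B,326,343,376,477
BRAMbit,3989920,4332960,5019040,6391200
FFs,111356,111371,111397,111447'

# Baseline fields are in the order ALM;BRAM#B;BRAMbit;FFs.
baseline='100245;309;3646880;108734'

# $1 = report text, $2 = baseline line.
# Prints one line per metric: name followed by the overhead at each depth.
intel_overhead() {
    printf '%s\n' "$1" | awk -F, -v base="$2" '
        BEGIN { split(base, b, ";"); idx["ALM"] = 1; idx["BRAM#B"] = 2; idx["FFs"] = 4 }
        ($1 in idx) {
            line = $1
            for (i = 2; i <= NF; i++) line = line "," ($i - b[idx[$1]])
            print line
        }'
}

intel_overhead "$report" "$baseline"
# prints:
# ALM,925,928,940,946
# BRAM#B,17,34,67,168
# FFs,2622,2637,2663,2713

# BRAM overhead in bits at log2(Depth) = 10, assuming 20 * 1024 bits per block.
echo $(( 17 * 20 * 1024 ))   # prints 348160
```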

2.3 LossCheck

Bug D1, D2, D3, D4, C2, and C4 are the six data loss bugs that can be localized by LossCheck.

2.3.1 Data Loss Localization for the 6 Data Loss Bugs

You can use the following command to invoke LossCheck under the directories of these six bugs.

make -f Makefile.lc

For D1, D2, D3, and C2: This will generate two .v files (e.g., a <benchmark>.losscheck.0.v and a <benchmark>.losscheck.1.v). <benchmark>.losscheck.0.v is the first instrumentation, which does not filter false positives (as discussed in Section 4.5.3). Our scripts run the original testbench of the circuit on the first instrumentation, and generate a list of signals that should be filtered out (stored in filter.txt). Then, our scripts invoke LossCheck again, generating the second instrumentation (i.e., <benchmark>.losscheck.1.v), with the signals in filter.txt filtered out.

For D4 and C4: This will generate a test.v file, which is the flattened design with LossCheck's instrumentation. These two bugs do not need false positive filtering. As a result, no filter.txt file will be generated.

To verify that the second instrumentation actually detects the data loss, run the following command:

make -f Makefile.lc sim

You are expected to see error messages about data loss. For D2, D3, D4, C2, and C4, there should be no false positives. For D1, you are expected to see one register that is misidentified. (You will see several rows misreporting the same register.)

2.3.2 Reproducing the Resource Overhead

You can use the following command to synthesize the circuit with and without LossCheck instrumentation. Please note each synthesis can take hours.

make -f Makefile.lc synth

And use the following command to report resource utilization.

make -f Makefile.lc report_util

For D1, D2, D3, and C2, you will get something like this:

build_withlosscheck: ALM BRAM#B BRAMbit FFs
115428;775;11146672;139645
build_notask: ALM BRAM#B BRAMbit FFs
109694;413;5238192;130390

These four bugs are on the Intel HARP platform. This platform contains a vendor-provided shell and a user-implemented accelerator. Because the shell is a fixed region and is not usable by the accelerator, the resource overhead in Figure 3 is normalized to the total available resources of the accelerator region (i.e., without the shell). You may use the following data as the available resources of the accelerator-usable region.

ALM	FFs
327029	1600141

In the above example, the reported totals include the shell. After subtracting the shell's usage, the uninstrumented accelerator uses 9523 ALMs and the instrumented accelerator uses 15257 ALMs. As a result, the ALM (logic) overhead is (15257-9523)/327029≈1.75%.
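
The normalization can be sketched as follows. Because the shell's usage appears in both builds, it cancels in the subtraction, so the totals reported by make -f Makefile.lc report_util can be used directly (115428-109694 gives the same difference as 15257-9523). The function name overhead_pct is hypothetical.

```shell
# $1 = instrumented usage, $2 = uninstrumented usage,
# $3 = resources available in the accelerator-usable region.
# Prints the overhead as a percentage with two decimal places.
overhead_pct() {
    awk -v a="$1" -v b="$2" -v total="$3" \
        'BEGIN { printf "%.2f\n", (a - b) / total * 100 }'
}

# ALM overhead for the example report above.
overhead_pct 115428 109694 327029   # prints 1.75
```

The same helper applies to FFs by passing the FF counts and the available FF total from the table above.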

As we mentioned in our paper, the frequency of D3 and C2 (i.e., the Optimus bugs) is reduced from 400MHz to 200MHz after LossCheck's instrumentation. As a result, we need to add an asynchronous FIFO to handle clock domain crossing. The makefile adds this FIFO when generating the Verilog files for compilation.

For D4 and C4, you will get something like this:

build_withlosscheck: Total LUTs,FFs,RAMB36,RAMB18
        1435;2415;16;1
build_notask: Total LUTs,FFs,RAMB36,RAMB18
        45;83;0;0

These two bugs are on the Xilinx platform. There is no shell in this platform, so the accelerator can use all resources on the FPGA. You may use the following data as the available resources.

LUT	FFs
203800	407600

3. Licenses and Terms

This artifact includes modified versions of Pyverilog (veripass/Pyverilog) and Verilator (veripass/verilator), which are released under their original licenses. Bugs in the hardware-bugbase directory are collected (and organized) from different sources, and are also released under the original licenses of the original implementation.

Our debugging tools under the veripass directory are released under the GPLv3 license, whatever it means. Please also note that these tools are academic prototypes and may not be stable, reliable, or always correct; use them at your own risk.

By downloading/cloning/forking the veripass repository, you acknowledge and agree to all terms included in GPLv3, and that the developers/authors of these tools will not be responsible for any of your losses and/or damages, including but not limited to the tools not working as expected and your loved ones being unhappy with you working/hacking at 3am.

If you find our work interesting, please cite our paper.

@inproceedings{ma2022debugging,
  title={Debugging in the Brave New World of Reconfigurable Hardware},
  author={Ma, Jiacheng and Zuo, Gefei and Loughlin, Kevin and Zhang, Haoyang and Quinn, Andrew and Kasikci, Baris},
  booktitle={Proceedings of the Twenty-Seventh International Conference on Architectural Support for Programming Languages and Operating Systems},
  year={2022}
}
