This repository contains the evaluation materials for the PLDI 2021 paper, "Reticle: A Virtual Machine for Programming Modern FPGAs". The repository can be found on github. (If you're reading these instruction in a text file, we recommend visiting the link above to see a markdown-formatted version.)
The central claims of the Reticle paper all build on the idea that the traditional synthesis tools are inadequate for compiling designs for modern FPGAs. To improve this situation, we develop a new language IR and compiler to handle FPGA synthesis. Specifically, compared to the synthesis flows of Xilinx Vivado, Reticle:
- Claim A: Is faster (up to 100x) synthesizing the same design;
- Claim B: Is more predictable;
- Claim C: More efficiently exploits DSPs;
- Claim D: Produces potentially faster designs from better DSP usage;
- Claim E: Is extensible for new instructions/resources.
These claims are all supported by the artifact. Claims A - D are discussed after generating comparison data for the example programs in the "Step-by-Step Guide". Claim E is substantiated with the exercise at the end of the "Getting Started" guide (adding a new instruction to the compiler).
This section will help you set up the virtual machine, install Reticle, and then test out some features of the Reticle system. It should take less than 30 minutes to complete.
The VM is packaged as an OVA file and can be downloaded from Google Drive. Our instructions assume you're using VirtualBox.
Download the VM image from Google Drive. You can verify its integrity by checking that the MD5 checksum matches d8dc7563f18e3f030d1ed7e5d146f82c
.
Once downloaded, import the OVA image into VirtualBox with the "Import Appliance" option under the "File" menu.
The minimum host disk space required to install external tools is 65 GB. Furthermore, to to ensure that the Xilinx tools do not crash from resource exhaustion, we recommend increasing the number of cores and RAM allocated to the VM in VirtualBox:
- Select the VM and click "Settings".
- Select "System" > "Motherboard" and increase the "Base Memory" to 8 GB.
- Select "System" > "Processor" and select at least 2 cores.
Launch the VM from the sidebar in Virtualbox. The username is reticle and the password is reticle
.
Common VM problems [click to expand]
- Running out of disk space while installing Vivado tools. The Vivado installer will sometimes crash or not start if there is not enough disk space. The Virtual Machine is configured to use a dynamically sized disk, so to solve this problem, simply clear space on the host machine. You need about 65 GB of free space.
- Running out of memory. Vivado uses fair amount of memory. If there is not enough memory available to the VM, they will crash and data won't be generated. If something fails, you should increase the RAM and re-try the script that had a failure.
- Kernel driver not installed (error "rc=-1908"). If you're running VirtualBox for the first time on OSX, you may need to set proper permissions.
- Open a terminal.
- Run
cd ~/Desktop/reticle-evaluation
to go to the evaluation repository. - Update the repository by running
git pull
. - Run
bash scripts/install_reticle.sh
, which will build and install Reticle.
If you would like to view these instructions from within your VM for simpler back-and-forth viewing (and to simplify copying and pasting commands), we recommend opening the github link to see the formatted markdown from within the VM.
You can do this by opening the README with less README.md
, right-clicking the github link, and selecting "open link".
In this section, we will use a short Python script to generate Reticle program, synthesize them, and then output structural Verilog.
The program scripts/generator.py
will produce Reticle programs that perform a tensor add. It can be run with the following options:
-p [lut-scalar | dsp-vector]
specifies whether to annotate the Reticle instructions for LUT or DSP usage;-l N
indicates how long the tensors are;-o file
writes the output Reticle program to the indicated file.
We recommend comparing LUT vs DSP usage for the same sized program with the following:
python3 scripts/generator.py -p lut-scalar -l 8 -o lut.ir
to generate a program using LUTs, and
python3 scripts/generator.py -p dsp-vector -l 8 -o dsp.ir
to generate a program using DSPs.
Feel free to generate some other Reticle programs with different options and take a look at the output. Note that the DSP version of the program takes advantage of vectorization to pack four 8-bit additions into each DSP.
Next, we'll use the reticle-translate
command to synthesize each Reticle program and output structural Verilog. To synthesize the two programs we just generated, run
reticle-translate lut.ir --fromto ir-to-struct -o lut.v
and
reticle-translate dsp.ir --fromto ir-to-struct -o dsp.v
.
Take a look at the generated Verilog, but be warned --- it's very low-level! At this point, all operations have been synthesized to primitive FPGA instructions; the "compute" operations in the Verilog are either configuring a LUT or DSP.
In this section you will add a new instruction to a target description and compile a program using the new instruction.
Reticle uses .tdl
files ("target description language") to specify instructions for a given target. Each instruction needs two things:
- a pattern that can be used in the IR, and
- an implementation in low-level Reticle assembly.
We will add a "subtract" instruction to the target that uses a LUT.
- Open the file located in
~/Desktop/reticle-evaluation/reticle/examples/target/ultrascale/lut.tdl
- Add the pattern (pat) and implementation (imp) as:
pat lut_sub_i8[lut, 1, 2](a: i8, b: i8, en: bool) -> (y: i8) {
y:i8 = sub(a, b);
}
imp lut_sub_i8[x, y](a: i8, b: i8, en: bool) -> (y: i8) {
t0:bool = ext[0](a);
t1:bool = ext[1](a);
t2:bool = ext[2](a);
t3:bool = ext[3](a);
t4:bool = ext[4](a);
t5:bool = ext[5](a);
t6:bool = ext[6](a);
t7:bool = ext[7](a);
t8:bool = ext[0](b);
t9:bool = ext[1](b);
t10:bool = ext[2](b);
t11:bool = ext[3](b);
t12:bool = ext[4](b);
t13:bool = ext[5](b);
t14:bool = ext[6](b);
t15:bool = ext[7](b);
t16:bool = lut2[tbl=9](t0, t8) @a6lut(x, y);
t17:bool = lut2[tbl=9](t1, t9) @b6lut(x, y);
t18:bool = lut2[tbl=9](t2, t10) @c6lut(x, y);
t19:bool = lut2[tbl=9](t3, t11) @d6lut(x, y);
t20:bool = lut2[tbl=9](t4, t12) @e6lut(x, y);
t21:bool = lut2[tbl=9](t5, t13) @f6lut(x, y);
t22:bool = lut2[tbl=9](t6, t14) @g6lut(x, y);
t23:bool = lut2[tbl=9](t7, t15) @h6lut(x, y);
t24:bool = vcc();
t25:bool = gnd();
t26:i8 = cat(t16,t17,t18,t19,t20,t21,t22,t23);
y:i8 = carry(a, t26, t24, t25) @carry8(x, y);
}
- Build and install Reticle to apply the changes:
cd ~/Desktop/reticle-evaluation/reticle
cargo build --release
cargo install --bin reticle-translate --bin reticle-optimize --bin reticle-place --path .
cd ..
- Create a program that uses the
sub
instruction and save it asprog.ir
:
def main(a: i8, b: i8, en: bool) -> (y: i8) {
y:i8 = sub(a, b);
}
- Compile the program to structural verilog:
reticle-translate prog.ir --fromto ir-to-struct -o prog.v
- Open the resulting Verilog file to see the result.
- Claim E: Reticle is extensible for new instructions/resources
Because Reticle is built around an abstraction of instructions, rather transistors or basic logic, its compiler infrastructure can easily express, target, and compile to new FPGA units without resorting to complex inference or unbounded logic synthesis.
The goal of this section is to reproduce all of the evaluation figures included in the paper.
Our evaluation uses Xilinx's Vivado tools to generate area and resource estimates. Unfortunately, due to licensing restrictions, we can't distribute the VM with these tools installed. However, the tools are freely available and below are instructions on how to install them.
Our evaluation requires Vivado WebPACK v.2020.1. Due to the instability of synthesis tools, we cannot guarantee our evaluation works with a newer or older version of the Vivado tools.
If you're installing the tools on your own machine instead of the VM, you can download the Linux installer.
The following instructions assume you're using the supplied VM:
- Log in to the VM with the username
reticle
and the passwordreticle
. - The desktop should have a file named:
Xilinx Installer
. Double click on this to launch the installer. - Ignore the warning and press
Launch Anyway
. - When the box pops up asking you for a new version, click
Continue
. - Enter your Xilinx credentials. If you don't have them, create a Xilinx account.
- Note When you create an account, you need to fill out all the required information on your profile.
Otherwise the Xilinx installer will reject your login.
- The "User ID" is the email address of the Xilinx account you created.
- Agree to the contract and press
Next
. - Choose
Vivado
and clickNext
. - Choose
Vivado HL WebPACK
and clickNext
. - Leave the defaults for selecting devices and click
Next
. - Important! Change the install path from
/tools/Xilinx
to/home/reticle/Xilinx
. - Confirm that you want to create the directory.
- Install. Depending on the speed of your connection, the whole process should take about 2 - 4 hrs.
Because we modified part of the target description files in the "Getting Started" guide, this may affect the results. To restore the artifact version of the compiler, re-run bash scripts/install_reticle.sh
from the reticle-evaluation
directory.
Note: this will delete all changes you may have made to Reticle and all files in the reticle
directory.
Within the reticle-evaluation
repo, the script analysis/artifact.ipynb
will generate all the figures from the paper. First try running it on the existing data with
jupyter lab analysis/artifact.ipynb
.
Scroll to the bottom to see the figures.
The following commands generate the data used by the Jupyter notebook plotting script to plot the figures.
To re-create all the data:
- Generate the compiler run-time data with
python3 scripts/compiler_runtime.py
- Generate the program run-time and resource usage data with
python3 scripts/program_metrics.py
- The new data files (csv) should be in
data
directory. Rungit diff
to see the changes in the new data files compared to the committed ones.
Run jupyter lab analysis/artifact.ipynb
to see the graphs.
For the following, please refer to the figures generated for the tensoradd
example (those following the line plot_prog("tadd")
in the Jupyter notebook).
- Claim A: Reticle is faster (up to 100x) synthesizing the same design
The first graph shows how long the Reticle compiler took compared to Xilinx Vivado. Note that we are only comparing synthesis time (not placement and routing). Reticle always runs faster than Vivado, outperforming it by over 100x in the best case. We see similar results for the other two examples (tensordot
andfsm
). Notably, using Verilog with DSP hints (the middle case, labelled "hint") actually slows down Vivado compared to the baseline for this example. - Claim B: Reticle is more predictable
The last two graphs in the row display the LUT resources used and the DSP resources used for each case. All three cases use more resources as we make the design larger; however, the Reticle case only uses DSPs (its entry in the LUT graph is zero for all sizes). The "hint" case (Verilog + DSP hints in Vivado), despite the annotation to use DSPs, still resorts to using LUTs in some cases --- this is because such annotations are treated as suggestions, not constraints. When one uses a particular annotation in Reticle (LUT vs DSP), it will always synthesize to the indicated resource (or fail if unavailable). - Claim C: Reticle more efficiently exploits DSPs by using value types
The final graph ("DSPs used") shows that Reticle uses fewer DSPs than Vivado, despite synthesizing the same design. This follows from Reticle's inclusion of value types, allowing native expression of vectors and exploitation of DSP vectorization capabilities. - Claim D: Reticle produces potentially faster designs from better DSP usage
It is well understood that DSPs can be faster than LUTs for the same computation. The second graph ("Run-time speedup") shows the runtime performance of the design for each case post-synthesis. Depending on example and design size, Reticle produces a design that is at, near, or faster than the one produced by Vivado --- even when using annotated Verilog.