RVCoreP is a five-stage pipelined RISC-V soft processor developed by Miyazaki et al. at Tokyo Institute of Technology.
Though RVCoreP is very fast, it implements both instruction memory and data memory using FPGA on-chip block RAMs (BRAMs). Since the total BRAM capacity on a typical FPGA is very limited (from several hundred kilobytes to several megabytes), RVCoreP can execute only small applications.
In this project, we extend RVCoreP by implementing the data memory using off-chip DRAM. In addition to adding the DRAM controller and the logic that integrates it with the processor core, we have modified the processor core as follows:
- We add a stall signal that is asserted when data memory is accessed. The original processor core does not have this signal since accessing BRAMs takes only one clock cycle, which does not stall the pipeline.
- We add a read enable signal for the data memory. This signal is required because the memory access stage takes multiple clock cycles when loading data from data memory but only one clock cycle otherwise. In the original processor core, the memory access stage always takes one clock cycle regardless of whether the data memory is accessed, so only a write enable signal is required for correct operation.
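
The effect of these two signals can be illustrated with a small cycle-counting model. This is only a Python sketch, not the Verilog in the processor core: the function name, the trace encoding, and the 20-cycle load latency are assumptions chosen for illustration, and the model looks only at how long the memory access stage is occupied.

```python
def count_cycles(trace, load_latency):
    """Count cycles spent in the memory access stage for a trace.

    trace: iterable of "load", "store", or "other" (hypothetical encoding).
    load_latency: cycles a load occupies the memory access stage
                  (1 models BRAM; a larger, assumed value models DRAM).
    Returns (total_cycles, stall_cycles).
    """
    if load_latency < 1:
        raise ValueError("load_latency must be at least 1")
    total = 0
    stalled = 0
    for kind in trace:
        if kind == "load":
            # Read enable is asserted; the stall signal holds the pipeline
            # for every cycle beyond the first.
            total += load_latency
            stalled += load_latency - 1
        elif kind in ("store", "other"):
            # Not loading: the memory access stage takes one cycle.
            total += 1
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
    return total, stalled


trace = ["other", "load", "store", "load", "other"]
print(count_cycles(trace, load_latency=1))   # BRAM-like: (5, 0)
print(count_cycles(trace, load_latency=20))  # assumed DRAM latency: (43, 38)
```

With a one-cycle load the stall count is always zero, which is why the original BRAM-based core needs neither signal.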
We provide some script files for synthesizing the design using Vivado in batch mode (command line mode).
We also add Verilog code that emulates the behavior of DRAM so that the DRAM-based design can be simulated. This proved very effective during the development of the project.
- FPGA board: Arty A7-35T
- Synthesis: Vivado 2019.2
- Simulation: Synopsys VCS
- OS: Ubuntu
- `clk_wiz_1/`, `common/`, `dram/`: DRAM controller implementation
- `sim/`, `Makefile`, `simsrc`: for simulation
- `config.vh`: defines some parameters for the design (see later explanation)
- `data_memory.v`: data memory implementation using DRAM
- `proc.v`: processor core implementation
- `uart.v`: implementation for program loader and serial port communication
- `main.v`: top module of the design
- `constraints_io.xdc`, `constraints_timing.xdc`: constraints for I/O ports and timing of the design; the timing constraints are used only at the implementation stage
- `vivado.sh`, `vivado.tcl`, `vivado_slurm.sh`: scripts for synthesizing the design using Vivado in batch mode
- `verification/`: verification programs
  - `*.bin` files are for executing on an FPGA board
  - `*.mem` files are for simulation
- `MEM_FILE`: path to the verification program (this parameter is used only in simulation)
- `MEM_SIZE`: must be set appropriately according to the size of the verification programs
  - Verification programs in `verification/test/`: `MEM_SIZE` should be set to 1024*4 (4KB)
  - Verification programs in `verification/bench/`: `MEM_SIZE` should be set to 1024*32 (32KB)
  - Verification programs in `verification/embench/`: `MEM_SIZE` should be set to 1024*64 (64KB)
- `SERIAL_WCNT`: must be set appropriately according to the frequency of the clock for the processor core and the desired baud rate for the serial port. For example, if the frequency of the clock for the processor core is 100MHz and the desired baud rate is 1 Mbaud, `SERIAL_WCNT` should be set to 100.
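
The parameter values above can be derived with a short helper. This is a sketch under stated assumptions: it assumes `SERIAL_WCNT` is the number of core clock cycles per serial bit (clock frequency divided by baud rate), which is consistent with the 100MHz / 1 Mbaud example but is not confirmed by the source; the function names and the lookup table are hypothetical and not part of the repository.

```python
from pathlib import Path

# MEM_SIZE values listed above, keyed by the directory under verification/.
MEM_SIZE_BY_DIR = {
    "test": 1024 * 4,      # 4KB
    "bench": 1024 * 32,    # 32KB
    "embench": 1024 * 64,  # 64KB
}


def serial_wcnt(clock_hz, baud_rate):
    """Assumed relation: core clock cycles per serial bit, rounded."""
    if clock_hz <= 0 or baud_rate <= 0:
        raise ValueError("clock_hz and baud_rate must be positive")
    return round(clock_hz / baud_rate)


def mem_size_for(program_path):
    """Look up MEM_SIZE from the directory that contains the program."""
    directory = Path(program_path).parent.name
    if directory not in MEM_SIZE_BY_DIR:
        raise ValueError(f"no MEM_SIZE listed for directory: {directory}")
    return MEM_SIZE_BY_DIR[directory]


print(serial_wcnt(100_000_000, 1_000_000))               # 100
print(mem_size_for("verification/embench/wikisort.mem"))  # 65536
```

The printed values still have to be written into `config.vh` by hand before synthesis or simulation.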
- In development environments with the Slurm workload manager:

  `./vivado_slurm.sh <#threads> <walltime(hour(s))> <vivado version (e.g., 20183, 20191, etc.)>`

  For example, the command

  `./vivado_slurm.sh 8 3 20192`

  will create a job that synthesizes the design with Vivado 2019.2 using 8 parallel threads and a walltime limit of 3 hours.

  Note: you may need to edit the Vivado installation path in `vivado_slurm.sh` before using it.

- In development environments without the Slurm workload manager:

  `./vivado.sh <#threads> <vivado version (e.g., 20183, 20191, etc.)>`

  For example, the command

  `./vivado.sh 8 20192`

  will synthesize the design with Vivado 2019.2 using 8 parallel threads.

  Note: as with the `vivado_slurm.sh` script, you may need to edit the Vivado installation path.
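
The version argument in the commands above is the Vivado version with the dot removed (for example, 2019.2 becomes 20192). The following sketch builds the two command lines from a dotted version string. The helper names are hypothetical and the code only formats strings; it does not invoke the scripts.

```python
def version_code(version):
    """Convert a dotted Vivado version such as "2019.2" into "20192"."""
    major, minor = version.split(".")  # raises ValueError if not exactly one dot
    if not (len(major) == 4 and major.isdigit()
            and len(minor) == 1 and minor.isdigit()):
        raise ValueError(f"unexpected Vivado version format: {version}")
    return major + minor


def vivado_command(threads, version, walltime_hours=None):
    """Build the command line; pass walltime_hours for the Slurm script."""
    code = version_code(version)
    if walltime_hours is None:
        return f"./vivado.sh {threads} {code}"
    return f"./vivado_slurm.sh {threads} {walltime_hours} {code}"


print(vivado_command(8, "2019.2"))                    # ./vivado.sh 8 20192
print(vivado_command(8, "2019.2", walltime_hours=3))  # ./vivado_slurm.sh 8 3 20192
```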
Program the FPGA and execute a verification program: after programming the FPGA, use the `serial_rvcorep.py` Python script in `verification/` to send a verification program to the FPGA (pySerial is required to run this script). For example, the command
python3 serial_rvcorep.py 1 test/test.bin
will send the `test/test.bin` program to the FPGA via the serial port at a baud rate of 1 Mbaud (you may need to change the location of the serial port in line 4 of the script). The baud rate specified here must match the one assumed when setting the `SERIAL_WCNT` parameter in `config.vh`, as described above.
In the current design, to execute another verification program, it is necessary to reprogram the FPGA.
The frequency of the clock for the processor core is currently set to 100MHz. This clock is generated by the clocking wizard IP core in `dram/clk_wiz_0/`. This IP core takes as input a no-buffer 83.333MHz clock output by the DRAM controller and generates a 100MHz clock. You can change the output clock frequency of the IP core, but the specification of the input clock (no buffer, 83.333MHz) cannot be changed because it is tied to the settings of the MIG IP core in the DRAM controller.
We generate the clocking wizard IP core in `dram/clk_wiz_0/` using the scripts in here (see README for the usage of the scripts).
./clkwiz.sh aa35 n 83.333 100.000 20192
The IP core can also be generated using Vivado in GUI mode.
make
make run
Simulating the verification programs in `verification/bench/` and `verification/embench/` (especially the latter) is very time-consuming. For example, executing `verification/embench/wikisort.mem` takes more than 30 minutes in our environment (Core i9 9900K CPU with 64GB DDR4 memory).