4-stage RISC-V Core

This repository contains the documentation and code for the 4-stage pipelined RISC-V core designed during the RISC-V MYTH Workshop. The core supports the RV32I Base Integer Instruction Set and is developed in TL-Verilog using Makerchip.

Introduction To RISC-V ISA
Compiler Toolchain
Application Binary Interface
RTL Design Using TL-Verilog and Makerchip
Basic RISC-V Core
Pipelined RISC-V Core
Final 4-Stage RISC-V Core
- Final RISC-V Core
- Code Comparison
Future Work
References
Acknowledgement

Introduction To RISC-V ISA

RISC-V is an open instruction set architecture (ISA) available under free, non-restrictive licences. It gives both software and hardware designers the freedom to use and extend the architecture.

Why RISC-V?

Far simpler and smaller than commercial ISAs.
Avoids micro-architecture or technology dependent features.
Small standard base ISA.
Multiple Standard Extensions.
Variable-length instruction encoding

For more information about the RISC-V ISA, see the References section.

Compiler Toolchain

A toolchain is a set of tools used to compile source code into an executable program. Like other ISAs, RISC-V has its own toolchain. The steps below show how to use it.

Using the RISC-V Compiler:

riscv64-unknown-elf-gcc -<compiler options> -mabi=<ABI options> -march=<Architecture options> -o <object filename> <C Program filename>

<compiler options> : O1, Ofast
<ABI options> : lp64, ilp32
<Architecture options>: rv64i, rv32i

Viewing the assembly language code:

riscv64-unknown-elf-objdump -d <object filename>

Simulating the object file using SPIKE simulator:

spike pk <object filename>

Debugging the object file using SPIKE:

spike -d pk <object filename>

The images below show the toolchain flow for a small C program that computes the sum of the first 9 positive integers.

RISC-V Toolchain: Compilation, Simulation and Debugging
Viewing the assembly language code for generated object file.

Application Binary Interface

Every application program runs in a particular environment, which is called the "Application Execution Environment". The way the application interfaces with the underlying execution environment is called the "Application Binary Interface (ABI)".

The Application Binary Interface is the sum total of what the application programmer needs to understand in order to write programs; the programmer does not have to understand or know what is going on within the Application Execution Environment.

An Application Binary Interface combines the processor ISA with the OS system-call interface. The snippet below lists the RISC-V registers with their ABI names and a short description of each.
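As a quick reference for the register naming described above, here is a minimal Python sketch of the standard integer-register ABI names (x0 to x31). The helper name `abi_name` is illustrative only and is not part of this repository.

```python
# Standard RISC-V integer register ABI names, indexed by register number.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp"]        # x0-x4
    + ["t0", "t1", "t2"]                    # x5-x7   temporaries
    + ["s0", "s1"]                          # x8-x9   saved (s0 is also fp)
    + [f"a{i}" for i in range(8)]           # x10-x17 arguments / return values
    + [f"s{i}" for i in range(2, 12)]       # x18-x27 saved
    + [f"t{i}" for i in range(3, 7)]        # x28-x31 temporaries
)


def abi_name(index: int) -> str:
    """Return the ABI name of integer register x<index> (hypothetical helper)."""
    if not 0 <= index < 32:
        raise ValueError("integer registers are x0..x31")
    return ABI_NAMES[index]


print(abi_name(17))  # a7
```

For example, x17 maps to a7 and x2 maps to sp.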

RTL Design Using TL-Verilog and Makerchip

Makerchip is a free online environment for developing high-quality integrated circuits. You can code, compile, simulate, and debug Verilog designs, all from your browser. Your code, block diagrams, and waveforms are tightly integrated.

Following are some unique features of TL-Verilog:

Supports "Timing Abstraction"
Easy Pipelining
TL-Verilog removes the need for always blocks and explicit flip-flops.
The SandPiper compiler converts TL-Verilog to Verilog, which can be easily synthesized.

Designing a Simple Calculator

A basic single-stage calculator is implemented in TL-Verilog. The calculator takes two 32-bit input operands and one 3-bit opcode. The opcode value selects the operation to perform.

The below snippet shows the implementation in Makerchip. Here all the working of the calculator is done in a single stage.
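To make the behaviour concrete, the following is a minimal Python model of such a calculator. It is a sketch, not the repository's TL-Verilog: the opcode encoding (0 = add, 1 = subtract, 2 = multiply, 3 = divide) and the divide-by-zero result are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits, like a 32-bit hardware datapath


def calc(val1: int, val2: int, op: int) -> int:
    """Single-stage calculator model on 32-bit unsigned operands.

    The opcode mapping below is an assumption, not taken from the repository.
    """
    val1 &= MASK32
    val2 &= MASK32
    if op == 0b000:
        result = val1 + val2
    elif op == 0b001:
        result = val1 - val2
    elif op == 0b010:
        result = val1 * val2
    elif op == 0b011:
        result = val1 // val2 if val2 else 0  # assumption: divide by zero gives 0
    else:
        result = 0  # remaining opcodes are unused in this sketch
    return result & MASK32


print(calc(6, 7, 0b010))  # 42
```

Masking with `MASK32` mirrors what a fixed-width signal does in hardware: `calc(0, 1, 0b001)` wraps around to `0xFFFFFFFF` instead of going negative.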

Pipelining the Calculator

The simple calculator developed above is now pipelined, which is straightforward in TL-Verilog. There is no need for always_ff @ (clk) blocks or explicit flip-flops: the pipeline is defined with |calc, and its stages are written under @1, @2 and so on.

The below snippet shows that in the pipeline Stage-1 is used for accepting inputs and Stage-2 for arithmetic operations.

Adding Validity to Calculator

TL-Verilog supports a distinctive feature called validity. Using validity, we can define the condition under which a specific signal holds valid content. The validity condition is written using ?$valid_variable_name.

The below snippet shows the implementation of validity. The calculator operation will only be carried out when there is no reset and it is a valid cycle.

The detailed TL-Verilog code for the calculator can be found here

Basic RISC-V Core

This section covers the implementation of a simple 3-stage RISC-V core / CPU. The three stages are, broadly, Fetch, Decode and Execute. The diagram below shows the basic blocks of the CPU core.

Program Counter and Instruction Fetch

The Program Counter (PC), also called the Instruction Pointer, holds the address of the next instruction to be executed. It is fed to the instruction memory, which returns the 32-bit instruction stored at that address. Because every RV32I instruction is 4 bytes long, the PC is incremented by 4 on every valid cycle. The snippet below shows the Program Counter and Instruction Fetch implementation in Makerchip.
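The fetch behaviour can be sketched as a small Python model. This is an illustration under assumptions, not the Makerchip code: the PC is assumed to reset to 0, the instruction memory is modelled as a list of 32-bit words, and the names `next_pc` and `fetch` are hypothetical.

```python
def next_pc(pc: int, reset: bool) -> int:
    """Next program counter: 0 on reset (assumed), otherwise PC + 4."""
    return 0 if reset else (pc + 4) & 0xFFFFFFFF


def fetch(imem: list, pc: int) -> int:
    """Read one 32-bit instruction; the byte address is converted to a word index."""
    return imem[pc >> 2]


# A tiny instruction memory holding three RV32I instructions.
imem = [
    0x00000013,  # addi x0, x0, 0  (nop)
    0x00100093,  # addi x1, x0, 1
    0x00208133,  # add  x2, x1, x2
]

pc = next_pc(0, reset=True)        # PC starts at 0
pc = next_pc(pc, reset=False)      # PC advances to 4
print(hex(fetch(imem, pc)))        # 0x100093
```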

Instruction Decode and Read Register File

The 32-bit fetched instruction must first be decoded to determine the operation to be performed and the source and destination registers. The instruction type is identified from the opcode bits of the instruction and can be R, I, S, B, U or J. Every instruction type has a fixed format defined in the RISC-V ISA. Depending on the format, the following fields are extracted:

opcode, funct3, funct7 -> Specifies the Operation
imm -> Immediate values / Offsets
rs1, rs2 -> Source register index
rd -> Destination register index
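The bit positions of these fields are fixed by the RV32I encoding, so the extraction can be sketched in Python as follows. The function names are illustrative, and only the I-type immediate is shown; the other formats assemble their immediates from different bit ranges.

```python
def decode_fields(instr: int) -> dict:
    """Extract the fixed-position fields of a 32-bit RV32I instruction."""
    return {
        "opcode": instr & 0x7F,          # instr[6:0]
        "rd": (instr >> 7) & 0x1F,       # instr[11:7]
        "funct3": (instr >> 12) & 0x7,   # instr[14:12]
        "rs1": (instr >> 15) & 0x1F,     # instr[19:15]
        "rs2": (instr >> 20) & 0x1F,     # instr[24:20]
        "funct7": (instr >> 25) & 0x7F,  # instr[31:25]
    }


def imm_i(instr: int) -> int:
    """Sign-extended 12-bit I-type immediate taken from instr[31:20]."""
    imm = (instr >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


fields = decode_fields(0xFFF30293)   # addi x5, x6, -1
print(fields["rd"], fields["rs1"], imm_i(0xFFF30293))  # 5 6 -1
```

Not every field is meaningful for every instruction type: an I-type instruction such as addi has no rs2 or funct7, so the decoder uses the instruction type to decide which fields are valid.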

The base RV32I ISA provides 32 registers, each XLEN bits wide (for example, XLEN = 32 for RV32). The register file used here supports two reads and one write in the same cycle.

The below snippet shows the Decode and Read Register Implementation in Makerchip.

Execute Instruction and Write Register File

The instruction is executed according to the decoded operation, using the Arithmetic and Logic Unit (ALU) where required. If the instruction is a branch, the branch target address is computed separately. After the instruction is executed, the result is written back to the Register File at the destination register index. The snippet below shows the Instruction Execute and Write Register File implementation in Makerchip.
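A behavioural sketch of the execute and write-back steps in Python is given below. It models a subset of RV32I ALU operations selected by name; the string-based operation selector and the function names are assumptions for illustration and do not reflect how the TL-Verilog code is organised.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit value as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: str, a: int, b: int) -> int:
    """Compute one RV32I ALU operation on 32-bit operands."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # shifts use only the low 5 bits of the second operand
    results = {
        "add": a + b,
        "sub": a - b,
        "and": a & b,
        "or": a | b,
        "xor": a ^ b,
        "sll": a << shamt,
        "srl": a >> shamt,
        "sra": to_signed(a) >> shamt,
        "slt": int(to_signed(a) < to_signed(b)),
        "sltu": int(a < b),
    }
    return results[op] & MASK32


def write_reg(xreg: list, rd: int, value: int) -> None:
    """Write back the result; x0 is hard-wired to zero and is never written."""
    if rd != 0:
        xreg[rd] = value & MASK32


xreg = [0] * 32
write_reg(xreg, 5, alu("add", 40, 2))
print(xreg[5])  # 42
```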

The code for the 3-stage simple RISC-V Core can be found here

Pipelined RISC-V Core

Pipelining increases the overall performance of the system, so the previously designed core can now be pipelined. The "Timing Abstraction" feature of TL-Verilog makes this easy.

Pipelining the Core

Pipelining in TL-Verilog can be done in the following way:

|<pipe_name>
   @<pipe_stage>
      Instructions present in this stage
   @<pipe_stage>
      Instructions present in this stage

Several hazards must be considered when implementing a pipelined design. The hazards addressed here are:

Improper Updating of Program Counter (PC)
Read-After-Write (RAW) Hazard
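One common way to handle the register-file data hazard listed above is to bypass (forward) the previous instruction's result when the current instruction reads the register that is still being written. The Python sketch below illustrates that idea only; it is an assumption about the general technique, not a description of this core's exact logic, and all names are hypothetical.

```python
def read_operand(xreg: list, rs: int,
                 prev_wr_en: bool, prev_rd: int, prev_result: int) -> int:
    """Return the value of source register rs, bypassing the register file
    when the previous instruction is writing that same register."""
    if prev_wr_en and prev_rd == rs and rs != 0:
        return prev_result  # forward the in-flight result
    return xreg[rs]         # no hazard: use the stored register value


xreg = [0] * 32
xreg[5] = 10
# The previous instruction is writing 99 to x5, which has not reached xreg yet.
print(read_operand(xreg, 5, prev_wr_en=True, prev_rd=5, prev_result=99))  # 99
```

Without the bypass, the read would return the stale value 10, because the write from the previous instruction completes in a later pipeline stage than the read.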

Load and Store Data

A data memory can be added to the core. The load and store operations add a new stage, making it a 4-stage core / CPU.

The proper functioning of the RISC-V core can be checked by adding test cases to the code. For example, a program that sums the positive integers from 1 to 9 and stores the result in a specific register can be verified by:

  *passed = |cpu/xreg[17]>>5$value == (1+2+3+4+5+6+7+8+9);

Here, xreg[17] is the register holding the final result.

Final 4-Stage RISC-V Core

After the pipelined design is verified in simulation, support for jump instructions is added. The instruction decode and ALU logic are also extended to cover the RV32I Base Integer Instruction Set.

The snippet below shows the successful implementation of 4-stage RISC-V Core

The complete TL-Verilog code for 4-Stage RISC-V Core can be found here

Final RISC-V Core

Code Comparison

The SandPiper compiler generated ~90,000 characters of SystemVerilog from ~25,000 characters of TL-Verilog, roughly 3.6 times as much code. Of those ~90,000 characters, only ~18,000 (about 20%) are actual logic.

The snippet below shows the code comparison of TL-Verilog and SystemVerilog.

Future Work

The implemented core supports only the RV32I Base Integer Instruction Set. The design will be extended to support other operations and extensions such as C, M and F.

The decompress and decode logic for RV32C (extension C) has already been added to the design. The code can be found here

References

RISC-V ISA Manual: https://github.com/riscv/riscv-isa-manual/
RISC-V: https://riscv.org/
Makerchip: https://makerchip.com/
VLSI System Design: https://www.vlsisystemdesign.com/

Acknowledgement

Kunal Ghosh, Co-founder, VSD Corp. Pvt. Ltd.
Steve Hoover, Founder, Redwood EDA

ShonTaware/RISC-V_Core_4_Stage