Multistage Pipelined RV32I CPU Design

Team Members:

Johan Jino
Clemen Kok 
Shermaine Ang
Sohailul Islam Alvi

Repository Tree

Repository
    │   
    └───> main
    │      │----> README.md {Overview, Joint and Personal Statements}
    │      | 
    │      │----> rtl {Single-Cycle RISC-V}
    │      |       └───> README.md
    │      |       └───> risc_v.sv
    │      |       └───> risc_v_tb.cpp
    |      |
    │      └───> test {Single-Cycle RISC-V}
    │              └───> README.md
    │              └───> F1_program.asm
    │              └───> reference_program.asm
    │       
    │       
    │   
    └───> pipeline
    |      │----> rtl {Pipelined RISC-V}
    │      |       └───> README.md
    │      |       └───> risc_v.sv
    │      |       └───> risc_v_tb.cpp
    |      |
    │      └───> test {Pipelined RISC-V}
    │              └───> README.md
    │              └───> F1_program.asm
    │              └───> reference_program.asm
    |
    │       
    │   
    └───> cache
           │----> rtl {Pipelined RISC-V with Data Cache}
           |       └───> README.md
           |       └───> risc_v.sv
           |
           └───> test

Contribution Tables

Repository Files

File	Johan	Clemen	Shermaine	Alvi
control_unit.sv	*
pc_mux.sv			*
data_mem.sv		*		x
risc_v.sv	x			*
pc_Reg.sv			*
datacache.sv		*	*
cachebranch.sv		*	*
instr_mem.sv	*			x
sign_extend.sv	*
main/README.md	x	x	x	*
alusrc.sv		*
**/README.md				*
reg_file.sv		*
risc_v_top.cpp				*
alu.sv		*
F1_program.asm	x	x	x	x

 * = Principal Contributor
 x = Also Helped/Worked

Pipeline Process

CPU Block	Johan	Clemen	Shermaine	Alvi
Fetch			*
Data	*
Execute		*
Memory & Write				*
Debug & Test	*	x	x	*

 * = Principal Contributor
 x = Also Helped/Worked

Joint Statement of Contributions

The RISC-V CPU was initially divided into four components, following the structure the team had used in Lab 4. Each member was assigned a specific area of contribution for the various stages of the project. The table below summarizes the tasks carried out by each member of the team throughout the coursework; all points stated here are acknowledged and agreed upon by all team members.

Member	GH Username	Tasks	Elaboration
Shermaine Ang	notmaineyy	Program Counter & Adders (Single-Cycle CPU) FETCH Block (Pipelined CPU) Data Cache Documentation	1. Single-Cycle CPU (Lab 4): Prepared HDL for the program counter and relevant adders. 2. Single-Cycle CPU (Lab 4): Created a testbench for the program counter and adders, to test each component individually. 3. Assisted with deadline setting to keep everyone in the team in the loop, and ensured meetups were arranged. 4. Pipeline Stage: Worked with Clemen on the initial idea for pipelining; added DFFs for PCPlus4D and PCD. 5. Worked on the HDL needed for the FETCH block in the new approach, and assisted with integration with the rest of the components. 6. Added the HDL needed for JALR in the FETCH block. 7. Read up on Data Cache. 8. Data Cache: Worked on HDL for the Data Cache. 9. Contributed to documentation.
Johan Jino	johanjino	GitHub Actions and CI/CD Pipeline Control Unit (Single-Cycle CPU) DATA Block (Pipelined CPU)	1. Created GitHub Actions to run automated tests and keep the repo stable by preventing errors from being merged. 2. Worked on overall assembly of the CPU and its verification. 3. Single-Cycle CPU: Prepared HDL for the control unit and integrated all units. 4. Pipeline Stage: Defined HDL for the DATA block and integrated all the blocks together. 5. Worked on reference program execution and inference. 6. Documentation for the above.
Clemen Kok	clemenkok	Project Management ALU (Single-Cycle CPU) EXECUTE Phase (Pipelined CPU)	1. Created the Repo and coordinated meetings with the team. 2. Set up and maintained a Gantt Chart so that each team member would be clear on the team's objectives at the various phases of the project. 3. Single-Cycle CPU: Prepared HDL needed for ALU and tested it with testbench. 4. Came up with and worked with team to implement assembly code needed to test Single-Cycle CPU. 5. Pipeline Stage: performed an initial exploration into how the team should approach Pipelining. Implemented HDL across all components on Single-Cycle CPU. 6. Realised approach would lead to bugs that would be hard to find (mostly reusing old code). Prepared the HDL needed for the EXECUTE phase, and helped to integrate it with the rest of the components. 7. Came up with approach for Data Cache implementation and worked on HDL as well as data retrieval. 8. Gave pointers on how others could improve their code; contributed to documentation. 9. Debugged reference program and got it to work by adding HDL for `add` and `lui` instructions.
Sohailul Islam Alvi	alvi-codes	Top-Level SV Module, C++ Testbench, Debug and Test, Memory Block, Execute Block, Documentation	1. Assembled the overall CPU by writing the top-level `risc_v.sv`, combining the components built by my teammates. 2. Wrote the C++ testbench required to trace outputs and verify the design. 3. Debugged and tested the overall design to ensure the outputs produced are correct in waveforms. 4. Modified the testbenches to run the test programs on VBuddy, and recorded the outputs. 5. Wrote the HDL for the MEMORY and EXECUTE blocks of the pipelined design. 6. Worked with the team to integrate all blocks in the pipelined design. 7. Updated memory allocations in the CPU designs to adhere to the given memory map. 8. Debugged the reference program and worked to implement the HDL for `add` and `lui` instructions. 9. Created the documentation in the `test` and `rtl` folders of the main and pipeline branches, along with structuring the root `README.md` in the main branch that overviews the complete coursework.

Component verification at each stage was done during group meetings, where each member could give feedback and troubleshoot live. In addition, improvements to other members' contributions were communicated in the team chat. The Project Gantt Chart can be seen below:

Brief Overview of Objectives

Objective 1:

F1 Starting Light Algorithm in RV32I Assembly Language

The team met to discuss methods of implementing the F1 Starting Light Algorithm in RISC-V Assembly Language, and decided to follow a method in which lb is used to load data into a0. This is a straightforward implementation that enables the algorithm to start immediately upon trigger/reset. Trigger and Reset have the same functionality, given that they both start the program. No interrupts are implemented at this stage. The final assembly language program written by the team can be found at test/F1_program.asm in the main branch.

Objective 2:

Single-Cycle RV32I Microarchitecture

Using the work done for Lab 4, the team members developed and verified the Single-Cycle CPU design, based on the designs found in the lecture slides. The work distribution and design details for the Single-Cycle RISC-V can be found at rtl/README.md in the main branch. The assembly program developed previously was used to test and verify our design's functionality. Test results of the F1 assembly program, in terms of waveforms and video of the output implemented on VBuddy, can be found at test/README.md in the main branch.

Objective 3:

Pipelined RV32I Microarchitecture

The team decided to use the Pipelined CPU design found in the lecture slides (Lecture 8, slide 5) as a starting point. The design was split into 4 RTL design blocks, and each member wrote the HDL for their block and committed it to the rtl folder in the pipeline branch. The work distribution and design details for the Pipelined RISC-V can be found at rtl/README.md in the pipeline branch. The F1 assembly program had to be modified by adding 2 nop instructions after each jal, jalr and beq instruction, to compensate for the delays between design blocks caused by the DFFs in between. Successful test results of the modified F1 assembly program, in terms of waveforms and video of the output implemented on VBuddy, can be found at test/README.md in the pipeline branch.
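The two-nop padding rule described above can be sketched as a small helper. This is only an illustrative C++ sketch, not the team's tooling: the names `needsPadding` and `padWithNops` are hypothetical, and it assumes one instruction per line with the mnemonic as the first token (any label sits on its own line).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns true if the line's first token is jal, jalr or beq.
// Assumption: a label such as "loop:" is on its own line, not in front of the mnemonic.
bool needsPadding(const std::string& line) {
    std::istringstream in(line);
    std::string mnemonic;
    in >> mnemonic;  // first whitespace-delimited token (empty for a blank line)
    return mnemonic == "jal" || mnemonic == "jalr" || mnemonic == "beq";
}

// Copies the program and appends `count` nops after every jal, jalr and beq.
// The default of 2 matches the padding described in the text.
std::vector<std::string> padWithNops(const std::vector<std::string>& program,
                                     int count = 2) {
    assert(count >= 0);
    std::vector<std::string> out;
    for (const std::string& line : program) {
        out.push_back(line);
        if (needsPadding(line)) {
            for (int i = 0; i < count; ++i) {
                out.push_back("nop");
            }
        }
    }
    return out;
}
```

Calling `padWithNops` on a three-line program containing one `beq` and one `jal` yields seven lines, with two `nop` lines directly after each of the two control-flow instructions.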

Objective 4:

Pipelined RV32I Microarchitecture with Data Cache

At this stage, the team split into clusters to focus on the major remaining aspects of the coursework. With regular discussions and meetings being held, Clemen and Shermaine started working on the Data Cache implementation, whereas Johan and Alvi worked to memory-map the CPU design and implement the new byte instructions to get the reference program working. Clemen and Shermaine met and decided to implement the Data Cache based on the Direct Mapped Cache Hardware found in the lecture slides. The approach was to add the direct mapped cache hardware in parallel to the current Data Memory block. A multiplexer then chooses the data, based on the value of Hit, to send into the DFF block between the Memory and Write processes. The design details for the Pipelined RISC-V with Data Cache can be found at rtl/README.md in the cache branch.
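The parallel cache-and-memory arrangement with a Hit-controlled multiplexer can be modelled behaviourally as below. This is a rough C++ sketch under stated assumptions, not the HDL in the cache branch: the class name `CachedDataMemory`, the 8-line one-word-per-line geometry, the fill-on-miss behaviour and the write-through store policy are all hypothetical choices made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumed geometry: 8 lines of one 32-bit word each (not taken from the source).
constexpr std::uint32_t kIndexBits = 3;
constexpr std::uint32_t kLines = 1u << kIndexBits;

struct CacheLine {
    bool valid = false;
    std::uint32_t tag = 0;
    std::uint32_t data = 0;
};

struct LoadResult {
    bool hit;            // the Hit signal that drives the multiplexer
    std::uint32_t data;  // the word forwarded to the next pipeline register
};

class CachedDataMemory {
public:
    // Assumed write-through store: memory is always written, and a matching
    // cached line is updated so it never holds stale data.
    void store(std::uint32_t addr, std::uint32_t value) {
        assert(addr % 4 == 0);  // word-aligned accesses only in this sketch
        const std::uint32_t word = addr >> 2;
        memory_[word] = value;
        CacheLine& line = cache_[word & (kLines - 1)];
        if (line.valid && line.tag == (word >> kIndexBits)) {
            line.data = value;
        }
    }

    // Cache and memory are read "in parallel"; Hit selects which value is used.
    LoadResult load(std::uint32_t addr) {
        assert(addr % 4 == 0);
        const std::uint32_t word = addr >> 2;
        const std::uint32_t tag = word >> kIndexBits;
        CacheLine& line = cache_[word & (kLines - 1)];
        const bool hit = line.valid && line.tag == tag;
        const auto it = memory_.find(word);
        const std::uint32_t memData = (it != memory_.end()) ? it->second : 0;
        const std::uint32_t out = hit ? line.data : memData;  // the multiplexer
        if (!hit) {  // assumed policy: fill the line on a miss
            line.valid = true;
            line.tag = tag;
            line.data = memData;
        }
        return {hit, out};
    }

private:
    std::array<CacheLine, kLines> cache_{};
    std::unordered_map<std::uint32_t, std::uint32_t> memory_;  // sparse data memory
};
```

In this model, the first load of an address misses and takes its data from memory, a repeated load hits, and a load of another address that maps to the same index evicts the earlier line.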

Personal Statements

Johan Jino

Single-Cycle CPU:

I worked on the Control Unit to add support for all instructions. This also included the sign extension unit. I then went on to implement the JAL, JALR and byte instructions. Individual testbenches were made to test every component in each block. Later, I worked with Alvi on integrating all the components and fixing any bugs. Most of the commits for our single-cycle CPU are in the Lab 4 Repository.

Pipelined CPU:

First, we organised a plan to work on the pipeline. Since many of the control signals were slightly different, we decided to start the CPU from scratch with a whole new layout for JAL and JALR. I then split the work block-wise among the team members. My work on the data block can be seen in this commit: DATA BLOCK. After all blocks were implemented, Alvi and I focused on verifying the design's functionality with the test program as well as the program provided by Professor Cheung.

Data Cache:

At the last stage of the coursework, Alvi and I focused on getting the reference program working on our CPU and making all the necessary modifications, along with memory-mapping our CPU designs according to the given memory diagram. Since the reference program needed byte instructions, I also added them to all the other branches (completed pipeline commit). At this stage, regular team meetings were held to update each other on our work, offer help where needed and exchange ideas for the benefit of the coursework; several joint contributions were made at this stage. Shermaine and Clemen focused on the Data Cache implementation, and updated us on their findings along the way. I also recorded videos of the working CPU running on VBuddy and uploaded them to the respective test folder README.md files (Video reference).

Clemen Kok

Single-Cycle CPU:

I worked on the ALU component of the CPU. To accomplish this, I prepared the HDL needed for the ALU and tested it with my own C++ testbench. I used Professor Cheung's notes on the ALU to verify that the HDL I had prepared was functional. Then, I pushed it to the repository (commits are located in our Lab 4 Repository). Alvi and Johan assisted with integrating it and testing that it worked with the assembly program that we came up with. I helped to debug the test for the reference program and realised that it was not working because the add and lui instructions were missing. Alvi and I made the changes.

Pipelined CPU:

I looked ahead and worked out an initial approach to pipelining, which involved modifying the team's existing code into a pipelined CPU. After I realised that this would be quite buggy, I raised it with the team. Johan came up with the idea of splitting the CPU into 4 components - FETCH, DECODE, EXECUTE, MEMORY - which we then proceeded to do. I prepared the EXECUTE component of the pipelined CPU. Johan and Alvi then focused on verifying its functionality with the test program as well as the program provided by Professor Cheung. Shermaine and I then focused on the implementation of Data Cache. However, as the team realised that we had to concentrate on implementing the SB and LBU instructions, we stopped working on the data cache. I suggested that, given the limited time, we focus on byte instructions that Store / Load the least significant byte (byte instructions would be implemented through the Control Unit). Johan then assisted with the implementation and testing.
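The intended behaviour of the two byte instructions mentioned above, as defined by RV32I, can be sketched in a few lines: SB writes only the least significant byte of the source register, and LBU reads one byte and zero-extends it. The C++ model below is an illustrative sketch only; the class name `ByteMemory` is hypothetical and it does not reflect how the team's Control Unit or data memory HDL is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte-addressable data memory model.
class ByteMemory {
public:
    explicit ByteMemory(std::size_t size) : bytes_(size, 0) {}

    // SB: store the least significant byte of rs2 at addr; the upper 24 bits are ignored.
    void sb(std::uint32_t addr, std::uint32_t rs2) {
        assert(addr < bytes_.size());
        bytes_[addr] = static_cast<std::uint8_t>(rs2 & 0xFFu);
    }

    // LBU: load one byte and zero-extend it to 32 bits (no sign extension).
    std::uint32_t lbu(std::uint32_t addr) const {
        assert(addr < bytes_.size());
        return static_cast<std::uint32_t>(bytes_[addr]);
    }

private:
    std::vector<std::uint8_t> bytes_;
};
```

Storing `0xABCDEF80` with `sb` and reading the same address back with `lbu` gives `0x00000080`: only the low byte is kept, and bit 7 is not sign-extended.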

Data Cache:

I proposed an approach for Data Cache during my discussion with Shermaine. Shermaine prepared the HDL needed for this approach, while I helped to improve it. While Johan worked on the byte instructions, I helped to develop the HDL for Data Cache further. The team could have completed the verification of the Data Cache had we had more time.

Shermaine Ang

Single-Cycle CPU:

I worked on the Program Counter (PC) and the relevant adders of the CPU. I created the HDL needed for the PC to work and also created a top-level SystemVerilog file to combine all the hardware components created for the PC. Before passing the work to Alvi for testing, I created a C++ testbench to verify that the PC was working as it should. It worked on its own, so I passed it on to Alvi to combine all the separate components together.
During the debugging phase, it was observed that the Single-Cycle CPU was not working as it should, and that there were some errors in the PC. The multiplexer that I had created had a clock, which was incorrect. My team members helped me fix this error (as I was unwell). From this, I realised how important it was to work as a team: there are times when I miss errors, and other members are able to help me find them and work with me to ensure that everything works as it should.
Moreover, Clemen told me about an alternative to creating a whole block for a multiplexer. I could have used ternary statements in the top-level file instead, which simplifies creation and debugging. With this in mind, I later changed the code in the Pipelined CPU.
Just a few hours before the deadline, the team was debugging the code as there were missing instructions. After Alvi and Clemen added HDL for the add and lui instructions, we encountered another issue: a0 remained at 0 throughout the 10000 cycles. While voicing out what the expected results were supposed to be, we realised that the number of cycles was insufficient. t1 did not reach its maximum and hence the program was not leaving loop2. We then increased the number of cycles, and the results were as expected. Reflection -- Rubber duck debugging eases our debugging process!

Pipelined CPU:

I helped implement the FETCH block for Clemen's initial approach, but after realising that this was quite buggy, we met up again to think of a new approach. Our team split the pipelining equally amongst ourselves, and I was in charge of the FETCH block, as it had components from the PC, which I had created earlier on. Remembering what I had learnt from Lab 4, instead of creating a whole block for a multiplexer, I used ternary statements in the top-level file, which turned out to be a lot simpler.
During one of our meetings, we were implementing the Jump instructions, and I really enjoyed working as a team, as we all worked on different parts together, and I could ask my team members any questions I had. I was in charge of implementing a multiplexer within the FETCH block, and worked with Alvi and Johan as they were implementing hardware in their respective blocks. Communicating with my team members in person made the implementation go really smoothly, as we could check with one another as soon as we had any issues.
When debugging the Pipelined CPU, our team supported one another in looking out for errors. As mentioned previously, one member might miss certain bugs that others in the team could detect. We eventually managed to debug our Pipelined CPU quite quickly with the help of everyone in the team.
Other than the HDL required, I also contributed to the documentation, ensuring that members are aware of what we have to do, and the respective deadlines. Clemen started doing this, and I felt that it was an important part, especially in a group setting, and hence, added on to the documentation whenever we had any discussions or set any deadlines for ourselves.

Data Cache:

Using Clemen's proposed approach for Data Cache, I reread the lecture notes to get a better understanding of how a data cache works. I prepared the HDL required for the approach, which was later improved by Clemen. However, we did not continue implementing the data cache as we encountered other pressing problems. Hence, the data cache has not been tested, although testing could have been completed given more time.

Sohailul Islam Alvi

Single-Cycle CPU:

I combined all my fellow group mates' work into a complete form, to assemble the fully functional Single-Cycle RISC-V CPU. I also created the top-level SystemVerilog file and the C++ testbench to verify the correctness of our F1 program running on our CPU, along with debugging and making all the required modifications at each stage. Commits made up to this stage are all present in the Lab 4 Repository. With that, I traced the outputs via a0 from our CPU on the waveform viewer, ran the program outputs on the VBuddy and recorded the output behaviour, as seen in the test/readme.md. I also helped in implementing the ADD and LUI instructions to get the reference program working on our CPU design, which was committed by Clemen.

Pipelined CPU:

For the pipelining process, I was responsible for the MEMORY and WRITE blocks of the CPU. Afterwards, I also implemented the wiring for the JALR instruction across the top-level module of the Pipelined CPU design. Then, together with Johan, I debugged and verified the design to ensure our Pipelined CPU is fully functional for both our F1 program and the reference program provided, adding the necessary NOP instructions where needed. The assembly programs used, along with the test output waveforms and behaviour recordings on VBuddy, are all available in the test/readme.md file.

Data Cache:

At the last stage of the coursework, Johan and I focused on getting the reference program working on our CPU and making all the necessary modifications, along with memory-mapping our CPU designs according to the given memory diagram. I also structured and wrote the main branch README.md as per the coursework guidelines, and set up the base for my teammates to write their parts as well. At this stage, regular team meetings were held to update each other on our work, offer help where needed and exchange ideas for the benefit of the coursework; several joint contributions were made at this stage. Shermaine and Clemen focused on the Data Cache implementation, and updated us on their findings along the way.
