darklife/darkriscv

Something about riscv-tests from https://github.com/riscv

hyf6661669 opened this issue · 3 comments

I'm not a native English speaker so I am sorry for my poor expression :).

I have tested darkriscv by running test codes from https://github.com/riscv/riscv-tests/tree/master/isa/rv32ui. I think there are 2 bugs in current RTL code:

  1. The "AUIPC" instruction.
    As mentioned in my issue #8 , the data that will be written in regfile when we are excuting "AUIPC" should be "PC + SIMM" instead of "NXPC + SIMM".
REG[DPTR] <=   RES ? RESET_SP  :        // reset sp
                       HLT ? REG[DPTR] :        // halt
                     !DPTR ? 0 :                // x0 = 0, always!
                     AUIPC ? PC+SIMM :   //original data is NXPC + SIMM
                      JAL||
                      JALR ? NXPC :
                       LUI ? SIMM :
                       LCC ? LDATA :
                       MCC ? MDATA : 
                       RCC ? RDATA : 
                       CCC ? CDATA : 
                             REG[DPTR];
  1. The "STORE (SW, SH, SB)" instruction.
    I think we must use a signal DATA_MASK to describe which byte the "STORE" instruction wants to write into the RAM. Otherwise unexpected data will be written. So I have written my extral code in "darkriscv.v" like this:
output [3:0] DATA_MASK,

                                   //sb
assign DATA_MASK = FCT3 == 0 ? (DADDR[1:0] == 3 ? 4'b1000 :
							DADDR[1:0] == 2 ? 4'b0100 :
							DADDR[1:0] == 1 ? 4'b0010 :
                                                                                        4'b0001
							) : 
				//sh
				   FCT3 == 1 ? (DADDR[1] == 1 ? 4'b1100 :
                                                                                      4'b0011
							) : 
				//sw
				4'b1111;

And in "darksoc.v":

        always @(posedge CLK)
	begin
		if(WR & !DADDR[31])
		begin
			if(DATA_MASK[0])
				RAM[DADDR[12:2]][0 * 8 + 7: 0 * 8] <= DATAO[0 * 8 + 7: 0 * 8];
			if(DATA_MASK[1])
				RAM[DADDR[12:2]][1 * 8 + 7: 1 * 8] <= DATAO[1 * 8 + 7: 1 * 8];
			if(DATA_MASK[2])
				RAM[DADDR[12:2]][2 * 8 + 7: 2 * 8] <= DATAO[2 * 8 + 7: 2 * 8];
			if(DATA_MASK[3])
				RAM[DADDR[12:2]][3 * 8 + 7: 3 * 8] <= DATAO[3 * 8 + 7: 3 * 8];
		end
	end

I have used Xilinx ISE 14.7 for synthesis and downloaded the bitstream into a Xilinx spartan6 LX16 for test. I used ram and rom IP like this:

wire [3:0] ram_wea = {WR, WR, WR, WR} & DATA_MASK;
ram_ip ram_ip_u0 (
  .clka(system_clk), // input clka
  .wea(ram_wea), // input [3 : 0] wea
  .addra(ram_addr), // input [10 : 0] addra
  .dina(DATAO), // input [31 : 0] dina
  .douta(DATAI) // output [31 : 0] douta 
);


rom_ip rom_ip_u0 (
  .clka(system_clk), // input clka
  .addra(rom_addr), // input [9 : 0] addra
  .douta(IDATA) // output [31 : 0] douta
);

Luckily, after I have fixed the 2 bugs above, all tests in https://github.com/riscv/riscv-tests/tree/master/isa/rv32ui were passed, except for "fence_i.S". The clock frequency I used for test was 75MHz.

I also have to point out that I only made the test by setting darkriscv as a 2-stage pipeline cpu without cache, which means that the RAM/ROM and the core use opposite clock edge. Now I am just a green hand in digital IC design so I don't know whether it is pratical for a cpu to operate like that.

I'm sorry that I know little about cache, so I couldn't test it as a 3-stage pipeline cpu with I$/D$ :(.

Best
HYF

Wow! I am really impressed with your investigation about the RV32I compatibility and I think you deserve the credits about this very important achievement! I think your tests and corrections about the STORE and AUIPC make sense, by this way please feel free to send a pull request with your fixes and I will try adapt that to the cache controller and to the 3-stage pipeline in the next week.

About the use of opposite clock edges, although not so popular nowadays, for sure is not a problem for the FPGA. In fact, the problem regards to the use of blockRAM, which provides better storage resources when compared to LUTs or FFs, but requires one extra clock edge in the read operation. As long the darkriscv is supposed to execute all instructions in a single-clock, the requirement of one extra clock edge is a problem in the case of LOAD instruction when working with blockRAMs. However, this problem is pretty solved by making the memory interface work with the opposite clock edge, in a way that any LOAD will work in a single clock from the point of view of the darkriscv core, although in fact it requires two clock edges: one to read the data the blockRAM and another to store the data in a register. Of course, in order avoid extra confusion, the entire darksocv works in the opposite clock edge relative to the darkriscv, which means that the ROM, RAM and peripherals work in the opposite clock.

From the practical point of view everything is supposed to be transparent and is possible reach a deterministic performance of one clock per instruction with 75MHz in the Spartan-6 (50MHz with the cache controller). In theory a logic in the same FPGA with a single edge clock may increase the clock to 150MHz, but will possible require a more complex 4-stage pipeline with interlock between stages or register forward in order to try keep one clock per instruction. The 3-stage pipeline can work with 100MHz, but it does not work with blockRAM and the cache controller decreases the performance to 75MHz.

Well, I will try rework in the cache controller in order to improve the performance and make it more modular and better documented. In fact, the point regarding the cache controller is that not only provides a workaround for the extra clock edge requirement in the case of blockRAM, but also support external memories working with slow bus clocks and smaller bus widths, which requires lots of wait-states, but can be compensated by good caches.

For a while, thank you for your investigation and good hacking! :)

@hyf6661669 : I updated the core with your fix regarding the AUIPC and STORE instructions. As long the data mask is known as "byte select" in other architectures, I renamed that signals to "BE". I also added support for a similar byte selection in the LOAD instruction. Although not so useful for 32-bit memories, the BE signals used with WR and RD will probably be useful in future external memories and peripherals with 16 or 8-bit data paths.

@hyf6661669 : I updated the core with your fix regarding the AUIPC and STORE instructions. As long the data mask is known as "byte select" in other architectures, I renamed that signals to "BE". I also added support for a similar byte selection in the LOAD instruction. Although not so useful for 32-bit memories, the BE signals used with WR and RD will probably be useful in future external memories and peripherals with 16 or 8-bit data paths.

Thanks for your updating. Recently I'm busy with other projects so I'm sorry for the late reply. 👍