darklife/darkriscv

I am curious about the weird bug that appears to be related to the "sw ra,12(sp)" instruction.

thomascp opened this issue · 3 comments

Hi,

I am curious about the note 'weird bug appears to be related to the "sw ra,12(sp)" instruction.'
What happens when "sw ra,12(sp)" executes?
I also noticed that this part was removed in a recent commit.
Could you tell me what the problem was and how you solved it?
Thanks!

BR,
Peng

Hi Peng!

This is really a very, very good question! :)

In the early days of darkriscv, I tested some unusual memory configurations in which the instruction (ROM) and data (RAM) memories were unified in a single shared memory. Sometimes the strings in the .rodata area got corrupted for no apparent reason, so I started investigating via the ROMBUG register, which stores the NXPC at the moment WR is asserted for the ROM memory.
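To make the ROMBUG idea concrete, here is a minimal sketch of such a debug register. It is not the original darkriscv code: the module name `rombug_capture`, the `ROM_BASE`/`ROM_SIZE` window and the port names `DADDR`, `WR` and `NXPC` are assumptions chosen for illustration. The register simply latches NXPC on any rising edge where a data write lands inside the assumed ROM window.

```verilog
// Hypothetical ROMBUG-style debug register (not the original darkriscv code).
// It latches the core's next PC whenever a data write hits the ROM window.
// Port names, widths and the ROM address range are assumptions.
module rombug_capture #(
    parameter [31:0] ROM_BASE = 32'h00000000,  // assumed start of ROM
    parameter [31:0] ROM_SIZE = 32'h00001000   // assumed ROM size in bytes
) (
    input         CLK,
    input  [31:0] NXPC,    // next PC reported by the core
    input  [31:0] DADDR,   // data address of the current access
    input         WR,      // data write strobe
    output [31:0] ROMBUG   // last NXPC seen during a write into ROM
);
    reg [31:0] rombug = 0;

    // true when the data address falls inside the assumed ROM window
    wire rom_hit = (DADDR >= ROM_BASE) && (DADDR < ROM_BASE + ROM_SIZE);

    // no reset input on purpose (assumption): a capture made before
    // reset is asserted survives it and can be read back later
    always @(posedge CLK)
        if (WR && rom_hit) rombug <= NXPC;

    assign ROMBUG = rombug;
endmodule
```

Because the sketch has no reset, a write captured before the core enters reset would still be visible afterwards; whether the original ROMBUG register behaved exactly this way is an assumption.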

The "version 16" dated Feb 6, 2019 appears to reproduce the problem more easily: sporadically, it is possible to run the "bug" command and get the value 8 for the NXPC, which means that the problem originated in the execution of the instruction at address 4 (the "sw ra,12(sp)" instruction).

Although more recent versions do not allow writes to the instruction area and I never saw the problem again, I never discovered what caused it or how it fixed itself... until now! I re-examined that specific version and found something weird in the simulation of the RESET signal during the first clock pulses.

The RESET tree in this specific version from Feb 6, 2019 was designed as follows:

input XRES // external RESET
...
reg [7:0] IRES = -1; // internal reset
always@(posedge XCLK) IRES <= XRES ? -1 : IRES[7] ? IRES-1 : 0;
wire RES = IRES[7]; // internal reset delayed by 128 clocks
...
reg [1:0] RESFF; // ultra-deep internal reset
always@(posedge CLK) RESFF <= RESFF<<1 | RES;
...
darkriscv
#(
    .RESET_PC(0),
    .RESET_SP(32'h00002000)
) 
core0 
(
    .CLK(CLK),
    .RES(RESFF[1]), // finally, the processor reset! \o/
...

In the simulation, core0/RES appears as undefined during the first clock, as do lots of other non-initialized registers. However, since anything undefined is, in reality, initialized to zero in the FPGA, it is possible to initialize all undefined registers to zero and change the simulation so that CLK starts at 0, which simulates the behavior of the FPGA right after programming more realistically.

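The result is that darkriscv can execute two instructions before RESFF[1] reaches 1 and halts execution! Since RESFF powers up as 2'b00, RES needs two CLK edges to shift into RESFF[1], and the core is not held in reset on those two edges. Unsurprisingly, those instructions are the first ones in the boot.c code: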

        addi    sp,sp,-16
        sw      ra,12(sp)

And so the mystery of the ghost "sw ra,12(sp)" instruction is solved. I just initialized RESFF to -1 and fixed a problem that had been present since the first version of darkriscv!
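The two-edge window can be reproduced with a small self-checking simulation. The sketch below is a hypothetical model, not the original top-level file: it keeps the IRES and RESFF logic from version 16 but, as a simplifying assumption, drives both from a single CLK, and it makes the power-up value of RESFF a parameter so the zero-initialized case (what the FPGA does) can be compared with the fix (RESFF initialized to -1). The module names `reset_chain` and `resff_tb` are illustrative.

```verilog
// Hypothetical model of the "version 16" reset path.
// Assumption: XCLK and CLK are the same clock in this model.
// RESFF_INIT = 2'b00 models the FPGA power-up value of an uninitialized
// register; RESFF_INIT = 2'b11 models the fix (RESFF initialized to -1).
module reset_chain #(parameter [1:0] RESFF_INIT = 2'b00) (
    input  CLK,
    input  XRES,
    output CORE_RES                  // what core0 sees on its RES port
);
    reg [7:0] IRES = -1;             // internal reset counter
    always @(posedge CLK) IRES <= XRES ? -1 : IRES[7] ? IRES-1 : 0;
    wire RES = IRES[7];              // high for the first 128 clocks

    reg [1:0] RESFF = RESFF_INIT;    // "ultra-deep" reset pipeline
    always @(posedge CLK) RESFF <= RESFF<<1 | RES;

    assign CORE_RES = RESFF[1];
endmodule

// Counts the rising edges on which core0 would NOT be held in reset.
module resff_tb;
    reg CLK = 0;                     // CLK starts at 0, as in the modified simulation
    wire res_old, res_fix;
    reset_chain #(.RESFF_INIT(2'b00)) old_chain (.CLK(CLK), .XRES(1'b0), .CORE_RES(res_old));
    reset_chain #(.RESFF_INIT(2'b11)) fix_chain (.CLK(CLK), .XRES(1'b0), .CORE_RES(res_fix));

    integer i, old_free = 0, fix_free = 0;
    initial begin
        #1;                          // let declaration initializers settle
        for (i = 0; i < 8; i = i + 1) begin
            // sample the core reset just before each rising edge
            if (!res_old) old_free = old_free + 1;
            if (!res_fix) fix_free = fix_free + 1;
            #5 CLK = 1;
            #5 CLK = 0;
        end
        $display("unreset edges: old=%0d fixed=%0d", old_free, fix_free);
        $finish;
    end
endmodule
```

Under these assumptions the testbench prints `unreset edges: old=2 fixed=0`: with RESFF starting at 2'b00 the core runs on two edges, matching the two boot instructions, while initializing RESFF to -1 closes the window.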

Ironically, on the same night of Feb 6, 2019, I made a new commit, "version 17", with some fixes to the RESET tree that removed the RESFF logic entirely:

input        XRES,       // external reset
...
reg [7:0] IRES = -1; // internal reset
always@(posedge XCLK) IRES <= XRES ? -1 : IRES[7] ? IRES-1 : 0;
wire RES = IRES[7];
...
darkriscv
#(
    .RESET_PC(0),
    .RESET_SP(32'h00002000)
) 
core0 
(
    .CLK(CLK),
    .RES(RES), // bingo!
...
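A similar hypothetical testbench for the version 17 path shows why the problem went away. It reuses only the IRES logic shown above and, as an assumption, runs everything from a single clock; the module name `v17_reset_tb` is illustrative. Because IRES powers up as -1, RES = IRES[7] is already 1 on the first rising edge and stays high for 128 edges.

```verilog
// Hypothetical check of the "version 17" reset path, where core0.RES = IRES[7].
// Assumption: a single clock (XCLK) drives everything in this model.
module v17_reset_tb;
    reg XCLK = 0;                    // clock starts at 0, like the modified simulation
    reg XRES = 0;                    // external reset never asserted
    reg [7:0] IRES = -1;             // internal reset counter, as in version 17
    always @(posedge XCLK) IRES <= XRES ? -1 : IRES[7] ? IRES-1 : 0;
    wire RES = IRES[7];              // goes straight to core0 in version 17

    integer i, held = 0, first_free = -1;
    initial begin
        #1;                          // let declaration initializers settle
        for (i = 0; i < 200; i = i + 1) begin
            // sample RES just before each rising edge, as the core would see it
            if (RES) held = held + 1;
            else if (first_free < 0) first_free = i;
            #5 XCLK = 1;
            #5 XCLK = 0;
        end
        $display("edges held in reset: %0d, first free edge: %0d", held, first_free);
        $finish;
    end
endmodule
```

It should print `edges held in reset: 128, first free edge: 128`, i.e. there is no window in which the core runs before reset is asserted.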

The funny part is that I fixed the problem by accident (in version 17, IRES powers up as -1, so RES = IRES[7] holds core0 in reset from the very first clock) and spent the next six months typing "bug" in the darkriscv console looking for a problem that no longer existed! Anyway, thank you for pointing out the problem, and good hacking! o/

BR,
Marcelo

Hi Marcelo,

Thanks for your detailed explanation! This is very helpful.
Actually, I am new to the FPGA field, and darkriscv is a very good starting point for learning how to build a CPU in Verilog. I see your plan to support more features, which is cool! I hope things go smoothly; I will keep following and studying. Thanks!

BR,
Peng

Thank you Peng!