Weird behaviour: "list index out of range" CPU bug
asierfdln opened this issue · 12 comments
Hi! While playing around with `do_fuzzdesign.py` for the cva6-c1 CPU, I have noticed three program descriptors whose "bug" corresponds to `list index out of range`. The descriptors in question are as follows, all for `design_name = 'cva6-c1'` in either `do_fuzzsingle.py` or `do_reducesingle.py`:
- `(487340, design_name, 416, 72, False)`
- `(111180, design_name, 2358, 85, False)`
- `(850907, design_name, 3083, 24, False)`
Apparently, all three errors are related to the verilated cva6-c1 not dumping register values in the output of its `subprocess.run` call, in the `runsim_verilator()` function of `fuzzsim.py` (a Python file under the cascade-meta repository).
Since there are no register dumps in the resulting output of the Verilator `subprocess.run` call in lines 46 and 47 of `fuzzsim.py` (and therefore nothing matching the format `"Dump of reg x{reg_id:02}: 0x"`), a `list index out of range` exception is raised later in line 60 when trying to retrieve the integer register values.

It seems like it should be as simple as replacing the `itertools.count(current_index)` logic with something simpler like `range(curr_index, len(outlines))`, as is already done in line 77, both for the integer registers and the floating-point registers (lines 59 and 66, respectively). But with this change, trying to reduce the faulty programs results in all kinds of `AssertionError`s in lines 181-184 of `runtest_simulator()`, where a certain number of integer-register and floating-point-register values are expected.
I'm not experienced enough with Verilator or the (faulty) Ariane processor to know why register values are/aren't being dumped, just thought I would point out this weird "CPU bug".
Hi @asierfdln,
Thank you for opening this issue!
It sounds somewhat familiar. My first guess is that the text provided by the Verilator testbench to Cascade is not of the expected format. Maybe this is an issue with the cva6-c1 testbench. Could you maybe start by dumping this text? It should be `exec_out.stdout` in `fuzzsim.py`.

While not impossible, this does not look like a CPU bug (it could be one, if the bug caused the CPU to write to the stop address, but I've never seen anything like that, so it is unlikely a priori :) )
Thank you. Here is a .zip file with a series of `.elf.dump` files and outputs from `exec_out.stdout`. All dumps and outputs have been captured by stopping the execution of `do_fuzzsingle.py` at the point of `exec_out = ...` in `fuzzsim.py`, as you mention.

For each descriptor, there are two `.elf.dump` files: one that is generated and executed when calling the `profile_get_medeleg_mask()` function, and a second one that is generated and executed during the normal `fuzz_single_from_descriptor()` run, i.e. the "main" `.elf`. For each `.elf.dump`, in turn, there are two output `.txt` files: one before filtering the `"Writing ELF word to"` lines (called `*_nofilter.txt`), and another after filtering said lines (called `*_yesfilter.txt`).
```
exec_outputs/
├── descriptor_111180_cva6-c1_2358_85_False
│   ├── medelegprofilingcva6-c1.elf.dump
│   ├── medelegprofilingcva6-c1_lastbasicblockregisterdump_nofilter.txt
│   ├── medelegprofilingcva6-c1_lastbasicblockregisterdump_yesfilter.txt
│   ├── rtl111180_cva6-c1_2358_85.elf.dump
│   ├── rtl111180_cva6-c1_2358_85_lastbasicblockregisterdump_nofilter.txt
│   └── rtl111180_cva6-c1_2358_85_lastbasicblockregisterdump_yesfilter.txt
├── descriptor_487340_cva6-c1_416_72_False
│   ├── medelegprofilingcva6-c1.elf.dump
│   ├── medelegprofilingcva6-c1_lastbasicblockregisterdump_nofilter.txt
│   ├── medelegprofilingcva6-c1_lastbasicblockregisterdump_yesfilter.txt
│   ├── rtl487340_cva6-c1_416_72.elf.dump
│   ├── rtl487340_cva6-c1_416_72_lastbasicblockregisterdump_nofilter.txt
│   └── rtl487340_cva6-c1_416_72_lastbasicblockregisterdump_yesfilter.txt
└── descriptor_850907_cva6-c1_3083_24_False
    ├── medelegprofilingcva6-c1.elf.dump
    ├── medelegprofilingcva6-c1_lastbasicblockregisterdump_nofilter.txt
    ├── medelegprofilingcva6-c1_lastbasicblockregisterdump_yesfilter.txt
    ├── rtl850907_cva6-c1_3083_24.elf.dump
    ├── rtl850907_cva6-c1_3083_24_lastbasicblockregisterdump_nofilter.txt
    └── rtl850907_cva6-c1_3083_24_lastbasicblockregisterdump_yesfilter.txt
```
As you say, the text provided by the verilated cva6-c1 to Cascade is not in the expected format; notice how the filtered dump `.txt` files for the "main" `.elf`s (the `rtl*.elf.dump`) do not have any "Dump register" messages. I'm guessing that these `.elf.dump` files, for whatever reason, make it so that cva6-c1 doesn't write correctly into the `regdumpaddr` (0x10) and `fpregdump` (0x18) addresses, but does write correctly into the `stopsignaladdr` (0x0)?
Hi @asierfdln, thank you for the data! The ELF dumps look ok at first glance. You can see that it first intends to dump to `0x10`:
```
8002dad4: 00000f37 lui t5,0x0
8002dad8: 010f0f13 add t5,t5,16 # 0x10
8002dadc: 0ff0000f fence
8002dae0: 001f3023 sd ra,0(t5)
8002dae4: 0ff0000f fence
8002dae8: 002f3023 sd sp,0(t5)
8002daec: 0ff0000f fence
8002daf0: 003f3023 sd gp,0(t5)
8002daf4: 0ff0000f fence
8002daf8: 004f3023 sd tp,0(t5)
8002dafc: 0ff0000f fence
8002db00: 005f3023 sd t0,0(t5)
8002db04: 0ff0000f fence
8002db08: 006f3023 sd t1,0(t5)
8002db0c: 0ff0000f fence
8002db10: 007f3023 sd t2,0(t5)
8002db14: 0ff0000f fence
8002db18: 008f3023 sd s0,0(t5)
8002db1c: 0ff0000f fence
8002db20: 009f3023 sd s1,0(t5)
8002db24: 0ff0000f fence
8002db28: 00af3023 sd a0,0(t5)
8002db2c: 0ff0000f fence
8002db30: 00bf3023 sd a1,0(t5)
8002db34: 0ff0000f fence
8002db38: 00cf3023 sd a2,0(t5)
8002db3c: 0ff0000f fence
8002db40: 00df3023 sd a3,0(t5)
8002db44: 0ff0000f fence
8002db48: 00ef3023 sd a4,0(t5)
8002db4c: 0ff0000f fence
8002db50: 00ff3023 sd a5,0(t5)
8002db54: 0ff0000f fence
8002db58: 010f3023 sd a6,0(t5)
8002db5c: 0ff0000f fence
8002db60: 011f3023 sd a7,0(t5)
8002db64: 0ff0000f fence
8002db68: 012f3023 sd s2,0(t5)
8002db6c: 0ff0000f fence
8002db70: 013f3023 sd s3,0(t5)
8002db74: 0ff0000f fence
8002db78: 014f3023 sd s4,0(t5)
8002db7c: 0ff0000f fence
8002db80: 015f3023 sd s5,0(t5)
8002db84: 0ff0000f fence
8002db88: 016f3023 sd s6,0(t5)
8002db8c: 0ff0000f fence
8002db90: 017f3023 sd s7,0(t5)
8002db94: 0ff0000f fence
8002db98: 018f3023 sd s8,0(t5)
8002db9c: 0ff0000f fence
8002dba0: 00000f37 lui t5,0x0
8002dba4: 000f0f13 mv t5,t5
8002dba8: 000f3023 sd zero,0(t5) # 0x0
8002dbac: 0ff0000f fence
8002dbb0: 0000006f j 0x8002dbb0
```
Could you please run cva6-c1 with traces and see what happens on the memory side (i.e., whether the signals reach the top output)?
Just to check, by "with traces" do you mean that I should recompile cva6-c1 with `make run_vanilla_trace` instead of the default `make run_vanilla_notrace` set in `make_all_designs.py`? Or is there a flag for the Verilator executable `Variane_tiny_soc` that I'm missing somewhere?
> `make run_vanilla_trace`

Exactly 👍. You may be missing the `.core` file for that. The easiest approach is to duplicate the `*_notrace.core` file, change its name (also at the top of the file contents), and add the trace lines to the Verilator options, like here.
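For reference, a hedged sketch of what such a trace-enabled `.core` file might look like. The field names follow FuseSoC's CAPI2 format and `--trace` is a standard Verilator flag, but the actual name, filesets, and options in the Cascade design repositories may differ:

```yaml
# Hypothetical run_vanilla_trace.core sketch, duplicated from the
# *_notrace.core; the real filesets/targets in the repo will differ.
CAPI=2:
name: ::run_vanilla_trace:0.1

targets:
  default:
    default_tool: verilator
    tools:
      verilator:
        verilator_options:
          - "--trace"   # have the verilated model emit a VCD waveform
```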
Apparently `SIMLEN` is not defined when trying to run the make rules:
```
root@208de54b6df4:/cascade-cva6-c1/cascade# make run_vanilla_trace
rm -f fusesoc.conf
fusesoc library add run_vanilla_trace .
INFO: Interpreting sync-uri '.' as location for local provider.
fusesoc run --build run_vanilla_trace
INFO: Preparing ::run_vanilla_trace:0.1
INFO: Setting up project
INFO: Building simulation model
cp generated/out/vanilla.sv.log build/run_vanilla_trace_0.1/default-verilator/Variane_tiny_soc.log
mkdir -p traces
cd build/run_vanilla_trace_0.1/default-verilator && ./Variane_tiny_soc
Starting getting env variables.
SIMLEN environment variable not set.
make: *** [Makefile:173: run_vanilla_trace] Error 1
root@208de54b6df4:/cascade-cva6-c1/cascade# make run_vanilla_notrace
cd build/run_vanilla_notrace_0.1/default-verilator && ./Variane_tiny_soc
Starting getting env variables.
SIMLEN environment variable not set.
make: *** [Makefile:173: run_vanilla_notrace] Error 1
```
Can I safely declare some value of `SIMLEN` within `/cascade-meta/env.sh`, or am I missing something? Each CPU apparently has its own `SIMLEN` definitions in some other files (which aren't being touched, by the looks of the above messages):
```
root@135ecb749226:/# grep -r "export SIMLEN=" cascade-*
cascade-cva6/cascade/tests.sh:export SIMLEN=10000
cascade-cva6/cascade/env.sh:export SIMLEN=100000
cascade-cva6-c1/cascade/tests.sh:export SIMLEN=10000
cascade-cva6-c1/cascade/env.sh:export SIMLEN=100000
cascade-cva6-y1/cascade/tests.sh:export SIMLEN=10000
cascade-cva6-y1/cascade/env.sh:export SIMLEN=100000
cascade-kronos/cascade/env.sh:export SIMLEN=10000
cascade-kronos-k1/cascade/env.sh:export SIMLEN=10000
cascade-kronos-k2/cascade/env.sh:export SIMLEN=10000
cascade-picorv32/cascade/env.sh:export SIMLEN=200000
cascade-picorv32-p5/cascade/env.sh:export SIMLEN=200000
```
Yeah, the lower-level parts are a bit less documented, sorry for that ^^'. You should set `SIMSRAMELF` to the path of your ELF, `SIMLEN` to the maximum number of cycles you'd like to run (you can safely overestimate it if you know the program will stop), and `TRACEFILE` to the path of the VCD that you want to obtain.
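To make that environment setup concrete, here is a small hypothetical Python sketch (in the spirit of `fuzzsim.py`, but not its actual code; the binary path is a placeholder) of how these variables could be passed to the verilated testbench:

```python
import os

def sim_env(elf_path, vcd_path, simlen=100000):
    """Build the environment for the verilated SoC testbench.

    All three variables are read by the testbench at startup
    ("Starting getting env variables." in the log above).
    """
    env = dict(os.environ)
    env["SIMSRAMELF"] = elf_path   # ELF loaded into the simulated SRAM
    env["SIMLEN"] = str(simlen)    # max cycles; safe to overestimate
    env["TRACEFILE"] = vcd_path    # where the VCD waveform is written
    return env

# The verilated binary would then run with these variables, e.g.:
# subprocess.run(["./Variane_tiny_soc"], env=sim_env(elf, vcd))
```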
Managed to get the `.vcd` file; any clues about which signals to look out for? (Currently going through it in Questasim.)
Edit: attached the `.vcd` and `.wlf` files
Uploading waveforms_487340_cva6-c1_416_72_False.zip…
Nice. I'd recommend first looking at the signals that reach the top-level memory. You may also want to try generating very short programs, so the VCDs will be shorter ;)
Any updates @asierfdln ? :)
Hey! I have been busy trying to get my own CPU design working, hopefully I can take a look at this (and the other two issues) during the next two weeks or so :)