About floating-point aliases in Spike
Closed this issue · 11 comments
Hi there!
When I use Cascade, logs similar to the following appear when a bug occurs:
Register mismatch (xx) for params: memsize: xxx, design_name: xxx, nmax_bbs: xx, randseed: xx. Expected xxxx, got xxxx.
However, when I trace the cause of the bug (debugging with Spike), I find that the Expected value does not appear in Spike.
How do you trace the cause of bugs when using Cascade? And have you encountered the above problem?
Thanks!
Hi @youzi27,
Thank you for opening an issue!
What you are trying to do seems correct, and the expected value should indeed come from Spike.
Can you please provide a concrete example?
EDIT: to trace the cause of a CPU bug you can use this reduction script.
Thanks!
Flavien
Thank you very much for your prompt reply!
Specifically, the command run was:
python3 do_fuzzdesign.py cva6-c1 6 0 1
and the log is as follows:
maybe bug
Register mismatch (f9) for params: memsize: 275074, design_name: cva6-c1, nmax_bbs: 10, randseed: 25. Expected 0xffffffff7fc00000, got 0xffffffff00000000.
Failed test_run_rtl_single for params memsize: 275074, design_name: cva6-c1, check_pc_spike_again: True, randseed: 25, nmax_bbs: 10 -- (275074, design_name, 25, 10)
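The mismatch line above can be picked apart mechanically before opening Spike. The following is a minimal sketch, assuming the log line has exactly the format quoted here; the regular expression and the `parse_mismatch` helper are hypothetical and not part of Cascade.

```python
import re

# Hypothetical pattern, written against the single log line quoted above.
MISMATCH_RE = re.compile(
    r"Register mismatch \((?P<reg>\w+)\).*?"
    r"Expected (?P<expected>0x[0-9a-fA-F]+), got (?P<got>0x[0-9a-fA-F]+)"
)

def parse_mismatch(line):
    """Return (register, expected, got, differing_bits) or None if no match."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    expected = int(m.group("expected"), 16)
    got = int(m.group("got"), 16)
    # XOR highlights exactly which bits differ between the two values.
    return m.group("reg"), expected, got, expected ^ got

line = (
    "Register mismatch (f9) for params: memsize: 275074, design_name: cva6-c1, "
    "nmax_bbs: 10, randseed: 25. Expected 0xffffffff7fc00000, got 0xffffffff00000000."
)
reg, expected, got, diff = parse_mismatch(line)
print(reg, hex(diff))
```

Here the differing bits all sit in the lower 32 bits, which suggests the disagreement is about the single-precision payload rather than the upper half of the register.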
Attachments are:
testcase.zip
Thank you very much!
Thank you for your kind reminder. The above situation occurred after I used the reduction script.
Great. From a quick look, all this looks good.
But are you saying that both Spike and CVA6 give the same result, in this case 0xffffffff00000000?
In Spike's running environment, f9 never seems to equal either of the aforementioned values.
What is it instead in that case?
Thank you for your reply.
Spike terminates at pc=0x0000000080022d58, with the final value of f9 being 0xffffffffffffffffffffffff41200000.
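To see what the raw register values in this thread actually encode, one can reinterpret their low 32 bits as an IEEE 754 single-precision number. This is only an illustrative sketch; the helper name `low32_as_float` is made up for this example.

```python
import math
import struct

def low32_as_float(value):
    """Reinterpret the low 32 bits of a register value as an IEEE 754 float32."""
    return struct.unpack("<f", struct.pack("<I", value & 0xFFFFFFFF))[0]

# Values quoted in this thread.
spike_final_f9 = 0xffffffffffffffffffffffff41200000
expected = 0xffffffff7fc00000
got = 0xffffffff00000000

print(low32_as_float(spike_final_f9))        # an ordinary number
print(math.isnan(low32_as_float(expected)))  # the expected value is a NaN
print(low32_as_float(got))                   # the design produced zero
```

The low word 0x7fc00000 is the canonical single-precision quiet NaN, while 0x41200000 is a regular value, so the two observations cannot come from the same instruction result.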
Interesting. There may be something to look into here. Still, as you may have observed, you successfully isolated a single instruction:
Then jumped out of the context setter
Then executed a single instruction and then jumped out to the dump
This means that the reduction that you did was very successful (congrats). From there you already know the problematic spot.
Now let's look at the value:
Regarding the value, I executed the test case that you sent and got:
core 0: 0x000000008003b338 (0x580324d3) fsqrt.s fs1, ft6
: freg 0 fs1 0xffffffffffffffffffffffff7fc00000
which seems to match the message that you mentioned in your first post.
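The link between the two outputs is easier to see once f9 is translated to its ABI name and the Spike value is cut down to 64 bits. Below is a minimal sketch: the name table follows the standard RISC-V calling convention, while the assumption that the Cascade message reports only the low 64 bits of the 128-bit value printed by Spike is inferred from the two logs in this thread, not confirmed by the tool's documentation. The helpers `abi_name` and `low64` are hypothetical.

```python
# RISC-V ABI names for f0..f31 (standard calling convention).
FP_ABI_NAMES = (
    ["ft%d" % i for i in range(8)]          # f0-f7   -> ft0-ft7
    + ["fs0", "fs1"]                        # f8-f9   -> fs0-fs1
    + ["fa%d" % i for i in range(8)]        # f10-f17 -> fa0-fa7
    + ["fs%d" % i for i in range(2, 12)]    # f18-f27 -> fs2-fs11
    + ["ft%d" % i for i in range(8, 12)]    # f28-f31 -> ft8-ft11
)

def abi_name(reg):
    """Map an architectural name such as 'f9' to its ABI alias."""
    return FP_ABI_NAMES[int(reg[1:])]

def low64(value):
    """Keep only the low 64 bits (assumed to be what the mismatch log shows)."""
    return value & 0xFFFFFFFFFFFFFFFF

spike_fs1 = 0xffffffffffffffffffffffff7fc00000  # from the Spike log above
expected = 0xffffffff7fc00000                   # from the mismatch message

print(abi_name("f9"))                 # the name to search for in the Spike log
print(low64(spike_fs1) == expected)   # do the two logs agree?
```

Searching the Spike log for the alias rather than for f9 is what makes the expected value show up; the all-ones upper bits are the NaN-boxing of a single-precision result in a wider register.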
I hope it helps!
Thanks!
Thank you very much for your patient reply!
It seems that I have figured out the problem, thanks to the expert!
Great