comsec-group/cascade-artifacts

About floating-point aliases in Spike

Closed this issue · 11 comments

Hi there!

When I use Cascade, logs similar to the following appear when a bug occurs:
Register mismatch (xx) for params: memsize: xxx, design_name: xxx, nmax_bbs: xx, randseed: xx. Expected xxxx, got xxxx.

However, when I trace the cause of the bug (debugging with Spike), I find that the Expected value does not appear in Spike.
How do you trace the cause of bugs when using Cascade? And have you encountered the above problem?

Thanks!

Hi @youzi27,

Thank you for opening an issue!
What you are trying to do seems correct, and the expected value should indeed come from Spike.
Can you please provide a concrete example?

EDIT: to trace the cause of a CPU bug you can use this reduction script.

Thanks!
Flavien

Thank you very much for your prompt reply!

Specifically, the command run was:
python3 do_fuzzdesign.py cva6-c1 6 0 1

and the log is as follows:

maybe bug
Register mismatch (f9) for params: memsize: 275074, design_name: cva6-c1, nmax_bbs: 10, randseed: 25. Expected 0xffffffff7fc00000, got 0xffffffff00000000.
Failed test_run_rtl_single for params memsize: 275074, design_name: cva6-c1, check_pc_spike_again: True, randseed: 25, nmax_bbs: 10 -- (275074, design_name, 25, 10)
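
The two values in this log can be read as single-precision floats stored in a 64-bit floating-point register. The following is a minimal sketch, assuming the usual RISC-V NaN-boxing convention (upper 32 bits all ones, float32 payload in the lower 32 bits). The helper name `unbox_single` is hypothetical and is not part of Cascade or Spike.

```python
import math
import struct

def unbox_single(reg64: int):
    """Return the float32 held in a NaN-boxed 64-bit FP register.

    Returns None when the upper 32 bits are not all ones, i.e. when the
    value is not a validly NaN-boxed single-precision number.
    """
    if (reg64 >> 32) != 0xFFFFFFFF:
        return None
    low = reg64 & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as an IEEE 754 single-precision float.
    return struct.unpack("<f", struct.pack("<I", low))[0]

expected = unbox_single(0xFFFFFFFF7FC00000)  # "Expected" value from the log
got = unbox_single(0xFFFFFFFF00000000)       # "got" value from the log

print(math.isnan(expected), got)
```

Under that assumption, the expected value is the canonical single-precision NaN (payload 0x7fc00000), while the design returned +0.0.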

Attachments are:
testcase.zip

Thank you very much!

Thank you for the reminder. The situation described above occurred after I had already used the reduction script.

Great. At a quick glance, all of this looks good.
But are you saying that Spike and CVA6 both give the same result in that case, namely 0xffffffff00000000?

When I run the test case in Spike, f9 never seems to hold either of the values mentioned above.

What value does it hold instead?

Thank you for your reply.
Spike terminates at pc=0x0000000080022d58,
with the final value of f9 being 0xffffffffffffffffffffffff41200000.

Interesting. There may be something to look at here. In any case, you successfully isolated a single instruction, as you may have observed:

Jumped to the context setter
image

Then jumped out of the context setter
image

Then executed a single instruction and then jumped out to the dump
image

This means that your reduction was very successful (congrats 🎊). From there, you already know the problematic spot.

Now let's look at the value. I executed the test case that you sent and got:

core   0: 0x000000008003b338 (0x580324d3) fsqrt.s fs1, ft6
: freg 0 fs1  
0xffffffffffffffffffffffff7fc00000

which seems to match the expected value in the message that you mentioned in your first post.
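
To connect the Spike trace line to the `f9` in the Cascade message, the instruction word 0x580324d3 can be split into its fields. This is a minimal sketch using the standard RISC-V R-type field layout. The function name `decode_r_type` is hypothetical and does not come from Cascade or Spike.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit RISC-V R-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,   # holds the rounding mode for OP-FP
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }

# Instruction word taken from the Spike trace line above.
fields = decode_r_type(0x580324D3)
print(fields)
```

The destination field is 9 and the source field is 6. In the standard RISC-V ABI naming, f9 is `fs1` and f6 is `ft6`, which is consistent with the disassembly `fsqrt.s fs1, ft6`.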

I hope it helps!
Thanks!

Oh yes now I remember: there is some not-so-intuitive Spike behavior with aliases, which is probably what confused you:

image
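
The screenshot is not reproduced in this text. Assuming the confusion is between the architectural name reported by Cascade (`f9`) and the ABI name used in the Spike session above (`fs1`), the mapping can be sketched as follows. The table follows the standard RISC-V ABI register names. `FPR_ABI_NAMES` and `abi_name` are hypothetical helpers, not part of Spike or Cascade.

```python
# ABI names of the 32 floating-point registers, indexed by register number.
FPR_ABI_NAMES = (
    [f"ft{i}" for i in range(8)]        # f0-f7
    + ["fs0", "fs1"]                    # f8-f9
    + [f"fa{i}" for i in range(8)]      # f10-f17
    + [f"fs{i}" for i in range(2, 12)]  # f18-f27
    + [f"ft{i}" for i in range(8, 12)]  # f28-f31
)

def abi_name(arch_name: str) -> str:
    """Translate an architectural name such as 'f9' into its ABI name."""
    if not arch_name.startswith("f") or not arch_name[1:].isdigit():
        raise ValueError(f"not an architectural FP register name: {arch_name!r}")
    return FPR_ABI_NAMES[int(arch_name[1:])]

print(abi_name("f9"), abi_name("f6"))
```

With this mapping, the register that Cascade reports as `f9` would be queried in Spike as `fs1`, as in the `freg 0 fs1` command shown earlier.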

Thank you very much for your patient reply!

It seems that I have figured out the problem, thanks to the expert! 😄

Great 👍