[Bug report] fence.i may cause the CPU to hang under some conditions
flaviens opened this issue · 5 comments
Hi there!
I think I found a bug (using the brand new Cascade fuzzer) in EL2.
Looking at the reduced ELFs, it looks like it is due to a fence.i
placed after branches.
In this zip, you'll find reduced test cases and their dumps for convenience.
bug_diff.zip
In the image, left is smallest buggy, right is largest non-buggy.
I'm using the AHB config. There's also a possibility that my parameters are wrong, I'm absolutely not familiar with this core at the moment. Can you please try to reproduce?
Thanks!
any more details?
I was able to simulate the buggy code:
core went to address 0 after fence.i : (that shouldn't happen! Need to see what's wrong in RTL )
272 : s7=93d9de2b ; nbL
275 : #48 0 800000bc 4d5481ef gp=800000c0 ; jal gp,0x80048d90
280 : #49 0 80048d90 b0a35f63 ; bge t1,a0,0x800480ae
286 : #50 0 80048d94 783190e3 ; bne gp,gp,0x80049d14
292 : #51 0 80048d98 00019b0f ; fence.i
302 : #52 0 00000000 00000000 ; .short 0
309 : #53 0 00000000 00000000 ; .short 0
316 : #54 0 00000000 00000000 ; .short 0
since this is invalid instruction at address 0 , CPU takes exception. mtvec is not programed and has reset value of 0 causing CPU to go to address 0 again and thus the core is looping at address 0.
More update:
00019b0f is NOT fence.i, but unimplemented opcode for EL2 - so the CPU takes exception on it.
fence.i opcode is 0000100f.
dasm in TB does not decode all the bits in opcode :
it takes in account only bits 6:0 and 12, assuming that only valid opcodes are in the tests binaries,
I would suggest TB to print that the core took exception/interrupt in the exec.log to make easier to debug such problems ...
Core trace port provides indications what interrupt or exception was taken ....
So actually opcode 0x00019b0f
is fence.i
but with non-zero supplementary arguments. RISC-V spec allows both fence
and fence.i
to have arguments and those are neither decoded nor utilized by VeeR.
The instruction decoder assumes that all those arguments are zero. With the PR #138 I updated it so that it allows them to have non-zero values. With that VeeR defaults to the most generic behavior of the instructions by ignoring the arguments and does not execute any exception. This is sanctioned by the spec.