open-power/snap

FGT Image causes core dump

Closed this issue · 3 comments

I was testing
"SNAP FPGA Release: v1.1.0 Distance: 0 GIT: 0x3c27091f" on FGT using
"./snap_example -a2 -A64 -v"
---------- dest Buffer: 0x10009262640
00000000: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 | ................
00000010: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 | ................
00000020: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 | ................
00000030: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 | ................
00000040: c0 00 00 00 3f ff ff ff c8 00 00 00 37 ff ff ff | ............7...

followed by "core dump"
-A 128 works,
-A 64 overwrites memory followed by core dump

This problem seems to occur randomly after a flash update. I was able to recover from it by resetting the card (echo 1 > /sys/class/cxl/card0/reset).

I have some more data to look at. I was trying to copy to Buffer + 0x80, but the destination
buffer has a gap in the middle and the data is wrong:

---------- dest Buffer: 0x10020700
00000080: 40 00 00 00 bf ff ff ff 48 00 00 00 b7 ff ff ff | ........H.......
00000090: 50 00 00 00 af ff ff ff 58 00 00 00 a7 ff ff ff | P.......X.......
000000a0: 60 00 00 00 9f ff ff ff 68 00 00 00 97 ff ff ff | ........h.......
000000b0: 70 00 00 00 8f ff ff ff 78 00 00 00 87 ff ff ff | p.......x.......
000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
00000140: 00 00 00 00 ff ff ff ff 08 00 00 00 f7 ff ff ff | ................
00000150: 10 00 00 00 ef ff ff ff 18 00 00 00 e7 ff ff ff | ................
00000160: 20 00 00 00 df ff ff ff 28 00 00 00 d7 ff ff ff | ................
00000170: 30 00 00 00 cf ff ff ff 38 00 00 00 c7 ff ff ff | 0.......8.......

The data I expected was:
000000c0: 00 00 00 00 ff ff ff ff 08 00 00 00 f7 ff ff ff | ................
000000d0: 10 00 00 00 ef ff ff ff 18 00 00 00 e7 ff ff ff | ................
000000e0: 20 00 00 00 df ff ff ff 28 00 00 00 d7 ff ff ff | ................
000000f0: 30 00 00 00 cf ff ff ff 38 00 00 00 c7 ff ff ff | 0.......8.......
00000100: 40 00 00 00 bf ff ff ff 48 00 00 00 b7 ff ff ff | ........H.......
00000110: 50 00 00 00 af ff ff ff 58 00 00 00 a7 ff ff ff | P.......X.......
00000120: 60 00 00 00 9f ff ff ff 68 00 00 00 97 ff ff ff | ........h.......
00000130: 70 00 00 00 8f ff ff ff 78 00 00 00 87 ff ff ff | p.......x.......
00000140: 80 00 00 00 7f ff ff ff 88 00 00 00 77 ff ff ff | ............w...
00000150: 90 00 00 00 6f ff ff ff 98 00 00 00 67 ff ff ff | ....o.......g...
00000160: a0 00 00 00 5f ff ff ff a8 00 00 00 57 ff ff ff | ............W...
00000170: b0 00 00 00 4f ff ff ff b8 00 00 00 47 ff ff ff | ....O.......G...
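To compare dumps like the ones above without reading them by eye, a small checker can help. The following is a minimal sketch, not part of snap_example: it assumes (from the dumps only) that the test pattern is a sequence of little-endian 32-bit values stepping by 8, each followed by its bitwise complement, and that the pattern starts at a known offset in the dump. The names `expected_pattern`, `parse_hexdump` and `mismatched_rows` are hypothetical helpers.

```python
import re
import struct

# One hexdump row: 8-digit hex offset, a colon, then up to 16 two-digit bytes.
# Header lines such as "---------- dest Buffer: ..." do not match and are skipped.
ROW = re.compile(r"^\s*([0-9a-fA-F]{8}):((?:\s[0-9a-fA-F]{2}){1,16})")


def expected_pattern(length, start=0, step=8):
    """Return `length` bytes of (value, ~value) pairs as little-endian uint32.

    Assumption inferred from the dumps: value starts at `start` and grows by 8.
    """
    out = bytearray()
    value = start
    while len(out) < length:
        out += struct.pack("<II", value & 0xFFFFFFFF, ~value & 0xFFFFFFFF)
        value += step
    return bytes(out[:length])


def parse_hexdump(text):
    """Parse hexdump text into a dict mapping row offset to the row's bytes."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[int(match.group(1), 16)] = bytes.fromhex(match.group(2))
    return rows


def mismatched_rows(dump_text, base_offset):
    """List the row offsets whose bytes differ from the expected pattern.

    `base_offset` is the dump offset where the pattern is assumed to start at 0.
    """
    bad = []
    for offset, data in sorted(parse_hexdump(dump_text).items()):
        want = expected_pattern(len(data), start=offset - base_offset)
        if data != want:
            bad.append(offset)
    return bad
```

Pasting the observed dest buffer dump into a string and calling `mismatched_rows(dump, 0xc0)` lists the offsets of the rows that deviate from the expected data, which makes the zero-filled gap and the misplaced 64-byte blocks easier to spot.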

Loaded the image from https://github.com/open-power/snap/tree/RC_flip_bit_4_fgt on the FGT card.
Tested: 1.) 3 x card reset: no Swap / Free error seen anymore.
Tested: 2.) Power cycle: no Swap / Free error.