OpenNuvoton/NUC980-linux-4.4.y

NUC980 NAND ECC correction is not working

Closed this issue · 3 comments

We recently encountered a strange problem:
[17588.040000] UBIFS error (ubi0:0 pid 15691): do_readpage: cannot read page 15 of inode 396, error -22
[17590.080000] UBIFS error (ubi0:0 pid 15695): ubifs_decompress: cannot decompress 2293 bytes, compressor lzo, error -22
[17590.090000] UBIFS error (ubi0:0 pid 15695): do_readpage: bad data node (block 15, inode 396)
(followed by a long hex dump; the data looks normal)

When this happens, the system becomes very slow and freezes at random, but does not crash.

After we consulted the MTD mailing list, Miquel Raynal (miquel.raynal@bootlin.com) suggested running the MTD tests to confirm that the NAND implementation works correctly.
So I ran nandbiterrs on the NUC980. Unfortunately, it fails:

[root@jgcx: ~]#./nandbiterrs /dev/mtd2 -i
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Failed to recover 1 bitflips
Read error after 1 bit errors per page

It looks like the NUC980 NAND driver cannot do ECC correction at all: even a single bitflip is not recovered.

When we run the same test on an NXP CPU, the result is:
root@303X:/root# ./nandbiterrs /dev/mtd2 -i
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Read reported 1 corrected bit errors
Successfully corrected 1 bit errors per subpage
Inserted biterror @ 0/2
Read reported 2 corrected bit errors
Successfully corrected 2 bit errors per subpage
Inserted biterror @ 0/0
Read reported 3 corrected bit errors
Successfully corrected 3 bit errors per subpage
Inserted biterror @ 1/7
Read reported 4 corrected bit errors
Successfully corrected 4 bit errors per subpage
Inserted biterror @ 1/5
Failed to recover 1 bitflips
Read error after 5 bit errors per page

So the NXP CPU automatically corrects up to 4 bit errors per subpage; the fifth bitflip is uncorrectable.
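The `Inserted biterror @ byte/bit` lines in the NXP log follow a visible pattern: in byte 0 the flipped bits go 5, 2, 0, then the test moves on to byte 1 (bits 7, 5). That is consistent with clearing the most significant bit that is still 1, scanning forward from the start of the subpage; only 1 to 0 changes make sense here, because a NAND page can be reprogrammed to clear bits but not to set them without an erase. The C sketch below models that step under this assumption only. It is not copied from mtd-utils, and the function name and the buffer contents used in the usage note (0x25, 0xA0, chosen to reproduce the NXP log) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the bit-insertion step (inferred from the log, not
 * taken from the nandbiterrs source): starting at byte `start`, clear the
 * most significant bit that is still 1. Only 1 -> 0 flips are produced, since
 * rewriting a NAND page without an erase can only clear bits.
 * Returns 0 and reports the position on success, -1 if no set bit remains. */
static int insert_biterror(uint8_t *buf, size_t len, size_t start,
                           size_t *out_byte, int *out_bit)
{
    assert(buf != NULL && out_byte != NULL && out_bit != NULL);
    for (size_t byte = start; byte < len; byte++) {
        for (int bit = 7; bit >= 0; bit--) {
            unsigned mask = 1u << bit;
            if (buf[byte] & mask) {
                buf[byte] &= (uint8_t)~mask;   /* program this bit to 0 */
                *out_byte = byte;
                *out_bit = bit;
                printf("Inserted biterror @ %zu/%d\n", byte, bit);
                return 0;
            }
        }
    }
    return -1;  /* every bit from `start` onward is already 0 */
}
```

Calling it five times with `start = 0` on a buffer that begins with `0x25, 0xA0` prints the same five positions as the NXP run (0/5, 0/2, 0/0, 1/7, 1/5) and leaves both bytes at zero; a sixth call returns -1.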

We think that fixing ECC correction will also resolve the random ubifs_decompress failures. Any help?

Thanks.

./nandbiterrs /dev/mtd2 -i
incremental biterrors test
[ 264.170000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
Successfully corrected 0 bit errors per subpage
[ 264.180000] SM uncorrectable error is encountered, 0x 22 !!
Inserted biterror @ 0/5
Failed to recover 1 bitflips
Read error after 1 bit errors per page

[root@303mini /root]

dmesg

[ 246.730000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.760000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.870000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.880000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.890000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.900000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.910000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.910000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.920000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.930000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 246.940000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 261.440000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 261.450000] SM uncorrectable error is encountered, 0x 22 !!
[ 264.170000] nuc980_nand_write_page_hwecc c3ac6880 60 total:oobsize:64 60
[ 264.180000] SM uncorrectable error is encountered, 0x 22 !!

So with the default implementation in nuc980_nand.c, even a single bit error cannot be corrected.

nandbiterrs uses raw page writes to do the ECC correction test on mtd2, but the NUC980 NAND driver does not support raw page writes.
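Why the raw write matters can be shown with a toy model. Assuming nandbiterrs works as its output suggests, it first writes the page normally (data plus ECC), then clears one bit in its copy of the data and rewrites the page in raw mode, so the stored ECC still describes the original data; reading the page back then forces the ECC engine to find and repair the difference. If the rewrite goes through the normal ECC path instead (the dmesg output above shows nuc980_nand_write_page_hwecc being called during the test), the ECC is recomputed over the corrupted data and the test no longer exercises correction. The C sketch below uses a deliberately simple single-error-correcting check value (the XOR of the 1-based positions of all set bits) in place of the NUC980 hardware ECC; the code, the 4-byte page, and all function names are hypothetical and only illustrate the principle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy ECC (illustration only, not the NUC980 hardware ECC): XOR of the
 * 1-based positions of all set bits. Flipping one data bit changes the value
 * by exactly that bit's position, so the syndrome locates the bad bit. */
static uint32_t toy_ecc(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    uint32_t ecc = 0;
    for (size_t i = 0; i < len; i++)
        for (unsigned b = 0; b < 8; b++)
            if (buf[i] & (1u << b))
                ecc ^= (uint32_t)(i * 8 + b + 1);
    return ecc;
}

/* Read path: returns the number of corrected bits (0 or 1), or -1 when the
 * syndrome points outside the buffer (uncorrectable). */
static int toy_read(uint8_t *buf, size_t len, uint32_t stored_ecc)
{
    uint32_t syndrome = stored_ecc ^ toy_ecc(buf, len);
    if (syndrome == 0)
        return 0;
    if (syndrome > len * 8)
        return -1;
    syndrome -= 1;                                   /* 0-based bit index */
    buf[syndrome / 8] ^= (uint8_t)(1u << (syndrome % 8));
    return 1;
}

/* One round on a hypothetical 4-byte page: normal write, clear one bit, then
 * rewrite either raw (old ECC kept) or through the ECC path (ECC recomputed
 * over the corrupted data), and read back. Returns 1 if the data read back
 * equals the original, 0 otherwise; *corrected gets toy_read's result. */
static int simulate_round(int raw_rewrite, int *corrected)
{
    const uint8_t orig[4] = {0x25, 0xA0, 0x3C, 0x81};
    uint8_t page[4];
    memcpy(page, orig, sizeof page);
    uint32_t stored_ecc = toy_ecc(page, sizeof page);   /* normal write */

    page[0] &= (uint8_t)~(1u << 5);                     /* inject a bitflip */
    if (!raw_rewrite)
        stored_ecc = toy_ecc(page, sizeof page);        /* ECC follows bad data */

    *corrected = toy_read(page, sizeof page, stored_ecc);
    return memcmp(page, orig, sizeof page) == 0;
}
```

`simulate_round(1, &c)` returns 1 with `c == 1`: the raw rewrite leaves the old ECC in place, so the flipped bit is located and restored. `simulate_round(0, &c)` returns 0 with `c == 0`: the recomputed ECC matches the corrupted data and nothing is corrected.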

Try this ==>
nuc980_nand.zip

Thanks, you saved my life. I had been debugging this issue since this morning.
I checked your code: the DMA write has been removed. I tried it and it works now; ECC correction works fine.
I am deploying it to my device to see whether the "ubifs_decompress: cannot decompress 2293 bytes, compressor lzo, error -22" error disappears.

Thanks. :)
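To check on the deployed device whether the patched driver really corrects bitflips, rather than only waiting to see if the UBIFS error comes back, one option is to read the MTD layer's cumulative ECC counters with the standard `ECCGETSTATS` ioctl from `<mtd/mtd-user.h>`. This is a minimal sketch that assumes the NUC980 driver reports its corrected and failed reads to the MTD core (if it does not, the counters simply stay at zero); the helper names are hypothetical, and `/dev/mtd2` is just the device used in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

/* Hypothetical helper: fetch the cumulative ECC statistics of an MTD device.
 * Returns 0 on success, -1 on failure with errno from open() or ioctl(). */
static int read_ecc_stats(const char *dev, struct mtd_ecc_stats *st)
{
    assert(dev != NULL && st != NULL);
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;
    int ret = ioctl(fd, ECCGETSTATS, st);
    int saved_errno = errno;          /* close() may clobber errno */
    close(fd);
    if (ret < 0) {
        errno = saved_errno;
        return -1;
    }
    return 0;
}

/* Print the counters, e.g. print_ecc_stats("/dev/mtd2"). */
static int print_ecc_stats(const char *dev)
{
    struct mtd_ecc_stats st;
    if (read_ecc_stats(dev, &st) != 0) {
        perror(dev);
        return -1;
    }
    printf("%s: corrected=%u failed=%u badblocks=%u\n",
           dev, st.corrected, st.failed, st.badblocks);
    return 0;
}
```

For example, print the counters before and after reading a large file from the UBIFS volume: a growing `corrected` count with an unchanged `failed` count would suggest that correction is working.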