Note: Forked from https://github.com/ak-hard/brcm-nand-bch, and very rough code added alongside to reencode/decode ECC for T=4, N=14. I've no idea which controllers this will work for, YMMV.
This piece of software emulates generation of BCH correction codes by the Broadcom NAND controller.
Broadcom series of SOCs, such as brcm63x68 and brcm63x69, implement an integrated NAND flash controller. Although little documentation is openly available, an open-source driver exists in Linux (brcmnand). The driver does not have software ECC generation or decoding, but some parameters can be looked up in the code. In particular, the following comment indicates that BCH of order 13 and 14 can be used, depending on the controller version:
/*
* CONTROLLER_VERSION:
* < v5.0: ECC_REQ = ceil(BCH_T * 13/8)
* >= v5.0: ECC_REQ = ceil(BCH_T * 14/8)
*/
In particular, we are looking at a particular NAND configuration of 2KB page size and 64 bits of out-of-band (OOB) data per page. The minimal ECC block (sector) is 512 bytes with 16 bytes of OOB region. The OOB data looks as the following:
xxx0000: ff 19 85 20 03 00 00 00 08 03 d1 0d 32 4f ee df ... ........2O..
xxx0010: ff ff ff ff ff ff ff ff ff 0a 52 3e b2 65 40 d8 ..........R>.e@.
Here, the first line corresponds to a sector at the beginning of an erase block and the other line is for all other sectors. For a particular type of sector, the first 9 1/2 bytes are always the same, and the last 52 bits are random. A 52-bit BCH ECC can be generated by a degree-13 BCH algorithm with T=4 (13 x 4 = 52).
Although this information is valuable, it does not necessary allows us to generate the correct code because there are several ways the algorithm can be used, such as how many bits are included in ECC, their alignment, and also bit order and possibly additive constants applied to the result (e.g. XOR-ing). Another required piece of information, the initial BCH polynomial, is also unknown.
The remaining parameters are obtained by a linear analysis of a sample of available ECC generated by a read SOC, since BCH codes are linear codes. The polynomial had to be brute forced but since not many numbers are suitable this was not hard.
First, it was determined that the 512-bytes sectors themselves are not linearly correlated to the 52 hardware generated ECC bits with any of the possible polynomials. However, such correlation was found to exists when only non-first sectors on the erase blocks are considered. Since all such sectors have the same value of non-ECC OOB bits, this implies that some of these bits are also included in ECC computation. Luckly, this also implies that the default polynomial (0x201b) is used by hardware. Some experimentation revealed that actually all 9 1/2 bytes need to be considered to establish a linear dependency to hardware ECC, valid for all sectors. Due to linearity of BCH codes, such dependency must also exist between the hardware and software generated BCH codes, even if they do not fully match.
Simply concatenation of data and first 9 1/2 OOB bytes produces the following linear dependency matrix between software and hardware BCH generators:
0 10101000000000000000000000000000000000000000000000000
1 01010100000000000000000000000000000000000000000000000
2 10000010000000000000000000000000000000000000000000000
3 01000001000000000000000000000000000000000000000000000
4 00100000100000000000000000000000000000000000000000000
...
45 00100000000000000000000000000000000000000000000001000
46 00010000000000000000000000000000000000000000000000100
47 00100000000000000000000000000000000000000000000000010
48 10010000000000000000000000000000000000000000000000000
49 11100000000000000000000000000000000000000000000000000
50 11110000000000000000000000000000000000000000000000000
51 01010000000000000000000000000000000000000000000000000
There is a nice diagonal that is "late" by four bits and some garbage in the first four bits. Masking out the ECC bits and including 10 OOB bytes in the computation produces a matrix that is now 4 bits "too early":
0 00000000000000000000000000000000000000000000000000110
1 00000000000000000000000000000000000000000000000000010
2 00000000000000000000000000000000000000000000000011100
3 00000000000000000000000000000000000000000000000001110
4 10000000000000000000000000000000000000000000000011010
5 01000000000000000000000000000000000000000000000001100
6 00100000000000000000000000000000000000000000000000110
7 00010000000000000000000000000000000000000000000000100
...
47 00000000000000000000000000000000000000000001000000110
48 00000000000000000000000000000000000000000000100000100
49 00000000000000000000000000000000000000000000010000010
50 00000000000000000000000000000000000000000000001011010
51 00000000000000000000000000000000000000000000000110110
This suggests that ECC generation should take into account exactly 512 + 9 1/2 bytes. To do this, the whole string is shifted by 4 bits to the right. After this, the correlation matrix becomes diagonal and the hardware and software ECC match exactly.