Verilog BCH encoder/decoder. This is a Verilog based BCH encoder and decoder for single bit, dual bit, and 3 or more bit error correction. The equations and layout for the encoder and decoders is taken from "The design of a vhdl based synthesis tool for bch codecs." by Ernest Jamro. The decoder/encoder is completely parameterizable. The two main parameter are: DATA_BITS - The number of data bits T - The number of bits that can be corrected The module bch_param generates a BCH parameter set: `include "bch_params.vh" localparam [`BCH_PARAM_SZ-1:0] P = bch_params(DATA_BITS, T); The generated parameters are passed to the BCH modules and can be access with the following: `BCH_M(P) - Degree of the field `BCH_N(P) - Actual code size (padded) `BCH_K(P) - Actual data size (padded) `BCH_T(P) - Actual bits corrected (may be more than specified) `BCH_DATA_BITS(P) - Data bits `BCH_ECC_BITS(P) - Generate ECC bits `BCH_CODE_BITS(P) - Data bits + ECC bits `BCH_SYNDROMES_SZ(P) - Bits required for syndromes passed between modules `BCH_SIGMA_SZ(P) - Bits required for sigma equation passed between modules `BCH_ERR_SZ(P) - Bits required for number of errors `BCH_PARAM_SZ - Bits required to hold parameters Note that the number of errors correctable for a given polynomial is sparse. The search function will choose the next highest number of correctable errors rather than trying to move to the next polynomial. For instance, if DATA_BITS=64, T=8 is seleceted, (127, 71, 9) is chosen rather than (255, 191, 8). Many modules accept a PIPELINE_STAGES and/or a REG_RATIO parameter. These parameters relate to area/speed/latency optimizations. There is not a hard and fast rule of which setting will give the best speed or area result. PIPELINE_STAGES adds additional pipelining stages to operation, increasing register count and latency, but possibly increasing speed. For modules that accept input in parallel and have duplicated register sets for dealing with the parallel input, REG_RATIO reduces the number of registers used. REG_RATIO=1 will include all registers, REG_RATIO=2 will calculate every other register asyncronously from the previous register, REG_RATIO=3 will calculate every third register asyncrounously and so on. The maximum allowable REG_RATIO is REG_RATIO=BITS (the input width). Note that when compiling with Xilinx tools, the '-loop_iteration_limit' option to XST is required to be set above the `BCH_N constant. Choosing a value that is DATA_BITS is usually safe. The code currently compiles under Icarus Verilog, Xilinx Isim and Xilinx XST. Bit Order: When operating with multi-bit input/output, the first bit in each sequence is the high bit of the first word. If padding is added to the last word is a sequence, it is added to the lowest bits of the word. Modules: The encoder consists of a single module, but the decoder consists of several modules that are meant to be used together. Depending on requirements, certain modules can be shared between decoders. Slice counts and max frequencies are show below each module based on the Xilinx Virtex-6 LX240T-3, built with all default options except for '-opt_level 2' passed to XST. Additional optimizations may yield lower area/higher speeds. bch_encode - Performs encoding of input data via a large LFSR. Takes a BITS wide input. For the first ceil(`BCH_DATA_BITS / BITS) cycles (including cycle 0) the output mirrors the input. Then, for ceil(`BCH_ECC_BITS / BITS) cycles the output contains the ECC bits. In the case that `BCH_DATA_BITS % BITS is non-zero, the low bits of the last data word will be ignored. A pipeline stage can be added by setting the PIPELINE_STAGES=1 parameter. T=1,DATA_BITS=64: BITS= 1,PIPELINE_STAGES=0 12 slices 503 MHz BITS= 8,PIPELINE_STAGES=1 18 slices 543 MHz BITS=16,PIPELINE_STAGES=0 18 slices 543 MHz T=1,DATA_BITS=256: BITS=16,PIPELINE_STAGES=0 22 slices 543 MHz T=3,DATA_BITS=256: BITS=16,PIPELINE_STAGES=0 72 slices 390 MHz T=8,DATA_BITS=256: BITS=16,PIPELINE_STAGES=1 125 slices 447 MHz T=12,DATA_BITS=256: BITS=16,PIPELINE_STAGES=1 183 slices 387 MHz T=12,DATA_BITS=4096: BITS= 8,PIPELINE_STAGES=0 150 slices 375 MHz BITS=16,PIPELINE_STAGES=0 295 slices 318 MHz bch_blank_ecc - Allows the ECC on erased flash to be valid. It does this by precomputing the ECC of an all 1's input and then inverting it. This module provides this data so it can be XOR'd with the ECC bit output from the encode module and XOR'd again before input to the bch_syndrome module. bch_syndrome - Calculates the syndrome equations. Takes a BITS wide input. Operates for ceil(`BCH_CODE_BITS / BITS) cycles to produce the compact syndrome equations. Note that bch_syndrome expects the data and ecc input to be merged together without pad bits in the case of `BCH_DATA_BITS % BITS being non-zero, which is not what bch_encode produces. It is recommended to make BITS divisible by `BCH_DATA_BITS. In the event that `BCH_CODE_BITS % BITS is non-zero, the upper bits of the final word are ignored. Up to two pipeline stages can be added by setting the PIPELINE_STAGES parameter. T=1,DATA_BITS=64: BITS= 1,PIPELINE_STAGES=0,REG_RATIO=1 8 slices 464 MHz BITS= 4,PIPELINE_STAGES=1,REG_RATIO=4 13 slices 506 MHz BITS= 8,PIPELINE_STAGES=2,REG_RATIO=1 30 slices 485 MHz BITS=16,PIPELINE_STAGES=2,REG_RATIO=4 48 slices 408 MHz T=1,DATA_BITS=256: BITS=16,PIPELINE_STAGES=2,REG_RATIO=4 52 slices 386 MHz T=3,DATA_BITS=256: BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 66 slices 451 MHz T=8,DATA_BITS=256: BITS=16,PIPELINE_STAGES=1,REG_RATIO=16 43 slices 508 MHz T=12,DATA_BITS=256: BITS= 4,PIPELINE_STAGES=2,REG_RATIO=4 21 slices 488 MHz BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8 22 slices 512 MHz BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 42 slices 485 MHz T=12,DATA_BITS=4096: BITS= 8,PIPELINE_STAGES=1,REG_RATIO=8 25 slices 512 MHz BITS=16,PIPELINE_STAGES=1,REG_RATIO=16 42 slices 453 MHz bch_errors_present - Determines based on syndromes if any errors are present (Not required for decoding). This module can be used to allow several bch_syndrome modules to share bch_sigma_* and bch_error_* modules or to determine that the error is data free before allowing bch_sigma_* to complete. Up to two pipeline stages can be added by setting the PIPELINE_STAGES parameter. bch_sigma_* - Key equation solvers (sigma). Takes syndrome equations as input and produces the key equation, as well as the number of bit errors detected. Solving the key equation is required for T > 2 and optional for T == 2, and not supported for T == 1. bch_sigma_bma_serial - Serial Berlekamp–Massey algorithm with inversion. This option takes longer but requires less space than the parallel option. This option takes `BCH_T(P) * (`BCH_M(P) + 2) - 2 cycles to solve the key equation. Because of variability in the basis conversion circuits, the performance can vary widely for different values of `BCH_M. T= 2,DATA_BITS=64 63 slices 376 MHz T= 2,DATA_BITS=256 80 slices 345 MHz T= 2,DATA_BITS=1024 95 slices 263 MHz T= 2,DATA_BITS=4096 120 slices 276 MHz T= 8,DATA_BITS=256 192 slices 217 MHz T= 8,DATA_BITS=1024 225 slices 262 MHz T= 8,DATA_BITS=4096 272 slices 190 MHz T=12,DATA_BITS=256 266 slices 208 MHz T=12,DATA_BITS=1024 299 slices 205 MHz T=12,DATA_BITS=4096 429 slices 194 MHz bch_sigma_bma_parallel - Parallel Berlekamp–Massey algorithm without inversion. This option operates in less cycles than the serial option, but requires more gates. This option takes `BCH_T(P) * 2 - 1 cycles to solve the key equation. T= 2,DATA_BITS=64 92 slices 330 MHz T= 2,DATA_BITS=256 138 slices 327 MHz T= 2,DATA_BITS=1024 222 slices 284 MHz T= 2,DATA_BITS=4096 335 slices 236 MHz T= 8,DATA_BITS=256 444 slices 223 MHz T= 8,DATA_BITS=1024 586 slices 251 MHz T= 8,DATA_BITS=4096 754 slices 212 MHz T=12,DATA_BITS=256 576 slices 230 MHz T=12,DATA_BITS=1024 761 slices 229 MHz T=12,DATA_BITS=4096 1025 slices 191 MHz bch_sigma_bma_noinv - Serial Berlekamp–Massey algorithm without inversion. This option takes `BCH_T(P) * (`BCH_M(P) * 2 + 1) cycles to solve the key equation. T= 3,DATA_BITS=256 19 slices 421 MHz T= 3,DATA_BITS=1024 107 slices 348 MHz T= 3,DATA_BITS=4096 128 slices 320 MHz T= 8,DATA_BITS=256 19 slices 307 MHz T= 8,DATA_BITS=1024 231 slices 261 MHz T= 8,DATA_BITS=4096 305 slices 232 MHz T=12,DATA_BITS=256 23 slices 489 MHz T=12,DATA_BITS=1024 342 slices 196 MHz T=12,DATA_BITS=4096 445 slices 203 MHz bch_error_* - Error locator. After 2 cycles, it clocks out a BITS wide error bits word ceil(`BCH_DATA_BITS / BITS) cycles, each cycle representing a set of error locations. If `BCH_DATA_BITS % BITS is non-zero, the low bits of the last word in the output are masked. bch_error_dec - Error location function for T < 3. Takes syndromes directly as input rather than a solved key equation. Same output as bch_error_* above, but also produces the error_count. Up to two pipeline stages for T==2 or one for T==1 can be added by setting the PIPELINE_STAGES parameter. T=1,DATA_BITS=64: BITS= 1,PIPELINE_STAGES=1,REG_RATIO=1 17 slices 491 MHz BITS= 8,PIPELINE_STAGES=1,REG_RATIO=1 45 slices 510 MHz BITS= 8,PIPELINE_STAGES=1,REG_RATIO=8 23 slices 490 MHz BITS=16,PIPELINE_STAGES=0,REG_RATIO=8 32 slices 482 MHz T=1,DATA_BITS=256: BITS=16,PIPELINE_STAGES=0,REG_RATIO=16 31 slices 466 MHz T=1,DATA_BITS=4096: BITS=16,PIPELINE_STAGES=0,REG_RATIO=4 88 slices 423 MHz BITS=16,PIPELINE_STAGES=1,REG_RATIO=16 64 slices 385 MHz T=2,DATA_BITS=64: BITS= 1,PIPELINE_STAGES=2,REG_RATIO=1 36 slices 442 MHz BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8 105 slices 504 MHz BITS= 8,PIPELINE_STAGES=2,REG_RATIO=4 84 slices 488 MHz BITS=16,PIPELINE_STAGES=2,REG_RATIO=1 183 slices 424 MHz T=2,DATA_BITS=256: BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 367 slices 291 MHz T=2,DATA_BITS=4096: BITS=16,PIPELINE_STAGES=1,REG_RATIO=4 821 slices 230 MHz BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 637 slices 215 MHz bch_error_tmec - Error location function for T > 1. Requires the solved key equation as input. Same output as bch_error_* above. Up to two pipeline stages can be added by setting the PIPELINE_STAGES parameter. T=3,DATA_BITS=64: BITS= 1,PIPELINE_STAGES=0,REG_RATIO=1 26 slices 485 MHz BITS= 8,PIPELINE_STAGES=2,REG_RATIO=4 75 slices 457 MHz BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 111 slices 393 MHz T=3,DATA_BITS=256: BITS=16,PIPELINE_STAGES=2,REG_RATIO=4 152 slices 416 MHz T=3,DATA_BITS=4096: BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 314 slices 352 MHz T=8,DATA_BITS=256: BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 373 slices 337 MHz T=8,DATA_BITS=4096: BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 737 slices 251 MHz T=12,DATA_BITS=256: BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 595 slices 297 MHz T=12,DATA_BITS=4096: BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8 585 slices 262 MHz BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 1247 slices 183 MHz sim - Test bench core and example code for connecting together the different modules. Takes an additional parameter to specify the type of key equation solver: OPTION - The type of key equation solver: OPTION == NONE - For T < 3, skip the key solving process OPTION == PARALLEL - For T > 1, use bch_sigma_bma_parallel OPTION == SERIAL - For T > 1, use bch_sigma_bma_serial OPTION == NOINV - For T > 1, use bch_sigma_bma_noinv BITS - The word width for bch_encode, bch_syndrome, and bch_error_*. REG_RATIO - When using multi-bit datastreams, bch_syndrome and bch_error_* create a duplicated register set for each bit. This causes them to instead only create a register for every REG_RATIOth bit. The additional values are calculated asyncronously which may create timing issues, but will reduce register usage. TODO: Improve simulation times. Shared syndrome/chien stages between data streams? Forney algorithm for error location? BTZ algorithm for error location? Encoder Example Output: Note: This includes XOR'ing the ECC data with bch_blank_ecc. The output matches the ECC generated by Linux software BCH. DATA_BITS=4096, T=3, BITS=8, GP(2^13) 0x000: 24 ac a8 48 81 7a 91 03 90 6b 21 21 c6 6a 0f 8d 0x010: b0 de 9d 60 b1 99 fb 62 a6 21 43 4d 48 4a ec 91 0x020: 80 c7 cf 00 b0 b3 eb 60 6e 8f c4 dc bc b3 b1 78 0x030: b7 ea 2b 6e 31 ef b4 62 9f 97 bf 3e 63 24 ec c7 0x040: a3 21 ef 47 55 4b c8 ab a7 ef f5 4e ee 4e 4b dd 0x050: 48 6b 4c 91 f1 db 37 e2 4f 96 0c 9e 73 69 6e e7 0x060: 17 5e 00 2f a3 72 53 47 3a 92 1a 74 bd 14 71 7b 0x070: b4 1a d5 69 a6 64 e5 4d c6 46 f7 8d 50 e1 ce a0 0x080: 01 44 dc 03 04 84 26 08 55 a2 1e aa b9 33 33 73 0x090: 69 7c 04 d3 c8 1d c1 91 b0 1c 3d 61 ca 6b e5 95 0x0a0: d6 bb f9 ac ab 54 65 57 40 52 e8 81 75 7c 6a eb 0x0b0: b8 97 ff 70 f3 4e a3 e7 c4 92 21 88 50 a6 90 a0 0x0c0: 53 50 ce a7 3c 32 8c 79 4f bd 16 9e 51 86 74 a2 0x0d0: 82 cd 23 04 1b 2c d2 37 1e cf 04 3c 91 48 51 23 0x0e0: d7 b0 a3 ae 6d a6 40 da 63 9f d2 c6 75 c8 a8 ea 0x0f0: 3d 40 ae 7b 54 bb f0 a8 d0 59 eb a1 8e 7d e9 1d 0x100: a1 82 4f 42 f2 aa 4f e4 dc 06 33 b9 5c cc 60 b8 0x110: 7e 4c c4 fd a8 d2 55 50 8f d9 9b 1e 9b d2 39 36 0x120: 46 e0 c4 8c 32 df ac 64 f9 aa 85 f2 f1 85 9d e2 0x130: 1f 95 b6 3e ed 06 b3 db f9 6a 23 f3 3a 29 be 75 0x140: da 03 d3 b5 91 ec 7b 22 92 4c 75 25 0b 7c c4 17 0x150: eb 03 7f d7 8a 3f e3 15 69 f4 84 d2 30 fe b8 60 0x160: 43 73 34 87 13 b7 ea 26 ee 5a 41 dd bc b7 4f 78 0x170: 48 db 30 90 7e 00 a0 fd b6 81 a7 6c 9c e7 9b 38 0x180: f8 f1 4b f0 cb 6f d7 97 a1 f4 9b 42 1e 11 28 3d 0x190: da 9a 5b b4 92 d2 d1 24 fc 75 4f f9 54 f5 30 a8 0x1a0: 1a c6 a8 34 61 1c fc c3 71 30 58 e3 39 4f f4 73 0x1b0: 5f 99 ac be 64 3d be c9 ce e8 69 9c c5 2d d1 8b 0x1c0: f4 15 99 e9 cd 49 4b 9b fa 6a 1b f5 22 b0 94 44 0x1d0: ef bf 1f de d3 6c d9 a7 67 58 90 cf 5a db 14 b4 0x1e0: 94 83 f3 28 b7 47 41 6f 5b 36 ca b7 a6 6d f9 4d 0x1f0: ad 47 51 5b fb 29 7d f7 9e 98 8f 3c 22 ff 8c 44 ECC: 24 a2 6b 4d 5b -- Russ Dill <russ.dill@asu.edu>