/bch_verilog

Verilog based BCH encoder/decoder

Primary LanguageVerilogOtherNOASSERTION

Verilog BCH encoder/decoder.

This is a Verilog based BCH encoder and decoder for single bit, dual bit, and
3 or more bit error correction. The equations and layout for the encoder
and decoders is taken from "The design of a vhdl based synthesis tool for bch
codecs." by Ernest Jamro.

The decoder/encoder is completely parameterizable. The two main parameter
are:

DATA_BITS - The number of data bits
T - The number of bits that can be corrected

The module bch_param generates a BCH parameter set:

	`include "bch_params.vh"
	localparam [`BCH_PARAM_SZ-1:0] P = bch_params(DATA_BITS, T);

The generated parameters are passed to the BCH modules and can be access
with the following:

`BCH_M(P)		- Degree of the field
`BCH_N(P)		- Actual code size (padded)
`BCH_K(P)		- Actual data size (padded)
`BCH_T(P)		- Actual bits corrected (may be more than specified)
`BCH_DATA_BITS(P)	- Data bits
`BCH_ECC_BITS(P)	- Generate ECC bits
`BCH_CODE_BITS(P)	- Data bits + ECC bits
`BCH_SYNDROMES_SZ(P)	- Bits required for syndromes passed between modules
`BCH_SIGMA_SZ(P)	- Bits required for sigma equation passed between
			  modules
`BCH_ERR_SZ(P)		- Bits required for number of errors
`BCH_PARAM_SZ		- Bits required to hold parameters

Note that the number of errors correctable for a given polynomial is sparse.
The search function will choose the next highest number of correctable
errors rather than trying to move to the next polynomial. For instance, if
DATA_BITS=64, T=8 is seleceted, (127, 71, 9) is chosen rather than
(255, 191, 8).

Many modules accept a PIPELINE_STAGES and/or a REG_RATIO parameter. These
parameters relate to area/speed/latency optimizations. There is not a hard
and fast rule of which setting will give the best speed or area result.

PIPELINE_STAGES adds additional pipelining stages to operation, increasing
register count and latency, but possibly increasing speed.

For modules that accept input in parallel and have duplicated register
sets for dealing with the parallel input, REG_RATIO reduces the number of
registers used. REG_RATIO=1 will include all registers, REG_RATIO=2 will
calculate every other register asyncronously from the previous register,
REG_RATIO=3 will calculate every third register asyncrounously and so on.
The maximum allowable REG_RATIO is REG_RATIO=BITS (the input width).

Note that when compiling with Xilinx tools, the '-loop_iteration_limit'
option to XST is required to be set above the `BCH_N constant. Choosing
a value that is DATA_BITS is usually safe.

The code currently compiles under Icarus Verilog, Xilinx Isim and Xilinx XST.

Bit Order:

When operating with multi-bit input/output, the first bit in each sequence is
the high bit of the first word. If padding is added to the last word is a
sequence, it is added to the lowest bits of the word.

Modules:

The encoder consists of a single module, but the decoder consists of several
modules that are meant to be used together. Depending on requirements,
certain modules can be shared between decoders. Slice counts and max
frequencies are show below each module based on the Xilinx Virtex-6 LX240T-3,
built with all default options except for '-opt_level 2' passed to XST.
Additional optimizations may yield lower area/higher speeds.

bch_encode - Performs encoding of input data via a large LFSR. Takes a BITS
wide input. For the first ceil(`BCH_DATA_BITS / BITS) cycles (including
cycle 0) the output mirrors the input. Then, for ceil(`BCH_ECC_BITS / BITS)
cycles the output contains the ECC bits. In the case that `BCH_DATA_BITS %
BITS is non-zero, the low bits of the last data word will be ignored.
A pipeline stage can be added by setting the PIPELINE_STAGES=1 parameter.

T=1,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=0		12 slices	503 MHz
BITS= 8,PIPELINE_STAGES=1		18 slices	543 MHz
BITS=16,PIPELINE_STAGES=0		18 slices	543 MHz

T=1,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=0		22 slices	543 MHz

T=3,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=0		72 slices	390 MHz

T=8,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=1		125 slices	447 MHz

T=12,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=1		183 slices	387 MHz

T=12,DATA_BITS=4096:
BITS= 8,PIPELINE_STAGES=0		150 slices	375 MHz
BITS=16,PIPELINE_STAGES=0		295 slices	318 MHz

bch_blank_ecc - Allows the ECC on erased flash to be valid. It does this by
precomputing the ECC of an all 1's input and then inverting it. This module
provides this data so it can be XOR'd with the ECC bit output from the encode
module and XOR'd again before input to the bch_syndrome module.

bch_syndrome - Calculates the syndrome equations. Takes a BITS wide input.
Operates for ceil(`BCH_CODE_BITS / BITS) cycles to produce the compact
syndrome equations. Note that bch_syndrome expects the data and ecc input
to be merged together without pad bits in the case of `BCH_DATA_BITS %
BITS being non-zero, which is not what bch_encode produces. It is
recommended to make BITS divisible by `BCH_DATA_BITS. In the event that
`BCH_CODE_BITS % BITS is non-zero, the upper bits of the final word are
ignored. Up to two pipeline stages can be added by setting the
PIPELINE_STAGES parameter.

T=1,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=0,REG_RATIO=1	8 slices	464 MHz
BITS= 4,PIPELINE_STAGES=1,REG_RATIO=4	13 slices	506 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=1	30 slices	485 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=4	48 slices	408 MHz

T=1,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=4	52 slices	386 MHz

T=3,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8	66 slices	451 MHz

T=8,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=1,REG_RATIO=16	43 slices	508 MHz

T=12,DATA_BITS=256:
BITS= 4,PIPELINE_STAGES=2,REG_RATIO=4	21 slices	488 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8	22 slices	512 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16	42 slices	485 MHz

T=12,DATA_BITS=4096:
BITS= 8,PIPELINE_STAGES=1,REG_RATIO=8	25 slices	512 MHz
BITS=16,PIPELINE_STAGES=1,REG_RATIO=16	42 slices	453 MHz

bch_errors_present - Determines based on syndromes if any errors are present
(Not required for decoding). This module can be used to allow several
bch_syndrome modules to share bch_sigma_* and bch_error_* modules or to
determine that the error is data free before allowing bch_sigma_* to complete.
Up to two pipeline stages can be added by setting the PIPELINE_STAGES
parameter.

bch_sigma_* - Key equation solvers (sigma). Takes syndrome equations as input
and produces the key equation, as well as the number of bit errors detected.
Solving the key equation is required for T > 2 and optional for T == 2, and
not supported for T == 1.

bch_sigma_bma_serial - Serial Berlekamp–Massey algorithm with inversion.
This option takes longer but requires less space than the parallel option.
This option takes `BCH_T(P) * (`BCH_M(P) + 2) - 2 cycles to solve the key
equation. Because of variability in the basis conversion circuits, the
performance can vary widely for different values of `BCH_M.

T= 2,DATA_BITS=64			63 slices	376 MHz
T= 2,DATA_BITS=256			80 slices	345 MHz
T= 2,DATA_BITS=1024			95 slices	263 MHz
T= 2,DATA_BITS=4096			120 slices	276 MHz
T= 8,DATA_BITS=256			192 slices	217 MHz
T= 8,DATA_BITS=1024			225 slices	262 MHz
T= 8,DATA_BITS=4096			272 slices	190 MHz
T=12,DATA_BITS=256			266 slices	208 MHz
T=12,DATA_BITS=1024			299 slices	205 MHz
T=12,DATA_BITS=4096			429 slices	194 MHz

bch_sigma_bma_parallel - Parallel Berlekamp–Massey algorithm without
inversion. This option operates in less cycles than the serial option, but
requires more gates. This option takes `BCH_T(P) * 2 - 1 cycles to solve the
key equation.

T= 2,DATA_BITS=64			92 slices	330 MHz
T= 2,DATA_BITS=256			138 slices	327 MHz
T= 2,DATA_BITS=1024			222 slices	284 MHz
T= 2,DATA_BITS=4096			335 slices	236 MHz
T= 8,DATA_BITS=256			444 slices	223 MHz
T= 8,DATA_BITS=1024			586 slices	251 MHz
T= 8,DATA_BITS=4096			754 slices	212 MHz
T=12,DATA_BITS=256			576 slices	230 MHz
T=12,DATA_BITS=1024			761 slices	229 MHz
T=12,DATA_BITS=4096			1025 slices	191 MHz

bch_sigma_bma_noinv - Serial Berlekamp–Massey algorithm without inversion.
This option takes `BCH_T(P) * (`BCH_M(P) * 2 + 1) cycles to solve the key
equation.

T= 3,DATA_BITS=256			19 slices	421 MHz
T= 3,DATA_BITS=1024			107 slices	348 MHz
T= 3,DATA_BITS=4096			128 slices	320 MHz
T= 8,DATA_BITS=256			19 slices	307 MHz
T= 8,DATA_BITS=1024			231 slices	261 MHz
T= 8,DATA_BITS=4096			305 slices	232 MHz
T=12,DATA_BITS=256			23 slices	489 MHz
T=12,DATA_BITS=1024			342 slices	196 MHz
T=12,DATA_BITS=4096			445 slices	203 MHz

bch_error_* - Error locator. After 2 cycles, it clocks out a BITS wide error
bits word ceil(`BCH_DATA_BITS / BITS) cycles, each cycle representing a set
of error locations. If `BCH_DATA_BITS % BITS is non-zero, the low bits
of the last word in the output are masked.

bch_error_dec - Error location function for T < 3. Takes syndromes directly
as input rather than a solved key equation. Same output as bch_error_*
above, but also produces the error_count. Up to two pipeline stages for
T==2 or one for T==1 can be added by setting the PIPELINE_STAGES parameter.

T=1,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=1,REG_RATIO=1	17 slices	491 MHz
BITS= 8,PIPELINE_STAGES=1,REG_RATIO=1	45 slices	510 MHz
BITS= 8,PIPELINE_STAGES=1,REG_RATIO=8	23 slices	490 MHz
BITS=16,PIPELINE_STAGES=0,REG_RATIO=8	32 slices	482 MHz

T=1,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=0,REG_RATIO=16	31 slices	466 MHz

T=1,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=0,REG_RATIO=4	88 slices	423 MHz
BITS=16,PIPELINE_STAGES=1,REG_RATIO=16	64 slices	385 MHz

T=2,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=2,REG_RATIO=1	36 slices	442 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8	105 slices	504 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=4	84 slices	488 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=1	183 slices	424 MHz

T=2,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8	367 slices	291 MHz

T=2,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=1,REG_RATIO=4	821 slices	230 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16	637 slices	215 MHz

bch_error_tmec - Error location function for T > 1. Requires the solved key
equation as input. Same output as bch_error_* above. Up to two pipeline
stages can be added by setting the PIPELINE_STAGES parameter.

T=3,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=0,REG_RATIO=1	26 slices	485 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=4	75 slices	457 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8	111 slices	393 MHz

T=3,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=4	152 slices	416 MHz

T=3,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8	314 slices	352 MHz

T=8,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8	373 slices	337 MHz

T=8,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16	737 slices	251 MHz

T=12,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16	595 slices	297 MHz

T=12,DATA_BITS=4096:
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8	585 slices	262 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16	1247 slices	183 MHz

sim - Test bench core and example code for connecting together the different
modules. Takes an additional parameter to specify the type of key equation
solver:

OPTION - The type of key equation solver:

OPTION == NONE - For T < 3, skip the key solving process
OPTION == PARALLEL - For T > 1, use bch_sigma_bma_parallel
OPTION == SERIAL - For T > 1, use bch_sigma_bma_serial
OPTION == NOINV - For T > 1, use bch_sigma_bma_noinv

BITS - The word width for bch_encode, bch_syndrome, and bch_error_*.

REG_RATIO - When using multi-bit datastreams, bch_syndrome and bch_error_*
create a duplicated register set for each bit. This causes them to instead
only create a register for every REG_RATIOth bit. The additional values
are calculated asyncronously which may create timing issues, but will
reduce register usage.

TODO:

Improve simulation times.

Shared syndrome/chien stages between data streams?

Forney algorithm for error location?

BTZ algorithm for error location?

Encoder Example Output:

Note: This includes XOR'ing the ECC data with bch_blank_ecc. The output
matches the ECC generated by Linux software BCH.

DATA_BITS=4096, T=3, BITS=8, GP(2^13)

0x000: 24 ac a8 48 81 7a 91 03 90 6b 21 21 c6 6a 0f 8d
0x010: b0 de 9d 60 b1 99 fb 62 a6 21 43 4d 48 4a ec 91
0x020: 80 c7 cf 00 b0 b3 eb 60 6e 8f c4 dc bc b3 b1 78
0x030: b7 ea 2b 6e 31 ef b4 62 9f 97 bf 3e 63 24 ec c7
0x040: a3 21 ef 47 55 4b c8 ab a7 ef f5 4e ee 4e 4b dd
0x050: 48 6b 4c 91 f1 db 37 e2 4f 96 0c 9e 73 69 6e e7
0x060: 17 5e 00 2f a3 72 53 47 3a 92 1a 74 bd 14 71 7b
0x070: b4 1a d5 69 a6 64 e5 4d c6 46 f7 8d 50 e1 ce a0
0x080: 01 44 dc 03 04 84 26 08 55 a2 1e aa b9 33 33 73
0x090: 69 7c 04 d3 c8 1d c1 91 b0 1c 3d 61 ca 6b e5 95
0x0a0: d6 bb f9 ac ab 54 65 57 40 52 e8 81 75 7c 6a eb
0x0b0: b8 97 ff 70 f3 4e a3 e7 c4 92 21 88 50 a6 90 a0
0x0c0: 53 50 ce a7 3c 32 8c 79 4f bd 16 9e 51 86 74 a2
0x0d0: 82 cd 23 04 1b 2c d2 37 1e cf 04 3c 91 48 51 23
0x0e0: d7 b0 a3 ae 6d a6 40 da 63 9f d2 c6 75 c8 a8 ea
0x0f0: 3d 40 ae 7b 54 bb f0 a8 d0 59 eb a1 8e 7d e9 1d
0x100: a1 82 4f 42 f2 aa 4f e4 dc 06 33 b9 5c cc 60 b8
0x110: 7e 4c c4 fd a8 d2 55 50 8f d9 9b 1e 9b d2 39 36
0x120: 46 e0 c4 8c 32 df ac 64 f9 aa 85 f2 f1 85 9d e2
0x130: 1f 95 b6 3e ed 06 b3 db f9 6a 23 f3 3a 29 be 75
0x140: da 03 d3 b5 91 ec 7b 22 92 4c 75 25 0b 7c c4 17
0x150: eb 03 7f d7 8a 3f e3 15 69 f4 84 d2 30 fe b8 60
0x160: 43 73 34 87 13 b7 ea 26 ee 5a 41 dd bc b7 4f 78
0x170: 48 db 30 90 7e 00 a0 fd b6 81 a7 6c 9c e7 9b 38
0x180: f8 f1 4b f0 cb 6f d7 97 a1 f4 9b 42 1e 11 28 3d
0x190: da 9a 5b b4 92 d2 d1 24 fc 75 4f f9 54 f5 30 a8
0x1a0: 1a c6 a8 34 61 1c fc c3 71 30 58 e3 39 4f f4 73
0x1b0: 5f 99 ac be 64 3d be c9 ce e8 69 9c c5 2d d1 8b
0x1c0: f4 15 99 e9 cd 49 4b 9b fa 6a 1b f5 22 b0 94 44
0x1d0: ef bf 1f de d3 6c d9 a7 67 58 90 cf 5a db 14 b4
0x1e0: 94 83 f3 28 b7 47 41 6f 5b 36 ca b7 a6 6d f9 4d
0x1f0: ad 47 51 5b fb 29 7d f7 9e 98 8f 3c 22 ff 8c 44
  ECC: 24 a2 6b 4d 5b

-- Russ Dill <russ.dill@asu.edu>