Parchive/par2cmdline

[Question] Understanding The Par2 Spec and par2cmdline

Closed this issue · 2 comments

I'm writing my own implementation of the Par2 spec. I've read the spec and I'm reading par2cmdline's source code, and I have a couple of questions:

How is a block related to a slice. The spec says slices are 16-bits and to make one recovery slice I need two input slices. But looking at the source code, blocks can clearly be bigger than 16-bits. Is it something like an input block has x number of slices and a recovery block has y number of slices? It is just not clear to me from spec, as written, what the heck it wants.

This brings me to my second question, what is the purpose of the exponent in the Recovery Slice packet, and how is it used? I have read the paper the spec links to which I barely understood. I have also read Reed-Solomon codes for Coders which I found much more understandable. But I'm still mystified by the exponent. Is it a value for a lookup table? That's my only guess.

The term 'block' and 'slice' are generally used interchangeably. Blocks/slices can be any size up to 2^64, as long as it's a multiple of 4. Slice indices (sometimes referred to as "constants" in the specification) must be 16-bit, not the slices themselves.

Within a slice, you have some number of 16-bit integers, which we'll refer to as "words". Computation is performed by transforming 1 or more words, from input slices, into 1 or more words, which end up in recovery slices.

Note that your assumption that you need two input slices to produce a recovery slice is incorrect. You need at least 1 input slice to compute 1 or more recovery slices (to put it another way, you could compute 3 recovery slices from 1 input slice, or 2 recovery slices from 5 input slices etc).

All input/recovery slices/blocks must be the same size. If the input slice is too small (due to the file ending), it needs to be padded with null bytes.

PAR2 uses a Vandermonde matrix when computing recovery data from input data. The computation, for each word w in recovery slice r, is something like:

recovery[r][w] = input[0][w] * 2^r + input[1][w] * 4^r + input[2][w] * 16^r + input[3][w] * 128^r ...

I'm not knowledgeable enough on the mathematical principles behind the Vandermonde matrix used, but the exponent is basically just a part of the calculation. It doesn't have anything particularly to do with a lookup table, however, a common performance-oriented implementation of exponentiation in GF(2^n ) is done by computing logarithm/exponentiation via a lookup table.

Thank you @animetosho.