Parchive/par2cmdline

A better error correcting code

vapniks opened this issue · 5 comments

Just throwing this out there in case anyone's interested.
New mathematical research finds better error correcting codes: https://www.quantamagazine.org/researchers-defeat-randomness-to-create-ideal-code-20211124/

Yeah, I saw that. I couldn't make heads or tails of it.

If you can explain it and its value to our users, I'm all ears.

@vapniks If you can explain it and its value to our users, I'm all ears.

I don't know the details of the algorithm, but on page 2 of that paper it says it can produce, for any given code rate (proportion of data bits to total bits), good error correcting codes that are almost optimal wrt the Gilbert-Varshamov bound: https://en.wikipedia.org/wiki/Gilbert%E2%80%93Varshamov_bound
So I guess this probably means using this algorithm you could squeeze more data onto the disk than a Reed-Solomon code with the same error-correcting ability.
Sorry I can't offer any more help, but perhaps the authors of that paper might be able to assist if anyone is thinking of trying to implement it.

Reed-Solomon is optimal when it comes to error-correcting ability for a given block size. With K recovery blocks, you can always recover K or fewer errors.

And, in fact, for large K, a random code gets very very close to that bound. Turbocodes and some others are based around the idea that a random code gets very very close to that bound.

The Quanta article talks and the paper say the big difference with this code is that it is a locally testable code. I'm not sure exactly what that means. I think it means that you can look at a handful of bytes in a block and be pretty sure if it is good or bad. Or, it may mean that looking at a few bytes will tell you if the whole input set is recoverable. I'm not sure.

Thanks for letting us know about it. But I don't think we'll be including it in Parchive 3.0.