A better error correcting code
vapniks opened this issue · 5 comments
Just throwing this out there in case anyone's interested.
New mathematical research finds better error correcting codes: https://www.quantamagazine.org/researchers-defeat-randomness-to-create-ideal-code-20211124/
Yeah, I saw that. I couldn't make heads or tails of it.
If you can explain it and its value to our users, I'm all ears.
Here's the preprint: https://arxiv.org/pdf/2111.04808.pdf
I don't know the details of the algorithm, but page 2 of that paper says it can produce, for any given code rate (the proportion of data bits to total bits), good error-correcting codes that are nearly optimal with respect to the Gilbert-Varshamov bound: https://en.wikipedia.org/wiki/Gilbert%E2%80%93Varshamov_bound
So I guess this probably means that, using this algorithm, you could squeeze more data onto the disk than with a Reed-Solomon code of the same error-correcting ability.
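To put rough numbers on that, here's a quick sketch of the bound itself (plain Python, binary alphabet only for simplicity; the paper works in more generality, and this is not their algorithm, just the trade-off it is measured against): the Gilbert-Varshamov bound says codes of relative minimum distance delta exist with rate at least 1 - H(delta).

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1 - p)*log2(1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def gv_rate(delta: float) -> float:
    """Gilbert-Varshamov bound (binary case): for relative minimum distance
    delta < 1/2, codes exist with rate at least 1 - H(delta)."""
    return max(0.0, 1.0 - binary_entropy(delta))

# Correcting errors in ~5% of positions needs relative distance just over 10%,
# and GV says a rate of roughly 1 - H(0.1) ~ 0.53 is then achievable.
for delta in (0.05, 0.10, 0.25):
    print(f"delta = {delta:.2f}  ->  achievable rate >= {gv_rate(delta):.3f}")
```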
Sorry I can't offer any more help, but perhaps the authors of that paper might be able to assist if anyone is thinking of trying to implement it.
Reed-Solomon is optimal when it comes to error-correcting ability for a given number of blocks: with K recovery blocks, you can always repair any K or fewer missing or damaged blocks.
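To illustrate why any K losses are repairable: Reed-Solomon treats the data as a polynomial and the recovery blocks as extra evaluations of it, so any k of the coded symbols pin the polynomial down. Here's a toy sketch over a small prime field with made-up values (real Parchive-style codecs use GF(2^16) and operate on whole blocks, not single numbers):

```python
# Toy Reed-Solomon erasure recovery over GF(257), a small prime field chosen
# only so the arithmetic is easy to follow.
P = 257  # field size

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, m):
    """Systematic encoding: data symbols sit at x = 0..k-1, and m recovery
    symbols are the same polynomial evaluated at x = k..k+m-1."""
    k = len(data)
    points = list(enumerate(data))
    return data + [lagrange_eval(points, x) for x in range(k, k + m)]

def recover(received, k):
    """`received` maps surviving positions to symbols; any k of them are
    enough to rebuild the original k data symbols."""
    assert len(received) >= k
    points = list(received.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]

data = [10, 20, 30, 40]            # k = 4 data symbols (made-up values)
coded = encode(data, m=2)          # 2 recovery symbols -> 6 symbols total
survivors = {0: coded[0], 2: coded[2], 4: coded[4], 5: coded[5]}  # lost 2
print(recover(survivors, k=4))     # -> [10, 20, 30, 40]
```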
And, in fact, for large K, a random code gets very close to that bound; turbo codes and some other schemes are built around exactly that idea.
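If anyone wants to see the "random codes are nearly optimal" claim in miniature, here's a rough brute-force sketch (only feasible for tiny parameters): draw a random binary generator matrix, find its true minimum distance, and compare the observed rate/distance pair against the 1 - H(delta) formula in the earlier snippet.

```python
import itertools
import random

def random_linear_code_min_distance(k, n, seed=0):
    """Draw a random k x n binary generator matrix and brute-force the
    minimum Hamming weight over all nonzero codewords (for a linear code,
    that equals the minimum distance). Only feasible for small k."""
    rng = random.Random(seed)
    G = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]
    best = n
    for msg in itertools.product((0, 1), repeat=k):
        if not any(msg):
            continue  # skip the all-zero message
        codeword = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
        best = min(best, sum(codeword))
    return best

k, n = 10, 30
d = random_linear_code_min_distance(k, n)
# Plugging the observed relative distance into 1 - H(delta) from the earlier
# snippet usually lands in the same ballpark as the code's actual rate k/n.
print(f"rate = {k / n:.2f}, relative minimum distance = {d / n:.2f}")
```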
The Quanta article and the paper both say the big difference with this code is that it is a locally testable code. I'm not sure exactly what that means: it may mean you can look at a handful of bytes in a block and be fairly sure whether it is good or bad, or it may mean that looking at a few bytes tells you whether the whole input set is recoverable.
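For what it's worth, the standard definition in the literature is closer to the first reading: a tester reads just a few positions of a received word and, with good probability, rejects words that are far from every codeword. The classic toy example (not the paper's construction, which is far more involved) is the Hadamard code checked with the Blum-Luby-Rubinfeld linearity test; a rough sketch:

```python
import itertools
import random

def hadamard_encode(message_bits):
    """Hadamard code: list <m, x> mod 2 for every x in {0,1}^k, so a k-bit
    message becomes a 2^k-bit codeword (a classic locally testable code)."""
    return [sum(m & x for m, x in zip(message_bits, xs)) % 2
            for xs in itertools.product((0, 1), repeat=len(message_bits))]

def blr_test(word, k, trials=200, seed=0):
    """Blum-Luby-Rubinfeld linearity test: each trial reads only 3 positions
    and checks word[x] XOR word[y] == word[x XOR y]. Valid codewords always
    pass; words far from every codeword fail a constant fraction of trials."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        x, y = rng.randrange(2 ** k), rng.randrange(2 ** k)
        if (word[x] ^ word[y]) != word[x ^ y]:
            failures += 1
    return failures

k = 4
good = hadamard_encode([1, 0, 1, 1])                 # a valid codeword
rng = random.Random(1)
junk = [rng.randint(0, 1) for _ in range(2 ** k)]    # arbitrary random bits
print("codeword failures:   ", blr_test(good, k))    # always 0
print("random-word failures:", blr_test(junk, k))    # typically fails many trials
```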
Thanks for letting us know about it. But I don't think we'll be including it in Parchive 3.0.