jesseduffield/horcrux

Suggestion: Use Reed-Solomon encoding to reduce data size

Hiddendoom45 opened this issue · 1 comments

The current implementation of the program appears to copy the encrypted file into each horcrux, which becomes pretty storage intensive with a larger file and more horcruxes.

Let n be the number of horcruxes, t the number needed to resurrect the file, and s the size of the file, with t ≤ n.
Reed-Solomon encoding can split the original file into n pieces, each of size s/t. Any combination of t pieces can be used to recreate the original file.
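
To make the "any t of n pieces" property concrete, here is a minimal sketch of a Reed-Solomon-style erasure code in Python. It is not how horcrux or par2 work internally: the function names (split, join) are made up for this illustration, and it uses the prime field GF(257) for simplicity, so a symbol can take the value 256 and does not quite fit in one byte. Real implementations typically use a binary field such as GF(2^8) or GF(2^16) instead.

```python
from itertools import combinations

P = 257  # smallest prime above the byte range; an assumption made for simplicity


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through the (xi, yi) points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def split(data: bytes, n: int, t: int) -> dict[int, list[int]]:
    """Return n pieces keyed 1..n; each holds one symbol per t-byte group."""
    if not 1 <= t <= n < P:
        raise ValueError("need 1 <= t <= n < 257")
    padded = data + bytes(-len(data) % t)  # zero-pad to a multiple of t
    pieces = {k: [] for k in range(1, n + 1)}
    for g in range(0, len(padded), t):
        # The t data bytes are the polynomial's values at x = 1..t.
        points = [(x + 1, padded[g + x]) for x in range(t)]
        for k in pieces:
            pieces[k].append(_lagrange_eval(points, k))
    return pieces


def join(pieces: dict[int, list[int]], t: int, length: int) -> bytes:
    """Rebuild the original bytes from any t pieces."""
    chosen = sorted(pieces.items())[:t]
    if len(chosen) < t:
        raise ValueError("not enough pieces")
    out = bytearray()
    for idx in range(len(chosen[0][1])):
        points = [(k, symbols[idx]) for k, symbols in chosen]
        # Interpolate back to x = 1..t, where the data bytes live.
        out.extend(_lagrange_eval(points, x) for x in range(1, t + 1))
    return bytes(out[:length])


data = b"dear diary"
pieces = split(data, n=5, t=3)
# Every choice of 3 pieces out of 5 restores the file.
for keys in combinations(pieces, 3):
    subset = {k: pieces[k] for k in keys}
    assert join(subset, t=3, length=len(data)) == data
```

Each piece here holds ceil(s/t) symbols, which is where the s/t piece size comes from. Note that the original length has to be stored somewhere so the padding can be stripped.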

So the total space used by the horcruxes would be n*(s/t) instead of n*s.
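
As a quick check of the arithmetic, a small sketch (the helper name total_storage is made up here, and it ignores headers and any other per-file overhead):

```python
from math import ceil


def total_storage(s: int, n: int, t: int) -> tuple[int, int]:
    """Return (total with full copies, total with s/t-sized pieces) in bytes."""
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    full_copy = n * s          # every horcrux holds the whole file
    coded = n * ceil(s / t)    # every horcrux holds a piece of size s/t
    return full_copy, coded


# A 3 MB file split into 5 horcruxes, any 3 of which can restore it.
full, coded = total_storage(s=3_000_000, n=5, t=3)
print(full, coded)  # 15000000 5000000
```

So for n = 5 and t = 3 the encoded horcruxes take one third of the space that full copies would.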

This should also ensure the integrity of the data as the reconstruction of the file should fail if a piece is modified.

A possible solution is to use par2: https://github.com/Parchive/par2cmdline

With the current master branch version of par2cmdline, you can create more than 100% redundancy.

Example: par2 create -r167 -u -n5 diary.txt.par2 diary.txt. Any -r value larger than 100/t*n but smaller than 100/t*(n+1) works; for n = 5 and t = 3 that means anything above 166.67 and below 200.
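
For other values of n and t, the -r value can be worked out with a small helper. This is only a sketch: the function name pick_redundancy is made up, and it simply picks the smallest integer inside the range given above.

```python
def pick_redundancy(n: int, t: int) -> int:
    """Smallest integer r with 100*n/t < r < 100*(n+1)/t."""
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    r = (100 * n) // t + 1  # first integer strictly above 100*n/t
    if r * t >= 100 * (n + 1):
        raise ValueError("no integer percentage fits in the range")
    return r


n, t = 5, 3
r = pick_redundancy(n, t)
# Same command shape as the example above, with the computed values filled in.
print(f"par2 create -r{r} -u -n{n} diary.txt.par2 diary.txt")
```

For n = 5 and t = 3 this gives 167, matching the command above.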

It works as a rough replacement for horcrux -n 5 -t 3 split diary.txt, but gives you the smaller files you want.

Then simply delete the extra diary.txt.par2 file, keeping only diary.txt.volXXXXX.par2 files.

As a result, you may delete any 2 of the files, and the remaining 3 can still restore your diary.txt safely.

However, please DO NOTE THAT REED-SOLOMON IS NOT SECURE AND YOU NEED TO ENCRYPT YOUR DIARY WITH A PASSWORD BEFORE SPLITTING.