Suggestion: Use Reed-Solomon encoding to reduce data size
Hiddendoom45 opened this issue · 1 comments
The current implementation of the program appears to copy the encrypted file into each horcrux, which becomes pretty storage intensive with a larger file and more horcruxes.
Here n is the number of horcruxes, t is the number needed to resurrect, and s is the size of the file, with t ≤ n.
Reed-Solomon encoding can split the original file into n pieces, each of size s/t. Any combination of t pieces can be used to recreate the original file.
So the total space used by the horcruxes would be n*(s/t) instead of n*s.
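To make the saving concrete, here is a small sketch of the arithmetic above. The function name and the 1 MB file size are only illustrative, and the piece size is rounded up on the assumption that a file whose size is not a multiple of t gets padded.

```python
import math

def total_storage(s, n, t):
    """Return (bytes with full copies, bytes with Reed-Solomon pieces)."""
    full_copies = n * s                 # every horcrux holds the whole file
    rs_pieces = n * math.ceil(s / t)    # every piece holds about s/t bytes
    return full_copies, rs_pieces

# Hypothetical example: a 1,000,000-byte file, n = 5 horcruxes, t = 3 needed.
copies, pieces = total_storage(1_000_000, 5, 3)
print(copies, pieces)
```

With these numbers the pieces take roughly a third of the space of the full copies, which is the factor of t the formula predicts.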
This should also ensure the integrity of the data as the reconstruction of the file should fail if a piece is modified.
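As a rough illustration of the "any t of n pieces" property, below is a toy Reed-Solomon-style erasure code in pure Python. It is only a sketch under several assumptions: it works over the prime field GF(257) rather than the GF(2^8) that real implementations typically use, so a symbol can take the value 256 and does not fit in one byte; the names split and join are made up here; and it does no integrity checking at all, so a modified piece would silently produce wrong output.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def lagrange_eval(points, x0):
    """Evaluate at x0 the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def split(data, n, t):
    """Split bytes into n pieces (keyed by x = 1..n), each about len(data)/t symbols."""
    padded = data + bytes(-len(data) % t)  # zero-pad to a multiple of t
    pieces = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), t):
        block = padded[start:start + t]
        # The t data bytes are the polynomial's values at x = 1..t
        points = [(k + 1, block[k]) for k in range(t)]
        for x in pieces:
            pieces[x].append(lagrange_eval(points, x))
    return pieces

def join(pieces, t, length):
    """Rebuild the original bytes from any t pieces; `length` removes the padding."""
    chosen = sorted(pieces.items())[:t]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(x, symbols[pos]) for x, symbols in chosen]
        # Re-evaluate the polynomial at x = 1..t to recover the data bytes
        out.extend(lagrange_eval(points, k) for k in range(1, t + 1))
    return bytes(out[:length])

data = b"my secret diary"
pieces = split(data, n=5, t=3)
survivors = {x: pieces[x] for x in (2, 4, 5)}  # pretend pieces 1 and 3 are lost
print(join(survivors, t=3, length=len(data)))
```

Each piece here holds 5 symbols for a 15-byte input, matching the s/t piece size described above.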
A possible solution is to use par2: https://github.com/Parchive/par2cmdline
With the current master branch version, you may create redundancy > 100%.
Example: par2 create -r167 -u -n5 diary.txt.par2 diary.txt, or any -r larger than 100/t*n but smaller than 100/t*(n+1).
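The rule for picking -r can be worked out with a few lines of Python. This sketch only encodes the bounds as stated in this comment (strictly above 100/t*n, strictly below 100/t*(n+1)); it does not check how par2 itself rounds, and the function name is made up for illustration.

```python
def redundancy_range(n, t):
    """Smallest and largest integer -r strictly between 100*n/t and 100*(n+1)/t."""
    lowest = (100 * n) // t + 1                # first integer above 100*n/t
    highest = -(-(100 * (n + 1)) // t) - 1     # last integer below 100*(n+1)/t
    return lowest, highest

# n = 5 files, any t = 3 of them needed, as in the example command
print(redundancy_range(5, 3))  # (167, 199)
```

For n = 5 and t = 3 the lower bound is 166.67 and the upper bound is 200, so -r167 is the smallest value that qualifies.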
It works as a rough replacement for horcrux -n 5 -t 3 split diary.txt, but you get the smaller files you want.
Then simply delete the extra diary.txt.par2 file, keeping only the diary.txt.volXXXXX.par2 files.
As a result, you may delete any 2 of those files, and the remaining 3 can still restore your diary.txt safely.
However, please NOTE THAT REED-SOLOMON IS NOT SECURE: YOU NEED TO ENCRYPT YOUR DIARY WITH A PASSWORD BEFORE SPLITTING.