borgbackup/backupdata

consider merging with others

anarcat opened this issue · 3 comments

restic has a similar project, called fakedatafs, with similar goals. fakedatafs works as a FUSE filesystem to provide a dataset to benchmark backup software.

obnam has its own tool to generate backup data called... genbackupdata. the design is similar to borg's, but like fakedatafs, it is also deterministic.

i was wondering if any thought has been given to reviewing or even merging with those projects instead of maintaining what are now at least 3 different projects in parallel?

if we want a standardized test suite for backup software, it seems to me we should aim at standardizing the corpus generation as well! :)

a similar issue was opened against the restic project in restic/fakedatafs#3

genbackupdata: py27, 4/5 links 404, uses pycrypto, not on github and other NIH.

fakedatafs: nice approach, but some open questions, see there (primarily how to make this realistic).

I agree, it would be nice to have one tool. Preferably something one can just install from the usual linux distributions (not having to deal with any python or go setup).

funny you should say "NIH" - genbackupdata is at least 6 years old, and therefore predates the creation of all other software here, including borg and restic themselves.

i've notified the author about the 404 links.

i am not sure how pycrypto and py27 are significant issues...

regarding "linux distributions", you will be happy to know that genbackupdata is also in Debian, at least since 2011. :)

that said, i'd be happy to deal with the Debian packaging hassle if that's too much trouble for the chosen tool - but it seems to me someone already went through a lot of that trouble for us, and we should, maybe, respect that...

NIH: not on pypi, not on github (where's the issue tracker?), using strange tools, ...

pycrypto seems dead. it seems to only use pycrypto for the arc4 cipher, to generate random data (could just read from urandom for that).
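to illustrate the urandom point, here is a minimal stdlib-only sketch. it is not genbackupdata's code. the function names `random_bytes` and `seeded_bytes` are made up for this example, and using SHAKE-256 as the seeded stream is my own substitution for arc4, assuming the goal is just reproducible filler bytes rather than anything cipher-specific.

```python
import hashlib
import os


def random_bytes(n):
    """Non-deterministic: n bytes straight from the OS random source."""
    return os.urandom(n)


def seeded_bytes(seed, n):
    """Deterministic: n bytes derived from `seed` using SHAKE-256.

    Hypothetical stand-in for an arc4 keystream; same seed, same output.
    """
    return hashlib.shake_256(seed).digest(n)
```

`random_bytes` gives different data on every run, while `seeded_bytes(b"some seed", n)` gives the same bytes every time, which is what you would want if the generated corpus has to be reproducible.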

py27-only also seems not very future-proof (not an immediate issue though).

also: how "realistic" is this?

the reason why I wrote backupdata was to have realistic input data, not just zeros and pure random.
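a rough sketch of why zeros and pure random are poor test input: they sit at the two extremes of compressibility, while real files land somewhere in between. this is only an illustration, not how backupdata generates anything. the `text_like` helper and its word list are invented for the example, and zlib is just a convenient stand-in for whatever compression a backup tool uses.

```python
import os
import random
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(data)) / len(data)


def text_like(n, seed=0):
    """Hypothetical 'in between' data: n bytes of words picked at random."""
    rng = random.Random(seed)
    words = [b"backup", b"archive", b"chunk", b"repository",
             b"snapshot", b"index", b"file", b"data"]
    out = bytearray()
    while len(out) < n:
        out += rng.choice(words) + b" "
    return bytes(out[:n])


def sample_ratios(n=65536):
    """Compression ratios for zeros, random bytes, and text-like bytes."""
    return {
        "zeros": compression_ratio(bytes(n)),
        "random": compression_ratio(os.urandom(n)),
        "text-like": compression_ratio(text_like(n)),
    }
```

calling `sample_ratios()` should show zeros compressing to almost nothing, random data not compressing at all, and the text-like data somewhere in the middle. a corpus made only of the first two says little about how a tool behaves on real files.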