andrewchambers/bupstash

kind bupstash and other tools benchmark report

deajan opened this issue · 7 comments

Hello,

I'm currently doing benchmarks for deduplication backup tools, including bupstash.
I decided to write a script that would:

  • Install the backup programs
  • Prepare the source server
  • Prepare local targets / remote targets
  • Run backup and restore benchmarks
  • Use publicly available data (Linux kernel sources as a git repo) and check out various git tags to simulate user changes in the dataset

The idea of the script is to produce reproducible results, the only changing factors being the machine specs and the network link between sources and targets.
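
The timing part of such a script could look roughly like the sketch below. It is a minimal illustration, not the actual backup-bench code: the `time_command` and `benchmark` helpers are hypothetical names, and the placeholder steps would be replaced by a `git checkout` of a tag followed by each tool's own backup and restore commands.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def benchmark(steps):
    """Time each (label, argv) pair in order and return {label: seconds}."""
    return {label: time_command(argv) for label, argv in steps}


if __name__ == "__main__":
    # Placeholder steps (assumption): a no-op command stands in for the real
    # backup and restore invocations of the tool under test.
    noop = [sys.executable, "-c", "pass"]
    for label, seconds in benchmark([("backup", noop), ("restore", noop)]).items():
        print(f"{label}: {seconds:.3f}s")
```

Running every tool through the same timing helper keeps the measurement method identical across tools, so only the hardware and the network link vary between runs.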

So far, I've run two sets of benchmarks, each done locally and remotely.
You can find the results at https://github.com/deajan/backup-bench

I'd love for you to review the recipe I used for bupstash, and perhaps guide me on which parameters to use to get maximum performance.
Any remarks / ideas / PRs are welcome.

I've also made a comparison table of some features of the backup solutions I'm benchmarking.
I'm still missing some information for some of the backup programs.
Would you mind having a look at the comparison table and filling in the question marks related to the features of bupstash?
Also, if bupstash has an interesting feature I didn't list, I'll be happy to extend the comparison.

PS: I'm trying to be as unbiased as possible when it comes to those benchmarks, please forgive me if I didn't treat your program with the parameters it deserves.

Also, I've created the same issue in every git repo of the backup tools I'm testing, so every author / team / community member can judge / improve the instructions for better benchmarking.

fwiw the next release of bupstash is going to add multithreading, which can dramatically improve put times - for me 3x in some cases

Great news. I'm eager to make a new round of benchmarks.
Do you have an ETA for the release?

I think within the next 2 weeks, I can ping here again when it's out.

Thanks. I'll stay tuned.

Btw, bupstash is already the winner when it comes to backup speeds in my benchmarks. I'll be thrilled to see by what factor put performance improves.

The only weak point I see is the restoration speed when using remote repositories, which is 40x slower than local restorations.
While restoration is going on, both the remote repository and the restoration target server's cpu and disks don't go over 10% usage.

I already tried to optimize my ssh connection (using ssh arguments -o Compression=no -c chacha20-poly1305@openssh.com -x -T, where chacha20-poly1305 is the fastest cipher on my repository server).
That optimization gains no more than 5 seconds on a 170-second restoration process, and either way bupstash restore operations are nowhere near maxing out the hardware. Any ideas?
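
For reference, here is a small sketch of how those ssh arguments combine into one command line, and what the measured gain amounts to. The host name and the remote command are hypothetical placeholders, not the setup used in the benchmark; only the ssh options and the 5 s / 170 s figures come from the post above.

```python
import shlex

# ssh options quoted in the post: no compression, a fixed cipher,
# no X11 forwarding (-x) and no pseudo-terminal allocation (-T).
SSH_OPTIONS = ["-o", "Compression=no", "-c", "chacha20-poly1305@openssh.com", "-x", "-T"]


def ssh_command(host, remote_command):
    """Return the ssh argv that runs remote_command on host with SSH_OPTIONS."""
    return ["ssh", *SSH_OPTIONS, host, *remote_command]


# Share of the 170 s restore saved by the tuning (at most 5 s, per the post).
saving = 5 / 170

if __name__ == "__main__":
    # Assumption: "backup.example.org" and the repository path are placeholders.
    argv = ssh_command("backup.example.org", ["bupstash", "serve", "/path/to/repo"])
    print(shlex.join(argv))
    print(f"ssh tuning saves at most {saving:.1%} of the restore time")
```

A gain of roughly 3% suggests the ssh transport settings are not the main bottleneck, which matches the low CPU and disk usage seen on both ends.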

In general restore has had less performance work put into it - will definitely be looking at it in the future.

@deajan looks like you were expecting a ping for https://github.com/andrewchambers/bupstash/releases/tag/v0.12.0 but didn't get one, I'd be curious to see how this release changes your benchmark.