garethgeorge/backrest

Support running backups in parallel

Opened this issue · 5 comments

Is your feature request related to a problem? Please describe.
Backrest doesn't seem to support running multiple backups simultaneously. Since restic itself supports running backups without exclusively locking the repo, I see no reason why backrest couldn't.

Describe the solution you'd like
A config option that limits parallel backups either globally or on a per-repo basis.

Hmm, thinking through this I can see that the idea of parallel backup options is immediately appealing as it sounds like it should be a speedup -- but I'm interested to think through what the real performance gains will be (and what costs / risks if any may come with concurrent backups):

Things that jump out to me are

  • Parallel backups aren't actually a speedup if you are network IO limited e.g. by your upload speed.
  • Parallel backups aren't actually a speedup if you are disk IO limited e.g. all of the repos you are scanning are on a single HDD.
  • Parallel backups aren't always safe with a concurrent prune operation (though restic locks will typically block this case). If parallelism is added, it should allow only one operation at a time on a given target repository, with parallelism only across repos.
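
To make the last point concrete, here is a minimal Go sketch of "one operation at a time per repo, parallel across repos" combined with a global cap. It is not taken from Backrest's codebase: `RepoScheduler`, `NewRepoScheduler`, `Run`, and `simulate` are hypothetical names, and the sleep stands in for a real restic invocation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RepoScheduler is a hypothetical helper (not Backrest's actual API).
// It runs at most maxParallel tasks at once overall, and at most one
// task at a time for any single repo ID.
type RepoScheduler struct {
	slots chan struct{} // counting semaphore for the global limit

	mu    sync.Mutex
	repos map[string]*sync.Mutex // one lock per repo ID, created lazily
}

func NewRepoScheduler(maxParallel int) *RepoScheduler {
	if maxParallel < 1 {
		maxParallel = 1
	}
	return &RepoScheduler{
		slots: make(chan struct{}, maxParallel),
		repos: make(map[string]*sync.Mutex),
	}
}

// repoLock returns the lock for repoID, creating it on first use.
func (s *RepoScheduler) repoLock(repoID string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	l, ok := s.repos[repoID]
	if !ok {
		l = &sync.Mutex{}
		s.repos[repoID] = l
	}
	return l
}

// Run blocks until repoID is idle and a global slot is free, then runs task.
// The repo lock is taken first so that a task queued behind a busy repo
// does not occupy a global slot while it waits.
func (s *RepoScheduler) Run(repoID string, task func() error) error {
	l := s.repoLock(repoID)
	l.Lock()
	defer l.Unlock()

	s.slots <- struct{}{}
	defer func() { <-s.slots }()

	return task()
}

// simulate starts one goroutine per entry in jobs (each entry is a repo ID)
// and reports the peak number of tasks seen running at once, per repo and
// overall.
func simulate(maxParallel int, jobs []string) (peakPerRepo, peakTotal int) {
	s := NewRepoScheduler(maxParallel)
	var (
		mu      sync.Mutex
		total   int
		perRepo = map[string]int{}
		wg      sync.WaitGroup
	)
	for _, repo := range jobs {
		wg.Add(1)
		go func(repo string) {
			defer wg.Done()
			_ = s.Run(repo, func() error {
				mu.Lock()
				total++
				perRepo[repo]++
				if total > peakTotal {
					peakTotal = total
				}
				if perRepo[repo] > peakPerRepo {
					peakPerRepo = perRepo[repo]
				}
				mu.Unlock()

				time.Sleep(2 * time.Millisecond) // stand-in for the actual backup

				mu.Lock()
				total--
				perRepo[repo]--
				mu.Unlock()
				return nil
			})
		}(repo)
	}
	wg.Wait()
	return peakPerRepo, peakTotal
}

func main() {
	perRepo, total := simulate(2, []string{"repo1", "repo2", "repo1", "repo2", "repo1"})
	fmt.Println("peak per repo:", perRepo, "peak overall:", total)
}
```

With this shape, a long backup to one repo would not hold up short backups to another, while a prune and a backup on the same repo would still be serialized. Setting the global limit to 1 reproduces fully sequential behavior.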

I see parallelism as introducing some risks, and backrest does aim to be opinionated in places where it can avoid "footguns", e.g. places where users can accidentally break things or simply may not stand to gain.

I'm curious how much value it adds to the way you use backrest / how much speedup you're expecting? I need some convincing that there's a strong value add from parallelism & that it'll be a big UX improvement to justify the complexity and risk.

You make some valid points, but I still believe that in some cases it would be beneficial to run backups in parallel.

For example:
I am running backrest on a Server with a 10G Uplink. I have it connected to 2 repos, each with a 1G Link.

Every 15 minutes I run a backup of some smaller files to repo 1.
Once a day I back up my larger files to repo 2, this usually takes up to an hour.

While the daily backup to repo 2 is running, backrest doesn't start any of the 4 scheduled backups to repo 1.

An option to run backups to different repos independently of each other would be great to have here.

Parallel backups generally will be a speedup. I am running a backup and the bottleneck is the bandwidth+latency of the S3 service, not my upload speed.

Additionally, it would be nice to be able to pause a backup task.

OK, I'm open to this, but I'm going to consider it low priority for now -- my near-term focus is continuing to improve reliability, unblocking some workflows by improving hook handling, and adding multi-host management as a feature. I'll defer parallel execution for now as it makes some of that more challenging. Looking forward though, architecturally Backrest does have the right concurrency controls in place to make this possible both on the backend and on the networking side.

> Additionally, it would be nice to be able to pause a backup task.

Unfortunately restic doesn't support pausing operations, BUT it does do content-based hashing for deduplication. If you restart a backup you won't end up using more storage in your repo (and in many cases I suspect you won't re-upload anything, but that might be a question for the restic forum :) ).
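
As a rough illustration of why a restarted backup doesn't store data twice, here is a toy content-addressed store in Go. It is a sketch under simplifying assumptions, not restic's implementation: `BlobStore`, `Put`, and `backup` are hypothetical names, chunks are fixed byte slices keyed by their SHA-256 hash, and real-world details such as content-defined chunking, pack files, encryption, and the index are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// BlobStore is a toy content-addressed store (hypothetical, for illustration).
// Each chunk is stored under the hex SHA-256 of its contents.
type BlobStore struct {
	blobs    map[string][]byte
	uploaded int // total bytes actually written to the store
}

func NewBlobStore() *BlobStore {
	return &BlobStore{blobs: map[string][]byte{}}
}

// Put stores chunk unless a blob with the same hash already exists.
// It returns the blob ID and whether anything new was written.
func (s *BlobStore) Put(chunk []byte) (id string, isNew bool) {
	sum := sha256.Sum256(chunk)
	id = hex.EncodeToString(sum[:])
	if _, ok := s.blobs[id]; ok {
		return id, false // identical content is already stored
	}
	s.blobs[id] = append([]byte(nil), chunk...) // keep a private copy
	s.uploaded += len(chunk)
	return id, true
}

// backup stores every chunk and returns how many bytes were newly written.
func backup(s *BlobStore, chunks [][]byte) (newBytes int) {
	for _, c := range chunks {
		if _, isNew := s.Put(c); isNew {
			newBytes += len(c)
		}
	}
	return newBytes
}

func main() {
	store := NewBlobStore()
	chunks := [][]byte{[]byte("alpha"), []byte("beta"), []byte("gamma")}

	// Simulate an interrupted run that only got through the first two chunks.
	first := backup(store, chunks[:2])
	// Restart from scratch: only the chunk that was never stored is written.
	second := backup(store, chunks)

	fmt.Println(first, second, store.uploaded) // prints "9 5 14"
}
```

In this toy model the restarted run writes only the 5 bytes it had not reached before, so the total stored equals the size of the data. Whether restic also avoids re-uploading in every interrupted-backup case is the open question raised above.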

I'm with you on that. Thanks for considering it, and thanks for this great piece of software :)