Backup restore is very slow
jesulo opened this issue · 4 comments
I'm doing a backup restore of a CT that weighs 500 GB but only has 80 GB occupied. When the backup is on the local disk it takes 3 and a half hours, but when it is on PBS it takes 7 hours. Why does it take so long in both cases? Is there a way to reduce the time? They are on ZFS with LINSTOR. Regards
I am not sure what you are actually looking for.
What is "ct", what is "pbs"?
What do you mean by "When the backup is on the local disk it takes 3 and a half hours"? When you already have the backup locally available, restoring the backup (or rather the snapshot) into a new LINSTOR resource should only take a few seconds, not 3.5 hours.
What is the download-speed of the satellite that downloads the backup? What would be the time you would expect for 80GB to be downloaded (and why)?
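One way to answer the download-speed question is to measure raw throughput between the backup server and the satellite, independent of the restore itself. A hedged sketch (assumes `iperf3` is installed on both hosts; the hostname is a placeholder):

```shell
# On the backup server (PBS host): start an iperf3 server
iperf3 -s

# On the satellite that performs the restore: measure throughput
# to that host (replace pbs.example.com with the real address)
iperf3 -c pbs.example.com -t 30

# Rough expectation for 80 GB over a 1 Gbit/s link:
#   80 GB * 8 bit/B / 1 Gbit/s ≈ 640 s, i.e. about 11 minutes at line rate.
# 3.5 hours for 80 GB works out to roughly 6.5 MB/s.
```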
I mean an LXC container or a Proxmox VM.
PBS is the Proxmox Backup Server.
Yes, restoring the container backup takes 3 and a half hours to a local disk, and it takes even longer when I do it from PBS. What settings should I change so that it doesn't take so long?
How do I see the download speed? The restore log says the restore speed was 5 Mb. Maybe it's because I used ZFS? Or because of HA replication?
If you are restoring from proxmox backup server, I assume the data is getting copied and possibly sent to the other peers via DRBD.
This is more of a performance-tuning question than an actual bug, so I would suggest that you do some testing, i.e. try to restore into a resource that has only 1 replica. The idea is that, regardless of whether you have DRBD configured or not, if there are no other diskful DRBD peers, the restore operation will not depend on your network speed. If this test is much faster than what you have right now, you will want to investigate network optimizations and DRBD tuning (for example https://kb.linbit.com/tuning-drbds-resync-controller, but feel free to research further).
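As a sketch of that single-replica test (assuming the stock `linstor` client; the names `rg-test`, `restore-test`, and the storage-pool name are placeholders):

```shell
# Create a resource group that places only ONE diskful replica,
# so no DRBD peer traffic is involved in the restore test
linstor resource-group create rg-test --place-count 1 --storage-pool mypool
linstor volume-group create rg-test

# Spawn a test resource sized like the container volume,
# then point the restore at it
linstor resource-group spawn-resources rg-test restore-test 500G
```

If the restore into `restore-test` is fast, the bottleneck is the replication path (network/DRBD), not the local storage.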
If the results are somewhat similar to what you have right now, the network is not the problem. I doubt that DRBD would be an issue with local writes, so my next guess is to check your storage speed by restoring into a storage-only resource. If that is also slow, where to continue the investigation depends on your setup. If you are using VMs, check how the disk I/O is mapped from the virtual machine to the physical hardware and see if you can optimize things there.
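To check the raw write speed of the backing storage independently of DRBD, a quick benchmark with `fio` can help (a hedged sketch; the target path is a placeholder and `fio` must be installed):

```shell
# Sequential-write benchmark on a scratch file on the pool.
# WARNING: never point --filename at a device that holds data.
fio --name=seqwrite --filename=/mypool/scratch/testfile \
    --rw=write --bs=1M --size=4G --direct=1 \
    --ioengine=libaio --iodepth=16 --numjobs=1

# Compare the reported bandwidth with the speed seen during the restore;
# if fio is also slow, the storage itself is the bottleneck.
```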
From what you have said until now, this does not look like an issue with LINSTOR at all, since LINSTOR is not even in the IO path in these use-cases. My guess is that the bottleneck is either your network's or your storage's speed (check both, the reading as well as the writing storage).
I modified rs-discard-granularity to 1M, but the slowness continues. I've noticed that the I/O is very high; when restoring, it even impacts other virtual machines on the same disk. Could you tell me what configurations I could apply so that replication with the other node doesn't affect I/O so much? Can replication be configured as asynchronous, or can its priority be lowered? What properties do you recommend I modify? Thanks.
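The two knobs asked about here roughly map to DRBD options that LINSTOR can set. A hedged sketch, not a recommendation (`my-resource` is a placeholder; exact option availability depends on the LINSTOR/DRBD versions, and the rate values are examples from the resync-controller article linked above, to be tuned per setup):

```shell
# Asynchronous replication: switch the resource to DRBD protocol A
# (trades crash consistency on the peer for lower write latency)
linstor resource-definition drbd-options my-resource --protocol A

# Lower the priority of background resync so it yields to
# application I/O (DRBD peer-device / resync-controller options)
linstor resource-definition drbd-peer-options my-resource \
    --c-min-rate 250k --c-max-rate 100M
```

Note that the resync controller only throttles resync traffic; during a restore, the writes themselves are replication I/O, which is why protocol choice matters more there.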