Restore performance
abbbi opened this issue · 1 comment
hi,
(maybe enable discussions and convert this to a discussion)
See past discussion in this commit:
I did some testing with Amazon S3 today, just to get an impression of how things behave during restore. My setup is as follows:
- Synchronous GBit uplink (Deutsche Telekom), speed test for download:
time curl http://speedtest.belwue.net/10G -o /dev/null
real 1m31.900s
- Stock AWS bucket, no special settings, Europe (Frankfurt) eu-central-1 (client in Munich)
- 10 GB of mixed data.
- Dynamic index backup via proxmox-backup-client
Initial backup performance is OK.
root.pxar: had to backup 9.007 GiB of 11.396 GiB (compressed 6.39 GiB) in 73.12 s (average 126.139 MiB/s)
Restore performance:
time sudo -E proxmox-backup-client restore "host/cefix/2024-08-21T06:03:30Z" root.pxar /home/abi/source/ --repository xx@pbs@127.0.0.1:pmxtest
real 6m12.991s
26.81 MB/s
What is actually a lot faster is using the pull mechanism to pull a remote S3 store into a local PBS using PR #48:
Syncing datastore 'pmxtest', namespace 'Root' into datastore 'test2', namespace 'Root'
found 1 groups to sync (out of 1 total)
sync snapshot host/cefix/2024-08-21T06:02:04Z
sync archive root.pxar.didx
downloaded 6.39 GiB (100.94 MiB/s)
sync archive catalog.pcat1.didx
downloaded 99.3 KiB (1.239 MiB/s)
[..]
TASK OK
real 1m3.299s
The current state is:
- Both the PVE restore and the proxmox-backup-client restore seem to request chunks largely sequentially (see: https://bugzilla.proxmox.com/show_bug.cgi?id=3163)
- The pull mechanism is implemented asynchronously, which is why it is a lot faster (a minimal sketch of the difference follows below)
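Just to illustrate the difference between the two code paths, here is a minimal Go sketch; `fetchChunk` and the digest list are placeholders, not the proxy's actual API, and errgroup is just one way to bound the concurrency:

```go
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// fetchChunk stands in for whatever the proxy uses to GET a single chunk
// object from S3 by its digest; it is a placeholder, not the real API.
func fetchChunk(ctx context.Context, digest string) ([]byte, error) {
	// ... S3 GetObject on the chunk key derived from the digest ...
	return nil, nil
}

// fetchSequential mirrors what the restore path effectively does today:
// one request at a time, so every S3 round trip is paid in full.
func fetchSequential(ctx context.Context, digests []string) error {
	for _, d := range digests {
		if _, err := fetchChunk(ctx, d); err != nil {
			return err
		}
	}
	return nil
}

// fetchConcurrent keeps up to `workers` requests in flight, which is roughly
// what the async pull path gets and why it is so much faster in the numbers above.
func fetchConcurrent(ctx context.Context, digests []string, workers int) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(workers)
	for _, d := range digests {
		d := d
		g.Go(func() error {
			_, err := fetchChunk(ctx, d)
			return err
		})
	}
	return g.Wait()
}
```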
Ideas on how to improve this in the proxy:
- If a client requests the index, the proxy could parse it before returning it and pre-fetch the required chunks asynchronously in the background (see the sketch after this list)
- A local cache for the most-referenced chunks? (see the read-ahead cache sketch at the end)
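A rough sketch of the first idea, assuming the proxy has (or grows) some kind of local chunk cache; all names here (`loadIndexFromS3`, `parseIndexDigests`, `prefetchIntoCache`) are made up for illustration, not the proxy's real code:

```go
package sketch

import (
	"context"
	"net/http"
)

// loadIndexFromS3 would GET the raw .didx/.fidx object from the bucket.
func loadIndexFromS3(ctx context.Context, key string) ([]byte, error) { return nil, nil }

// parseIndexDigests would walk the index format and return the chunk
// digests it references, in order.
func parseIndexDigests(raw []byte) ([]string, error) { return nil, nil }

// prefetchIntoCache would download those chunks (with some concurrency
// limit) into a local cache that the chunk handler can answer from.
func prefetchIntoCache(ctx context.Context, digests []string) {}

// handleIndexDownload parses the index before returning it and starts
// warming the referenced chunks in the background, so the client's
// sequential chunk requests mostly hit the local cache instead of going
// to S3 one by one.
func handleIndexDownload(w http.ResponseWriter, r *http.Request) {
	raw, err := loadIndexFromS3(r.Context(), r.URL.Path)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}

	// Detach from the request context so the prefetch outlives this response.
	if digests, err := parseIndexDigests(raw); err == nil {
		go prefetchIntoCache(context.Background(), digests)
	}

	w.Write(raw)
}
```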
Ideal imho is to build a map of which chunk comes after which, along with the top N most-used chunks, and to always fetch the next one in the background while the current one is being used. It will use some RAM, but we are in 2024, RAM is cheaper than time, and it can still be made an option anyway.
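Roughly, in Go, that could look like the sketch below; the successor map, the LRU and the fetch callback are illustrative assumptions, not the proxy's actual code:

```go
package sketch

import (
	"context"
	"sync"

	lru "github.com/hashicorp/golang-lru/v2"
)

// readAheadCache is illustrative only: a bounded cache of hot chunks plus a
// "what usually comes next" map that drives background read-ahead.
type readAheadCache struct {
	mu    sync.Mutex
	next  map[string]string          // digest -> digest that usually follows it
	hot   *lru.Cache[string, []byte] // bounded cache of recently used chunks
	fetch func(context.Context, string) ([]byte, error)
}

func newReadAheadCache(size int, fetch func(context.Context, string) ([]byte, error)) (*readAheadCache, error) {
	hot, err := lru.New[string, []byte](size)
	if err != nil {
		return nil, err
	}
	return &readAheadCache{next: make(map[string]string), hot: hot, fetch: fetch}, nil
}

// Get serves a chunk (from the cache when possible) and, if we know what
// usually comes next, warms that chunk in the background.
func (c *readAheadCache) Get(ctx context.Context, digest string) ([]byte, error) {
	if data, ok := c.hot.Get(digest); ok {
		c.readAhead(digest)
		return data, nil
	}
	data, err := c.fetch(ctx, digest)
	if err != nil {
		return nil, err
	}
	c.hot.Add(digest, data)
	c.readAhead(digest)
	return data, nil
}

// Observe records that b was requested right after a, building the successor
// map as restores run.
func (c *readAheadCache) Observe(a, b string) {
	c.mu.Lock()
	c.next[a] = b
	c.mu.Unlock()
}

func (c *readAheadCache) readAhead(digest string) {
	c.mu.Lock()
	succ, ok := c.next[digest]
	c.mu.Unlock()
	if !ok {
		return
	}
	if _, cached := c.hot.Get(succ); cached {
		return
	}
	go func() {
		if data, err := c.fetch(context.Background(), succ); err == nil {
			c.hot.Add(succ, data)
		}
	}()
}
```

Bounding the LRU keeps the RAM cost predictable, so the cache size could simply be the option mentioned above.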