7z decompression is single threaded
I'm working with Tobias here (https://twitter.com/divyekapoor/status/1444924480552255491) to figure out:
a) Whether we can host the ledger on a CDN instead of Yandex.cloud (https://disk.yandex.com/d/fcZgyES73Jzj5T) to get download speeds above 15 MB/s (~120 Mbps).
b) Whether we can use the MiGz format (https://github.com/linkedin/migz) so that decompression is fast and can run on multiple threads.
I tried this a couple of times, and each time:
a) Downloading the 7z file took more than 30 minutes. The download was network capped at 15 MB/s (~120 Mbps), which is the maximum throughput of a single thread.
b) Even after the download finished, decompression took a really long time: it was capped at ~15 MB/s because 7z decompression runs on a single thread (see screenshot 1, showing a single pinned thread, and screenshot 2, showing 15 MB/s reads and 24 MB/s writes, the maximum decompression rate).
The whole process took more than 2-3 hours, partly because I was trying to figure out what was wrong.
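A quick back-of-the-envelope check of the numbers above. This assumes the ~30 GB figure is the size that actually has to be transferred, and that the 15 MB/s cap applies per stream, so N parallel streams would scale roughly linearly. Both are assumptions, not measurements.

```python
def transfer_minutes(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes at a sustained rate, in minutes."""
    return size_bytes / rate_bytes_per_s / 60


def mb_to_mbit(rate_mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return rate_mb_per_s * 8


if __name__ == "__main__":
    single = transfer_minutes(30e9, 15e6)  # one stream at 15 MB/s
    # Assumption: the cap is per stream, so 10 streams give ~10x the throughput.
    parallel = transfer_minutes(30e9, 10 * 15e6)
    print(f"15 MB/s = {mb_to_mbit(15):.0f} Mbps")
    print(f"single stream: {single:.1f} min, 10 streams: {parallel:.1f} min")
```

At one stream this works out to roughly half an hour, which matches the "> 30 mins" download observed. Decompressing at a similar single-threaded rate roughly doubles the total.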
Happy to provide sponsorship to make this happen.
If we want to do this without much effort, maybe we could split the 30 GB ledger into 3 GB parts, then download and decompress all of them in parallel?
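The split-into-parts idea works because each part is compressed independently, so no part depends on the decompressor state of another. Below is a minimal sketch using Python's standard `lzma` module and a thread pool. The function names are hypothetical, and the sketch works on in-memory bytes for brevity. A real version for a ~30 GB ledger would stream separate part files (e.g. 3 GB each, as suggested above) instead of holding everything in memory.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def split_and_compress(data: bytes, part_size: int) -> list[bytes]:
    """Compress each part_size slice on its own so parts can be decompressed independently."""
    return [lzma.compress(data[i:i + part_size]) for i in range(0, len(data), part_size)]


def decompress_parallel(parts: list[bytes], workers: int = 4) -> bytes:
    """Decompress independent parts concurrently and rejoin them in the original order."""
    # CPython's lzma module releases the GIL around the liblzma calls,
    # so threads can keep several cores busy at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so chunks line up with the original slices.
        return b"".join(pool.map(lzma.decompress, parts))


if __name__ == "__main__":
    sample = bytes(range(256)) * 4096  # 1 MiB of test data
    parts = split_and_compress(sample, part_size=256 * 1024)
    restored = decompress_parallel(parts)
    print(f"{len(parts)} parts, round trip ok: {restored == sample}")
```

The same structure applies to downloads: each part is a separate file that can be fetched on its own connection and handed to a decompression worker as soon as it arrives. Formats such as MiGz build a similar block-independent layout into a single file.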
I can also understand if the answer is:
-> We're already compressing several hours' worth of syncing into a single download, and doing this isn't worth the time.
Sounds interesting. I could take a look at the details during the weekend.
If we need a different compression format, we'll probably have to contact NF, since they currently host the files. As Tobias said, we are only redirecting to them.
Got it. For what it's worth, I got the node up and running: http://nanocrypto.in. Given that a new node comes online pretty rarely, we can just close this out. Thanks for taking the time to build this project @lephleg, it saved me a bunch of time. Could you share your Nano address? I'd like to buy you a coffee.
The doge guys have an easier approach: they just host the file as a torrent. I downloaded 45 GB with an average throughput of 50-70 MB/s.