sui-tool download-db-snapshot repeatedly fails with a panic at /home/runner/work/sui/sui/crates/sui-tool/src/lib.rs:1104:35
jay-ginco opened this issue · 13 comments
Steps to Reproduce Issue
Spin up a new testnet server and try to download the DB snapshot with the command below:
sui-tool download-db-snapshot --latest \
--network testnet --path /mnt/sui/snapshot_db \
--num-parallel-downloads 25 \
--no-sign-request
It fails after running for around 20 minutes with
thread 'main' panicked at /home/runner/work/sui/sui/crates/sui-tool/src/lib.rs:1104:35:
Task failed: Generic s3 error: error decoding response body
Caused by:
0: error decoding response body
1: error reading a body from connection
2: stream error received: unexpected internal error encountered
Expected Result
It should download the latest-epoch testnet snapshot (epoch 541 in the above case) without any failures.
System Information
- OS: Ubuntu 22.04
- Hardware: 16 vCPU, 128 GB RAM
Workaround
I noticed that GCS/S3 buckets are also hosted for snapshots, as mentioned at https://docs.sui.io/guides/operator/snapshots#bucket-names. Could you provide a cost estimate for downloading the snapshot from there, or share the bucket type (standard/archive) and the region so I can calculate it on my own? Also, what would the data size be; is it more than 1 TB for testnet?
Hey @johnjmartin, do you have any idea what might be going on here?
Thanks @stefan-mysten for replying. @johnjmartin, any suggestions would be highly appreciated.
What sui-tool version are you running? (sui-tool -V)
@johnjmartin It's the binary from the latest testnet 1.37.1 release, the same as the sui-node that is running.
Also, in the meantime, any input on this:
I noticed that GCS/S3 buckets are also hosted for snapshots, as mentioned at https://docs.sui.io/guides/operator/snapshots#bucket-names. Could you provide a cost estimate for downloading the snapshot from there, or share the bucket type (standard/archive) and the region of the GCS bucket so I can calculate it on my own? Also, what would the data size be; is it more than 1 TB for testnet?
I believe tweaking --num-parallel-downloads should get it working; checking.
@johnjmartin, could you also take a look at #20213 if you have any input? Thanks.
I believe tweaking --num-parallel-downloads should get it working; checking.
Reducing --num-parallel-downloads can help in environments with fewer compute resources or less network bandwidth.
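For example, the original command can be retried with a lower parallelism value (5 here is just an illustrative choice; the other flags are unchanged):
sui-tool download-db-snapshot --latest \
--network testnet --path /mnt/sui/snapshot_db \
--num-parallel-downloads 5 \
--no-sign-request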
I noticed that GCS/S3 buckets are also hosted for snapshots, as mentioned at https://docs.sui.io/guides/operator/snapshots#bucket-names. Could you provide a cost estimate for downloading the snapshot from there, or share the bucket type (standard/archive) and the region of the GCS bucket so I can calculate it on my own? Also, what would the data size be; is it more than 1 TB for testnet?
The data size for testnet snapshots is ~2 TB at the moment. The cost will depend on whether you're transferring data within Google Cloud or outside of it; see https://cloud.google.com/storage/docs/requester-pays for details.
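As a rough back-of-the-envelope sketch (illustrative only; the rate below is an assumed placeholder, so check current GCS pricing for your destination): with requester pays, the downloader is billed for egress, so cost ≈ snapshot size × per-GB egress rate. At an assumed ~$0.12/GB for internet egress, a ~2 TB snapshot works out to roughly 2048 GB × $0.12/GB ≈ $245; transfers that stay within the same Google Cloud region are generally much cheaper or free.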
Thank you very much @johnjmartin
Just wanted to confirm: while the mainnet GCS snapshot data is ~2.7 TB, when I try to use the Cloudflare endpoint
sui-tool download-db-snapshot --latest --network mainnet --path /mnt/sui --no-sign-request
[00:00:06] [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 262 out of 53557 files done (Downloading file: epoch_578/store/perpetual/370668.sst, #downloads_in_progress: 14)
The files to be downloaded are far fewer; for testnet the number was in the millions. Any reason for that?
The files to be downloaded are far fewer; for testnet the number was in the millions. Any reason for that?
Cloudflare snapshots are currently smaller than GCS snapshots because we prune the Cloudflare ones more aggressively.
Ah, thanks @johnjmartin for the input. We need historical data and also indexing; would that be possible with the Cloudflare snapshot? We want to run the node in non-prune mode (we will prune transactional data, but only after downloading the snapshot).
Yeah, the Cloudflare DB snapshot is fine; the pruning that's applied to it affects a small subset of tables that are essentially unused by RPC requests.
Thanks @johnjmartin for all the help and responses. Probably my last question, if you have any input:
I am working on restoring my full node from the RocksDB snapshot, and I believe the data size would be ~2.5 TB for testnet. I am using this config to prune transactions: https://docs.sui.io/guides/operator/data-management#full-node-with-full-object-history-but-pruned-transaction-history
How much space can be reclaimed by compaction, and how long will DB compaction based on this config take once the node starts? Any suggestions, please.
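For reference, the pruning config at that link looks roughly like the sketch below (treat the linked docs page as the source of truth; exact keys and values may differ by node version, and the large num-epochs-to-retain value is u64::MAX, which effectively disables object pruning):
authority-store-pruning-config:
  num-latest-epoch-dbs-to-retain: 3
  epoch-db-pruning-period-secs: 3600
  num-epochs-to-retain: 18446744073709551615  # u64::MAX, keep full object history
  max-checkpoints-in-batch: 10
  max-transactions-in-batch: 1000
  num-epochs-to-retain-for-checkpoints: 2     # prune transaction/checkpoint history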
I wouldn't expect much space to be reclaimed via compaction after you restore from a DB snapshot. The snapshots that are uploaded have already been compacted, so I expect the disk usage will mostly increase.