MystenLabs/sui

Sui-tool download-db-snapshot is failing multiple times with panicked at /home/runner/work/sui/sui/crates/sui-tool/src/lib.rs:1104:35:

jay-ginco opened this issue · 13 comments

Steps to Reproduce Issue

Spin up a new testnet server, then try to download the DB snapshot with the command below:

sui-tool download-db-snapshot --latest \
    --network testnet  --path /mnt/sui/snapshot_db \
    --num-parallel-downloads 25 \
    --no-sign-request

It fails after running for around 20 minutes with:

thread 'main' panicked at /home/runner/work/sui/sui/crates/sui-tool/src/lib.rs:1104:35:
Task failed: Generic s3 error: error decoding response body
Caused by:
  0: error decoding response body
  1: error reading a body from connection
  2: stream error received: unexpected internal error encountered

Expected Result

It should download the testnet snapshot for the latest epoch (541 in the above case) without any failures.

System Information

  • OS: ubuntu 22.04
  • 16 vCPU 128 GB

Workaround

I noticed there are also GCS/S3 buckets hosting snapshots, as mentioned here: https://docs.sui.io/guides/operator/snapshots#bucket-names. Could you provide a cost estimate for downloading a snapshot from there, or mention the bucket type (standard/archive) or the region so I can calculate it on my own? Also, what is the data size? Is it more than 1 TB for testnet?

Hey @johnjmartin, do you have any idea what might be going on here?

Thanks @stefan-mysten for replying, @johnjmartin any suggestions would be highly appreciated

What sui-tool version are you running? Check with sui-tool -V

@johnjmartin It's the binary from the latest testnet 1.37.1 release, the same as the sui-node that is running.

Also, in the meantime, any input on this?

I noticed there are also GCS/S3 buckets hosting snapshots, as mentioned here: https://docs.sui.io/guides/operator/snapshots#bucket-names. Could you provide a cost estimate for downloading a snapshot from there, or mention the bucket type (standard/archive) or the region for the GCS bucket so I can calculate it on my own? Also, what is the data size? Is it more than 1 TB for testnet?

I believe tweaking --num-parallel-downloads should get it working; checking.

@johnjmartin can you also take a look at #20213 if you have any input? Thanks.

Reducing --num-parallel-downloads can help in environments with fewer compute resources or less network bandwidth.
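As an illustration of the suggestion above, here is a minimal sketch that reruns the download and halves --num-parallel-downloads after each failure. It uses only the flags already shown in this issue. `SUI_TOOL`, `next_parallelism`, and `download_with_backoff` are hypothetical names introduced for this example, and whether a rerun resumes a partial download or starts over is not established in this thread.

```shell
#!/usr/bin/env bash
# Sketch: retry the snapshot download, halving parallelism after each failure.
# SUI_TOOL, next_parallelism and download_with_backoff are hypothetical names.
SUI_TOOL="${SUI_TOOL:-sui-tool}"

# Halve the parallelism, never going below 1.
next_parallelism() {
  local n=$(( $1 / 2 ))
  (( n < 1 )) && n=1
  echo "$n"
}

# usage: download_with_backoff <network> <path> <initial-parallelism> <max-attempts>
download_with_backoff() {
  local network=$1 path=$2 parallel=$3 attempts=$4 i
  for (( i = 1; i <= attempts; i++ )); do
    echo "attempt $i with --num-parallel-downloads $parallel" >&2
    if "$SUI_TOOL" download-db-snapshot --latest \
         --network "$network" --path "$path" \
         --num-parallel-downloads "$parallel" \
         --no-sign-request; then
      return 0
    fi
    # The attempt failed: lower the parallelism before trying again.
    parallel=$(next_parallelism "$parallel")
  done
  return 1
}
```

For the setup in this issue, a call might look like `download_with_backoff testnet /mnt/sui/snapshot_db 25 4`, which would try 25, 12, 6, and then 3 parallel downloads.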

I noticed there are also GCS/S3 buckets hosting snapshots, as mentioned here: https://docs.sui.io/guides/operator/snapshots#bucket-names. Could you provide a cost estimate for downloading a snapshot from there, or mention the bucket type (standard/archive) or the region for the GCS bucket so I can calculate it on my own? Also, what is the data size? Is it more than 1 TB for testnet?

The data size for testnet snapshots is ~2 TB at the moment. The cost will depend on whether you're transferring data within Google Cloud or outside of it; see https://cloud.google.com/storage/docs/requester-pays for details.
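For a back-of-the-envelope estimate, the arithmetic is just snapshot size times the per-GB transfer rate. The sketch below assumes 1 TB = 1000 GB; `estimate_transfer_cost` is a hypothetical helper, and the rate passed in is a placeholder to be replaced with the actual rate for your region and transfer path, not a quoted cloud price.

```shell
#!/usr/bin/env bash
# Sketch: rough transfer cost = size in TB * 1000 GB/TB * price per GB.
# estimate_transfer_cost is a hypothetical helper; the per-GB price is a
# caller-supplied placeholder, not an actual cloud rate.
# usage: estimate_transfer_cost <size-in-TB> <usd-per-GB>
estimate_transfer_cost() {
  local size_tb=$1 usd_per_gb=$2
  awk -v tb="$size_tb" -v price="$usd_per_gb" \
    'BEGIN { printf "%.2f\n", tb * 1000 * price }'
}
```

With the ~2 TB testnet figure above and a placeholder rate of 0.10 USD per GB, `estimate_transfer_cost 2 0.10` prints `200.00`.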

Thank you very much @johnjmartin

Just wanted to confirm: the mainnet GCS snapshot data is ~2.7 TB, but when I try to use the Cloudflare endpoint:

sui-tool download-db-snapshot --latest --network mainnet --path /mnt/sui --no-sign-request
[00:00:06] ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 262 out of 53557 files done (Downloading file: epoch_578/store/perpetual/370668.sst, #downloads_in_progress: 14)

The number of files to be downloaded is much smaller; for testnet the number was in the millions. Is there any reason for that?

Cloudflare snapshots are currently smaller than GCS snapshots because we prune the Cloudflare ones more aggressively.

Ah, thanks @johnjmartin for the input. We need historical data and also indexing; would that be possible with the Cloudflare snapshot? We want to run the node in non-prune mode (we will prune transactional data, but only after downloading the snapshot).

Yeah, the Cloudflare DB snapshot is fine. The pruning applied to it only covers a small subset of tables that are essentially unused by RPC requests.

Thanks @johnjmartin for all the help and responses. Probably one last question, if you have any input:

I am working on restoring my full node from a RocksDB snapshot, and I believe the data size would be ~2.5 TB for testnet. I am using this config to prune transactions: https://docs.sui.io/guides/operator/data-management#full-node-with-full-object-history-but-pruned-transaction-history

How much space can be reclaimed, and how long will DB compaction take with this config once the node starts? Any suggestions, please.

I wouldn't expect much space to be reclaimed via compaction after you restore from a DB snapshot. The snapshots that are uploaded have already been compacted. I expect the disk usage will mostly increase.
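Since disk usage is expected to grow after the restore, it may be worth checking headroom on the target volume before downloading. This is a minimal sketch using standard `df` and `awk`; `has_free_space` is a hypothetical helper, and the threshold you pass is your own estimate rather than a figure from this thread.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the filesystem holding <path> has at least
# <required-kib> KiB available. has_free_space is a hypothetical helper.
# usage: has_free_space <path> <required-kib>
has_free_space() {
  local path=$1 required_kb=$2 avail_kb
  # df -P prints one line per filesystem; column 4 is available space in KiB.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 { print $4 }')
  [ "$avail_kb" -ge "$required_kb" ]
}
```

For example, `has_free_space /mnt/sui $((3 * 1024 * 1024 * 1024))` checks for roughly 3 TiB free, an assumed margin above the ~2.5 TB mentioned earlier.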