Use S3 API to access snapshot
voron commented
This is a feature request (roughly an addition to #260) to make bootstrapping from a snapshot much easier for users. The idea is as follows:
- use the S3 API instead of plain HTTP
- Cloudflare R2 allows creating a read-only user
- access the unarchived datadir content (basically the contents of the geth directory) instead of a single archive file
Pros:
- no archive, so no double-space requirement
  - `wget | tar` is a non-starter with a 2.4TB archive: on any reconnect you have to start from scratch
  - this matters with bare-metal servers, where it's tricky to get double the disk space for a single-use task
- use of S3-optimized tools such as s5cmd to boost performance
  - aria2c is good, but it requires a single file to work on
  - s5cmd may be used to boost upload performance, with or without multipart uploads
  - on-the-fly checksum verification to ensure integrity
- no archive, so incremental sync is possible: download only the changed objects, not the whole datadir
  - a quick way for node operators to catch up an outdated node, or to continue a download using a fresh snapshot source
  - it's tricky to do the same for uploads, as the well-known/exposed directory has to be in a consistent state at all times, so there's no benefit there
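The incremental sync described above boils down to comparing a remote object listing with the local datadir and fetching only what differs. A minimal sketch, assuming a hypothetical manifest format that maps each object key to its size and MD5 digest (the helpers `local_manifest` and `plan_sync` are illustrative names, not part of any existing tool):

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical manifest shape: object key -> (size in bytes, MD5 hex digest).
Manifest = Dict[str, Tuple[int, str]]


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datadir files are never fully loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_manifest(root: Path) -> Manifest:
    """Build a manifest of every regular file under root, keyed by its POSIX-style relative path."""
    manifest: Manifest = {}
    for path in root.rglob("*"):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            manifest[key] = (path.stat().st_size, md5_of_file(path))
    return manifest


def plan_sync(remote: Manifest, local: Manifest) -> Tuple[List[str], List[str]]:
    """Return (keys to download, local keys to delete).

    A key is downloaded if it is missing locally or its size/MD5 differs;
    a local key is deleted if it no longer exists in the remote snapshot.
    """
    to_download = sorted(key for key, meta in remote.items() if local.get(key) != meta)
    to_delete = sorted(key for key in local if key not in remote)
    return to_download, to_delete
```

With boto3, the remote manifest could be filled from `list_objects_v2` pages (each entry carries `Key`, `Size` and a quoted `ETag`) using a client whose `endpoint_url` points at R2. Treating the ETag as an MD5 typically only holds for single-part uploads; multipart ETags are not plain MD5 digests, so those objects would need a different check.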
Cons:
- increased billing
  - one S3 sync is estimated at 1 Class A operation plus 0.1M Class B operations for a PBSS datadir (~50k files), making every full sync cost about $0.036 after the free tier
- increased R2 storage
  - the snapshot compression ratio is low, so the unarchived datadir adds about 200GB per snapshot, ~$3/month
- the access key and secret key are exposed to the public
  - they are read-only, though
  - they can be rotated every couple of months in case of abuse
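The billing and storage figures above can be re-derived with a few lines of arithmetic. The per-operation and per-GB prices in this sketch are assumed Cloudflare R2 list prices, not numbers given in the issue (check the current R2 pricing page), and the monthly free-tier allowance is ignored:

```python
# ASSUMED Cloudflare R2 list prices in USD (not stated in the issue); verify
# against the current R2 pricing page before relying on these numbers.
CLASS_A_USD_PER_MILLION = 4.50   # e.g. list/write operations
CLASS_B_USD_PER_MILLION = 0.36   # e.g. read operations such as HEAD/GET
STORAGE_USD_PER_GB_MONTH = 0.015


def sync_cost_usd(class_a_ops: int, class_b_ops: int) -> float:
    """Operation cost of one sync, ignoring the monthly free-tier allowance."""
    return (class_a_ops * CLASS_A_USD_PER_MILLION + class_b_ops * CLASS_B_USD_PER_MILLION) / 1_000_000


def storage_cost_usd_per_month(gb: float) -> float:
    """Monthly cost of storing `gb` gigabytes, ignoring the free tier."""
    return gb * STORAGE_USD_PER_GB_MONTH


if __name__ == "__main__":
    # Figures from the estimate: 1 Class A + 0.1M Class B ops per full sync,
    # and ~200GB of extra storage per snapshot.
    print(f"per full sync: ${sync_cost_usd(1, 100_000):.4f}")
    print(f"extra storage: ${storage_cost_usd_per_month(200):.2f}/month")
```

With these assumed prices the script prints $0.0360 per full sync and $3.00/month for 200GB, in line with the estimates above; the Class B reads dominate, while the single Class A operation is negligible.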
PS: I'm not talking about the hash-based scheme with 500k+ files; it's going to be deprecated anyway. The testnet snapshot may also be small enough for `wget | tar` to work in most cases.
zzzckck commented
Thanks for your feedback. We may not use the S3 API tool, because:
1. Cost increase, as you mentioned: there would be lots of files to upload/download, so the cost could be much higher than for one single large file.
2. Performance may not be good, although "S3-optimized tools" could perform well.
Maybe we can provide a tool to improve the UX, e.g. to address the "double-space issue".
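One possible shape for a tool that avoids the double-space issue is to stream the archive straight into the extractor, so the compressed file never lands on disk. A minimal sketch using Python's `tarfile` streaming mode; the function name `stream_extract` is hypothetical and this is not the maintainers' planned tool. It does not solve the reconnect problem raised in the issue: an interrupted stream still has to start over.

```python
import os
import tarfile
from pathlib import Path
from typing import BinaryIO


def stream_extract(fileobj: BinaryIO, dest: str) -> int:
    """Extract a tar stream member by member, without storing the archive on disk.

    Returns the number of regular files written. "r|*" is tarfile's forward-only
    streaming mode with transparent gzip/bz2/xz decompression; codecs tarfile
    does not know (lz4, for instance) need a decompressing wrapper first.
    """
    extracted = 0
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            # Refuse absolute paths and ".." components before writing anything.
            if os.path.isabs(member.name) or ".." in Path(member.name).parts:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: also reject links escaping dest, special files, etc.
                tar.extract(member, path=dest, filter="data")
            else:
                tar.extract(member, path=dest)
            if member.isfile():
                extracted += 1
    return extracted
```

For example, `stream_extract(urllib.request.urlopen(snapshot_url), "/data/node")`, where `snapshot_url` is a placeholder for the archive URL; the response object only needs to support `read()`, which is all the streaming mode uses.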