`s3-credentials get-objects` command
simonw opened this issue · 7 comments
I find myself needing to download all of the objects in an S3 bucket that match a specific path pattern.
Related:
Initial design (help first development):
Usage: s3-credentials get-objects [OPTIONS] BUCKET [KEYS]...

  Download multiple objects from an S3 bucket

  To download everything, run:

      s3-credentials get-objects my-bucket

  Files will be saved to a directory called my-bucket. Use -o dirname to save
  to a different directory.

  To download specific keys, list them:

      s3-credentials get-objects my-bucket one.txt path/two.txt

  To download files matching a glob-style pattern, use:

      s3-credentials get-objects my-bucket --pattern '*/*.js'

Options:
  -o, --output DIRECTORY  Write to this directory instead of one matching the
                          bucket name
  -p, --pattern TEXT      Glob patterns for files to download, e.g. '*/*.js'
  --access-key TEXT       AWS access key ID
  --secret-key TEXT       AWS secret access key
  --session-token TEXT    AWS session token
  --endpoint-url TEXT     Custom endpoint URL
  -a, --auth FILENAME     Path to JSON/INI file containing credentials
  --help                  Show this message and exit.
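As a rough illustration of how the `--pattern` option could filter keys, here is a minimal sketch using the standard library's `fnmatch`. The `select_keys` helper and the sample keys are hypothetical, and the command itself may match patterns differently. Note that with `fnmatch` a `*` also matches `/`, so `*/*.js` matches keys nested more than one level deep.

```python
from fnmatch import fnmatchcase


def select_keys(keys, patterns):
    """Return the keys matching at least one glob pattern, in original order.

    Hypothetical helper: an empty pattern list means "download everything".
    """
    if not patterns:
        return list(keys)
    return [
        key for key in keys
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    ]


# Made-up keys, as they might come back from listing a bucket
keys = ["index.html", "js/app.js", "js/vendor/lib.js", "css/site.css"]
print(select_keys(keys, ["*/*.js"]))
# prints ['js/app.js', 'js/vendor/lib.js']
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform, which seems like the safer default for S3 keys.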
I'm going to introduce `moto` to help test this - I used it in https://github.com/simonw/s3-ocr/blob/0.6.3/tests/conftest.py and it worked really well.
It's going to be a bit confusing having some tests that use `moto` and others that use `botocore.stub`, but I think it's going to be worthwhile for the productivity boost on implementing this.
Got this working. Could do with a progress bar of some sort.
The trick with progress bars is that I know the size of the keys I am going to download in the case where I fetched a list of keys first, but I don't know the size of the keys in the case where the user specified them on the command-line.
I could run some HEAD requests first for those I guess?
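A minimal sketch of that idea, assuming a boto3 S3 client is passed in: sizes already known from a bucket listing are reused, and a HEAD request (`head_object`) fills in the rest. The `total_download_size` function and its `known_sizes` argument are hypothetical names for illustration, not the tool's actual code.

```python
def total_download_size(client, bucket, keys, known_sizes=None):
    """Return the combined size in bytes of the given keys.

    client      -- a boto3 S3 client (anything with a compatible head_object)
    known_sizes -- optional dict of key -> size, e.g. from a prior listing
    """
    known_sizes = known_sizes or {}
    total = 0
    for key in keys:
        size = known_sizes.get(key)
        if size is None:
            # HEAD fetches metadata only; ContentLength is the size in bytes
            size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
        total += size
    return total
```

The cost is one extra request per key named on the command line, which is likely acceptable for a handful of keys but would add up for long lists.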
Need to support `-s / --silent` for hiding the progress bar, for consistency with https://s3-credentials.readthedocs.io/en/stable/other-commands.html#put-object
Demo:
% s3-credentials get-objects static.niche-museums.com -o out -p '*gas*'
Downloading 4.3 MB (1 file) [####################################] 100%
% s3-credentials get-objects static.niche-museums.com -o out -p '*big*'
Downloading 6.6 MB (4 files) [####################################] 100%
Idea:
- `--skip` to skip downloading a file if it already exists with the same filename
- `--skip-hash` to skip downloading a file if it already exists AND the MD5 hash has not changed (more expensive, as it needs to calculate the local hash)
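The two options could be sketched roughly as follows. This is only an illustration: `should_skip` and `local_md5` are hypothetical helpers, and letting `skip_hash` take precedence over `skip` is an assumption. It also assumes the object's ETag is the hex MD5 of its content, which holds for single-part uploads without SSE-KMS or SSE-C encryption; multipart ETags carry a `-N` suffix, so they never match and the file is simply downloaded again.

```python
import hashlib
from pathlib import Path


def local_md5(path, chunk_size=1024 * 1024):
    """Compute the hex MD5 of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_skip(path, etag=None, skip=False, skip_hash=False):
    """Decide whether a download can be skipped (hypothetical helper).

    skip      -- skip if a file already exists at path
    skip_hash -- skip only if the file exists AND its MD5 matches the ETag
    """
    path = Path(path)
    if not path.is_file():
        return False
    if skip_hash:
        if etag is None:
            return False
        # S3 returns the ETag wrapped in double quotes
        return local_md5(path) == etag.strip('"')
    return skip
```

Hashing in chunks keeps memory use flat for large files, though it still means reading every existing file in full, which is the extra expense mentioned above.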