simonw/s3-credentials

`s3-credentials get-objects` command

simonw opened this issue · 7 comments

I find myself needing to download all of the objects in an S3 bucket that match a specific path pattern.

Initial design (help-first development):

Usage: s3-credentials get-objects [OPTIONS] BUCKET [KEYS]...

  Download multiple objects from an S3 bucket

  To download everything, run:

      s3-credentials get-objects my-bucket

  Files will be saved to a directory called my-bucket. Use -o dirname to save
  to a different directory.

  To download specific keys, list them:

      s3-credentials get-objects my-bucket one.txt path/two.txt

  To download files matching a glob-style pattern, use:

      s3-credentials get-objects my-bucket --pattern '*/*.js'

Options:
  -o, --output DIRECTORY  Write to this directory instead of one matching the
                          bucket name
  -p, --pattern TEXT      Glob patterns for files to download, e.g. '*/*.js'
  --access-key TEXT       AWS access key ID
  --secret-key TEXT       AWS secret access key
  --session-token TEXT    AWS session token
  --endpoint-url TEXT     Custom endpoint URL
  -a, --auth FILENAME     Path to JSON/INI file containing credentials
  --help                  Show this message and exit.
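A rough sketch of how the `--pattern` filtering could work, assuming the glob matching is done with Python's standard `fnmatch` module (the issue does not say which matcher is used). The `filter_keys` helper and the sample keys are hypothetical. Note that with `fnmatch` a `*` also matches `/`, so `'*/*.js'` matches keys nested more than one level deep.

```python
from fnmatch import fnmatchcase


def filter_keys(keys, patterns):
    """Return the keys that match at least one glob pattern, in order.

    Hypothetical helper: an empty pattern list means "download everything".
    """
    if not patterns:
        return list(keys)
    return [
        key for key in keys
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    ]


# Sample keys, made up for illustration
keys = ["app.js", "static/app.js", "static/vendor/lib.js", "static/site.css"]
matched = filter_keys(keys, ["*/*.js"])
# "app.js" is excluded because the pattern requires a "/" in the key
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform, which fits S3 keys being case-sensitive.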

I'm going to introduce moto to help test this - I used it in https://github.com/simonw/s3-ocr/blob/0.6.3/tests/conftest.py and it worked really well.

It's going to be a bit confusing having some tests that use moto and others that use botocore.stub, but I think it will be worthwhile for the productivity boost in implementing this.

Got this working. Could do with a progress bar of some sort.

The trick with progress bars is that I know the sizes of the objects I am going to download when I have fetched a list of keys first, but I don't know them when the user specified the keys on the command line.

I could run some HEAD requests first for those, I guess?
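A minimal sketch of the HEAD request idea, assuming a boto3-style S3 client whose `head_object(Bucket=..., Key=...)` response includes `ContentLength`. The `total_size` helper and the `FakeS3Client` stand-in are hypothetical; the fake exists only so the sketch runs without AWS access.

```python
def total_size(client, bucket, keys):
    """Sum the sizes of the given keys using one HEAD request per key."""
    total = 0
    for key in keys:
        response = client.head_object(Bucket=bucket, Key=key)
        total += response["ContentLength"]
    return total


class FakeS3Client:
    """Hypothetical stand-in that mimics the part of head_object used above."""

    def __init__(self, sizes):
        self.sizes = sizes

    def head_object(self, Bucket, Key):
        return {"ContentLength": self.sizes[Key]}


client = FakeS3Client({"one.txt": 10, "path/two.txt": 32})
size = total_size(client, "my-bucket", ["one.txt", "path/two.txt"])
```

This costs one extra round trip per key before the download starts, so it probably only makes sense when the list of explicit keys is short.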

Need to support -s / --silent for hiding the progress bar, for consistency with https://s3-credentials.readthedocs.io/en/stable/other-commands.html#put-object

Demo:

% s3-credentials get-objects static.niche-museums.com -o out -p '*gas*'
Downloading 4.3 MB (1 file)  [####################################]  100%
% s3-credentials get-objects static.niche-museums.com -o out -p '*big*'
Downloading 6.6 MB (4 files)  [####################################]  100%

Idea:

  • --skip to skip downloading a file if it already exists with the same filename
  • --skip-hash to skip downloading a file if it already exists AND the MD5 hash has not changed (more expensive, as it needs to calculate the local hash)
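One way the two skip checks might look, as a sketch only. It assumes the remote MD5 comes from the object's ETag, which is only the plain MD5 of the content for objects that were not uploaded as multipart uploads and are not encrypted with SSE-KMS or SSE-C. The `local_md5` and `should_skip` names are hypothetical.

```python
import hashlib
from pathlib import Path


def local_md5(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large downloads are not read into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_skip(path, etag=None):
    """Decide whether a download can be skipped.

    etag=None models --skip: the file existing is enough.
    Passing an ETag models --skip-hash: the local MD5 must also match.
    Assumption: the ETag is a plain MD5 hex digest wrapped in quotes.
    """
    path = Path(path)
    if not path.is_file():
        return False
    if etag is None:
        return True
    return local_md5(path) == etag.strip('"')
```

Comparing file sizes before hashing would be a cheap way to rule out most changed files without reading them.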