aws-s3-downloader

Download all files and the XML listing from a public AWS S3 bucket.

Primary language: Python. License: MIT.

Purpose

Provide a command-line way to download some or all of the files in an AWS S3 bucket that are readable by the public or by any authenticated user, along with all of the XML that lists the bucket's contents, whether each key is public or not.

Requirements

Python3 - Thanks to dreamflasher (https://github.com/dreamflasher) for migrating from Python2 to 3!

Reason

While looking for public AWS S3 buckets that contained PII, I realized that I wanted to be able to download the XML and a subset of the data to show companies what they had exposed. I didn't want to do this manually, and I wanted to have ALL of the XML (AWS paginates S3 listings at 1,000 keys per page).

Use

Just get the XML, downloaded to the working directory under the subfolder [bucket_name]
./download_bucket.py -n [bucket_name] -x

Download the whole bucket to /home/foo/bar/[bucket_name]
./download_bucket.py -o /home/foo/bar -n [bucket_name] -d

Download only where "test" is in the key and get all of the XML
./download_bucket.py -n [bucket_name] -d -x -i test

Download where "test" is in the key but "exclude me" is not in the key
./download_bucket.py -n [bucket_name] -d -i test -e "exclude me"

Download everything after the key thisfile.txt on publicly readable buckets, e.g. to resume without paginating through the listing again
./download_bucket.py -n [bucket_name] -d --last_key "thisfile.txt"
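
For readers curious how resuming from a key can work, here is a minimal sketch. It is not taken from download_bucket.py. It assumes the public S3 REST listing (ListObjects version 1), where each XML page holds up to 1,000 keys and the next page is requested by passing the last key seen as the `marker` query parameter. The function names, the sample XML, and the bucket name `example-bucket` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by S3 ListBucketResult documents.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def listing_url(bucket, marker=None):
    """Build the listing URL for one page (hypothetical helper).

    Assumes virtual-hosted-style addressing; `marker` asks S3 to start
    listing after that key.
    """
    base = f"https://{bucket}.s3.amazonaws.com/"
    if marker:
        return base + "?" + urlencode({"marker": marker})
    return base


def parse_listing(xml_text):
    """Return (keys, next_marker) for one page of listing XML.

    next_marker is None when the page is not truncated, i.e. when there
    are no more pages to fetch.
    """
    root = ET.fromstring(xml_text)
    keys = [el.text for el in root.iter(S3_NS + "Key")]
    truncated = root.findtext(S3_NS + "IsTruncated", default="false") == "true"
    # Without a delimiter, the version 1 listing does not return NextMarker,
    # so the last key on the page is used as the marker for the next request.
    next_marker = keys[-1] if truncated and keys else None
    return keys, next_marker


# Made-up page of listing XML, shortened to two keys.
SAMPLE_PAGE = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Name>example-bucket</Name>"
    "<IsTruncated>true</IsTruncated>"
    "<Contents><Key>a.txt</Key></Contents>"
    "<Contents><Key>thisfile.txt</Key></Contents>"
    "</ListBucketResult>"
)

keys, next_marker = parse_listing(SAMPLE_PAGE)
print(keys)
print(listing_url("example-bucket", next_marker))
```

Under these assumptions, passing a known key such as "thisfile.txt" as the first marker skips every page that comes before it, which is the effect the --last_key option describes.
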

Download using an API key (e.g. for buckets that allow any authenticated user to access them)
./download_bucket.py -n [bucket_name] -d -ak "AWS_ACCESS_KEY" -sk "AWS_SECRET_KEY"

Notes

  • If a file is private, the download will be the XML saying that file access is denied
  • Some keys are just folder names. These will not be downloaded, but the keys within them will be (e.g. a key could be "folder/" but there will be keys with content like "folder/file1")
  • You can add multiple "-i" or "-e" parameters. Multiple "-i" values are OR'd with each other, as are multiple "-e" values, and the two sets are then AND'd together: a key is downloaded when it matches any "-i" value and none of the "-e" values. Matching is case insensitive
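
The filtering and folder-key notes above can be sketched as follows. This is one reading of the described behavior, not the script's actual code. The function names `key_selected` and `is_folder_key` are hypothetical, and the sketch assumes that no "-i" values means every key is a candidate.

```python
def is_folder_key(key):
    """Treat keys ending in "/" as folder placeholders (assumption)."""
    return key.endswith("/")


def key_selected(key, includes=(), excludes=()):
    """Return True if `key` passes the include/exclude filters.

    Include terms are OR'd, exclude terms are OR'd, and the two results
    are combined as "included AND NOT excluded". Matching is a
    case-insensitive substring test.
    """
    lowered = key.lower()
    # Assumption: with no include terms, every key is a candidate.
    included = not includes or any(t.lower() in lowered for t in includes)
    excluded = any(t.lower() in lowered for t in excludes)
    return included and not excluded


# Made-up keys for illustration.
keys = ["folder/", "folder/TEST-report.csv", "test/exclude me.txt", "other.txt"]
wanted = [
    k for k in keys
    if not is_folder_key(k) and key_selected(k, ["test"], ["exclude me"])
]
print(wanted)
```

With the made-up keys above, only "folder/TEST-report.csv" is kept: "folder/" is a folder placeholder, "test/exclude me.txt" hits an exclude term, and "other.txt" matches no include term.
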