dogsheep/dogsheep-photos

Upload all my photos to a secure S3 bucket

simonw opened this issue · 14 comments

  • Create a bucket with bucket credentials
  • Programmatically upload some recent photos to it (from a notebook)
  • Turn this into a script

Research thread: https://twitter.com/simonw/status/1249049694984011776

I want to build some software that lets people store their own data in their own S3 bucket, but if possible I'd like to avoid having to teach people the incantations needed to get their bucket set up and their minimum-permission credentials figured out.

https://testdriven.io/blog/storing-django-static-and-media-files-on-amazon-s3/ looks useful

I'm going to call my bucket dogsheep-photos-simon.

https://console.aws.amazon.com/s3/bucket/create?region=us-west-1

(screenshot: S3 Management Console)

I created it with no public read-write access. I plan to use signed URLs via a transforming proxy to access images for display on the web.

Creating IAM groups called dogsheep-photos-simon-read-write and dogsheep-photos-simon-read: https://console.aws.amazon.com/iam/home#/groups - I created them with no attached policies.

Now I can attach an "inline policy" to each one. For the read-write group I go here:

https://console.aws.amazon.com/iam/home#/groups/dogsheep-photos-simon-read-write

(screenshot: IAM Management Console)

Example policies are here: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html

For the read-write one I went with:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::dogsheep-photos-simon/*"
            ]
        }
    ]
}

For the read-only policy I'm going to guess that this is appropriate:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject*",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::dogsheep-photos-simon/*"
            ]
        }
    ]
}

I tried the policy simulator to test this out: https://policysim.aws.amazon.com/home/index.jsp?#groups/dogsheep-photos-simon-read - this worked:

(screenshot: IAM Policy Simulator)

Next step: create two IAM users, one for each of those groups.

https://console.aws.amazon.com/iam/home#/users$new?step=details

(screenshots: IAM Management Console, user creation steps)

I copied the keys into a secure note in 1password.

Couldn't get into Transmit with them though! https://library.panic.com/transmit/transmit5/iam-roles/ may help.

I'm going to create another user just for Transmit, with full S3 access.

name: dogsheep-photos-simon-s3-all-access

Rather than creating a group for that user, I'm trying the "Attach existing policies directly" option:

(screenshot: IAM Management Console)

That user DID work with Transmit. I uploaded a test HEIC image. I used Transmit to copy a signed URL for it.

~ $ curl -i 'https://dogsheep-photos-simon.s3.us-west-1.amazonaws.com/IMG_7195.HEIC?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAWXFXAI...' | head -n 100
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/1.1 200 OK
x-amz-id-2: gBOCYqZfbNAnv0R/uJ++qm2NbW5SgD4TapgF9RQjzzeDIThcCz/BkKU+YoxlG4NJHlcmMgAHyh4=
x-amz-request-id: C2FE7FCC3BD53A84
Date: Sat, 18 Apr 2020 20:28:54 GMT
Last-Modified: Sat, 18 Apr 2020 20:13:49 GMT
ETag: "fe3e081239a123ef745517878c53b854"
Accept-Ranges: bytes
Content-Type: image/heic
Content-Length: 1913097
Server: AmazonS3

Next step: attempt a programmatic upload using the dogsheep-photos-simon-read-write credentials from a Jupyter notebook.

Also attempt a programmatic bucket listing and read using dogsheep-photos-simon-read credentials.

This worked!

(screenshot: Dogsheep_Photos_S3_access notebook)

And this worked:

(screenshot: Dogsheep_Photos_S3_access notebook)

But... list_objects failed for both of my keys (read and write):

(screenshot: Dogsheep_Photos_S3_access notebook)
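That failure is consistent with how S3 evaluates permissions: s3:ListBucket is checked against the bucket ARN itself, not the .../* object ARNs that both policies grant. A guess at a corrected read-only policy that splits the two:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject*"],
            "Resource": "arn:aws:s3:::dogsheep-photos-simon/*"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::dogsheep-photos-simon"
        }
    ]
}
```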

How about generating a signed URL?

read_client.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "dogsheep-photos-simon",
        "Key": "this_is_fine.jpg",
    },
    ExpiresIn=600
)

Gave me https://dogsheep-photos-simon.s3.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398

Which does this:

~ $ curl -i 'https://dogsheep-photos-simon.s3.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398'
HTTP/1.1 307 Temporary Redirect
x-amz-bucket-region: us-west-1
x-amz-request-id: E78CD859AEE21D33
x-amz-id-2: 648mx+1+YSGga7NDOU7Q6isfsKnEPWOLC+DI4+x2o9FCc6pSCdIaoHJUbFMI8Vsuh1ADtx46ymU=
Location: https://dogsheep-photos-simon.s3-us-west-1.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Sat, 18 Apr 2020 20:47:21 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>TemporaryRedirect</Code><Message>Please re-send this request to the specified temporary endpoint. Continue to use the original request endpoint for future requests.</Message><Endpoint>dogsheep-photos-simon.s3-us-west-1.amazonaws.com</Endpoint><Bucket>dogsheep-photos-simon</Bucket><RequestId>E78CD859AEE21D33</RequestId><HostId>648mx+1+YSGga7NDOU7Q6isfsKnEPWOLC+DI4+x2o9FCc6pSCdIaoHJUbFMI8Vsuh1ADtx46ymU=</HostId></Error>~ $ 

So it redirects to another URL... which returns this:

~ $ curl -i 'https://dogsheep-photos-simon.s3-us-west-1.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398'
HTTP/1.1 200 OK
x-amz-id-2: XafOl6mswj3yz0GJC9+Ptot1ll5sROVwqsMc10CUUfgpaUANTdIx2GhnONb5d1GVFJ6wlS2j3UY=
x-amz-request-id: 258387C180411AFE
Date: Sat, 18 Apr 2020 20:47:52 GMT
Last-Modified: Sat, 18 Apr 2020 20:37:35 GMT
ETag: "ee04081c3182a44a1c6944e94012e977"
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 53072
Server: AmazonS3

????JFIF??C

So that worked! It did come back with Content-Type: binary/octet-stream though.

Running the upload again like this resulted in the correct content-type:

client.upload_file(
    "/Users/simonw/Desktop/this_is_fine.jpg",
    "dogsheep-photos-simon",
    "this_is_fine.jpg",
    ExtraArgs={
        "ContentType": "image/jpeg"
    }
)
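Rather than hard-coding the content type per upload, the standard library's mimetypes module can guess it from the filename. HEIC may not be registered on older Python versions, so a manual fallback is assumed here:

```python
import mimetypes

def guess_content_type(path):
    """Guess a Content-Type for an upload, with fallbacks for unknowns."""
    content_type, _ = mimetypes.guess_type(path)
    if content_type is None:
        # Older Pythons don't know about HEIC
        if path.lower().endswith(".heic"):
            return "image/heic"
        return "binary/octet-stream"
    return content_type

# Then pass it along with each upload:
# client.upload_file(path, "dogsheep-photos-simon", key,
#                    ExtraArgs={"ContentType": guess_content_type(path)})
```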

This is great! I now have a key that can upload photos, and a separate key that can download photos OR generate signed URLs to access those photos.

Next step: a script that starts uploading my photos.

I'm going to start with this:

photos-to-sqlite upload photos.db ~/path/to/directory

This will scan the provided directory (and all sub-directories) for image files. It will then:

  • Calculate the sha256 of the contents of that file
  • Upload the file to a key of the form <sha256>.jpg or <sha256>.heic
  • Upload a <sha256>.json file recording the original path to the image
  • Add that image to an uploads table in photos.db

Stretch goal: grab the EXIF data and include that in the .json upload AND the uploads database table.

Got this working! I'll do EXIF in a separate ticket #3.