The repository contains the code and model to deploy on AWS EC2 to predict whether images from the Serengeti Dataset are blank (no animals visible) or non-blank (animals visible in the picture). It is a proof-of-concept model which aims to exclude blank images, since they carry no information for animal behaviour research. You can find more about the data exploration in my blog post. I described the model training process in the Kaggle notebook.
I hope that this repository might also be helpful as an example repository for Image Recognition with AWS using Docker, PyTorch and Boto3.
- AWS EC2 instance (setup instructions here, from Hosting the docker container on an AWS ec2 instance):
- Deep Learning AMI (Ubuntu 16.04)
- p2.xlarge
- Public DNS automatically set up when the instance is started (in Step 3 of launching the machine, choose Enable in the Auto-assign Public IP field)
- HTTP traffic allowed on port 80
- AWS EC2 Key Pair saved on your computer (also covered in the instructions).
- S3 bucket named serengeti-images, containing a folder test_images. Place the images to predict (from the Serengeti Dataset) inside that folder.
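The test images can also be uploaded to the bucket with Boto3. A minimal sketch, assuming the bucket already exists and your AWS credentials are configured locally (the helper names and example file paths are illustrative, not part of this repo):

```python
import os

BUCKET = "serengeti-images"
PREFIX = "test_images"

def s3_key_for(local_path: str) -> str:
    """Build the S3 key under test_images/ for a local image file."""
    return f"{PREFIX}/{os.path.basename(local_path)}"

def upload_images(paths):
    """Upload local image files into the bucket's test_images folder."""
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    s3 = boto3.client("s3")
    for path in paths:
        s3.upload_file(path, BUCKET, s3_key_for(path))

print(s3_key_for("/data/images/S1_B04_R1_001.jpg"))  # -> test_images/S1_B04_R1_001.jpg
```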
- Clone the repo:
git clone https://github.com/SylwiaOliwia2/Serengeti_AWS_Prediction.git
- Start the EC2 instance.
- SSH to it using Public DNS and saved Permission Key:
ssh -i </home/your-permission-key-location.pem> ubuntu@<your-public-DNS>
- In the SSH session on the instance, create the folders:
mkdir deploy_folder
mkdir deploy_folder/static
mkdir deploy_folder/templates
- Open a new console and copy the files from your machine to the AWS instance (replace the angle-bracket strings with the proper values):
scp -i </home/your-permission-key-location.pem> * ubuntu@<your-public-DNS>:/home/ubuntu/deploy_folder
scp -i </home/your-permission-key-location.pem> static/* ubuntu@<your-public-DNS>:/home/ubuntu/deploy_folder/static
scp -i </home/your-permission-key-location.pem> templates/* ubuntu@<your-public-DNS>:/home/ubuntu/deploy_folder/templates
Note: if the Dockerfile is located directly in /home/ubuntu (instead of deploy_folder), Docker will copy everything in /home/ubuntu into the build context, including preinstalled libraries and environments, which is useless and takes ages.
- In the SSH session on the instance:
cd deploy_folder
- In the SSH session on the instance, build the Docker image:
docker build -t app-serengeti .
- In the SSH session on the instance, run the container:
docker run --gpus all -p 80:5000 app-serengeti
The console should say that the app is running on 0.0.0.0:5000.
- In the browser, open your public DNS link. You will see a simple GUI with a Predict button. Press it. The instance's console should display the prediction progress. Predicting ~40000 images resized to 500x500 px took me ~2.5 hours.
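For reference, the figures above imply roughly the following throughput (a back-of-the-envelope calculation, not a benchmark):

```python
# Rough throughput implied by the run above: ~40000 images in ~2.5 hours.
n_images = 40_000
hours = 2.5
images_per_second = n_images / (hours * 3600)
print(round(images_per_second, 1))  # -> 4.4
```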
- The results will be saved directly in the S3 bucket serengeti-images as output_blank_non_test.csv with the following columns:
- filenames (csv index)
- label - the predicted label: 0 (blank) or 1 (non-blank)
- blank_proba - probability of the image being blank
- non_blank_proba - probability of the image being non-blank
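Once the CSV lands in the bucket, it can be filtered locally, e.g. to keep only the likely non-blank images. A minimal sketch using the column names above (the threshold and the sample rows are illustrative):

```python
import csv
import io

def non_blank_files(csv_text: str, threshold: float = 0.5):
    """Return filenames whose non-blank probability exceeds the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["filenames"] for row in reader
            if float(row["non_blank_proba"]) > threshold]

sample = """filenames,label,blank_proba,non_blank_proba
img_001.jpg,1,0.10,0.90
img_002.jpg,0,0.85,0.15
"""
print(non_blank_files(sample))  # -> ['img_001.jpg']
```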
Add a safer method of authentication
According to AWS, it is safer to create an IAM role and generate Temporary Security Credentials for it, with restricted permissions. I tried to follow the instructions, but they seem to be outdated.
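The role-based flow boils down to calling STS AssumeRole and building clients from the returned temporary credentials. A hedged sketch, assuming an IAM role with the needed S3 permissions already exists (the account ID, role name, and session name below are placeholders, not values from this repo):

```python
def assume_role_params(account_id: str, role_name: str,
                       session_name: str, duration: int = 3600) -> dict:
    """Build the keyword arguments for boto3's sts.assume_role call."""
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": session_name,
        "DurationSeconds": duration,
    }

def temporary_s3_client(account_id: str, role_name: str):
    """Assume the role and return an S3 client using its temporary credentials."""
    import boto3  # assumes boto3 is installed and base credentials are configured
    sts = boto3.client("sts")
    creds = sts.assume_role(
        **assume_role_params(account_id, role_name, "serengeti-prediction")
    )["Credentials"]
    return boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

The temporary credentials expire after DurationSeconds, so a long prediction run may need to re-assume the role partway through.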