This repository provides the code to deploy a face reconstruction web application. It showcases the development of a cloud native application built in a DevOps environment, with deployment configured through a CircleCI pipeline. The repository was created as the capstone project of the Udacity Cloud DevOps Engineer Nanodegree Program.
Users can upload a picture they wish to reconstruct. After uploading, users are shown a loading screen while the request is processed. In the backend, the file is renamed to a randomly generated hex string and uploaded to an S3 bucket. The upload to the S3 bucket triggers an AWS Lambda function, which runs the picture through an inference model that attempts to reconstruct faces in the image (e.g. sharpen contours). The reconstructed picture is then returned to the user. The deployment of the code and the required resources is described in the next sections.
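The rename step described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation; the function name and the use of `secrets.token_hex` are assumptions:

```python
import secrets
from pathlib import Path

def random_object_key(filename: str) -> str:
    """Return a randomly generated hex name that keeps the original
    file extension, e.g. 'photo.jpg' -> '9f86d081...a3.jpg'.

    Hypothetical helper: the real backend would pass a key like this
    to boto3 when uploading the file to the input S3 bucket.
    """
    suffix = Path(filename).suffix  # e.g. '.jpg'
    return secrets.token_hex(16) + suffix
```

Because the object key is random, two users uploading files with the same name cannot overwrite each other's pictures in the bucket.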
INFO: Make sure to install Git LFS before cloning this repository. Failing to do so will lead to errors due to missing files. Refer to the requirements section for more information regarding Git LFS.
- The `.circleci` directory holds the CircleCI configuration file that defines the pipeline.
- The `django-k8-app` directory holds the Django app source files used to build the Docker image.
- The `django-k8-deployment` directory holds the configuration files for deploying the Django application with Kubernetes.
- The `eks-cluster` directory holds instructions for deploying a Kubernetes cluster on AWS EKS.
- The `face-restoration-ml` directory holds the code used for deploying GFPGAN with AWS Lambda using an EC2-mounted EFS.
- The `.gitattributes` file defines the (large) model files that are pushed and pulled using Git LFS.
Project keywords: Kubernetes, CircleCI, Docker, Docker Compose, Makefile, AWS Infrastructure as Code, Django, Django Cookiecutter, GitHub, Slack, Ansible

Tags: GFPGAN, AWS Machine Learning, AWS Lambda, AWS EFS mount EC2, Pipeline, DevOps
WARNING: Carefully review the costs of all resources used before deploying, and avoid unexpected costs by destroying all resources when finished. This project uses AWS resources that are not cheap! Furthermore, the author is not responsible for the use of the information contained in or linked from this repository.
This project operationalizes a Deep Learning Microservice API. In general, it comprises three applications:
- A Django web application that allows users to submit a picture they wish to reconstruct.
- An inference application that processes the submitted images.
- A Lambda function that operationalizes the inference application.
The web application is built using Django and Django Cookiecutter. Although not part of the assignment, it was developed to deliver a usable application. Django supports the development of web applications and the configuration of both the frontend and the backend. The project includes provisions for an RDS database to store account information; for this project, however, no functionality that requires a database is used.
GFPGAN is used as the face reconstruction model. Some modifications have been made to read from and write to AWS S3 (i.e. in-memory processing), and unused features were removed. AWS Lambda is used together with AWS EFS: the EFS file system is configured so that the Lambda function can import the required libraries and load the model. The AWS blog post "Pay as you go machine learning inference with AWS Lambda" was used as guidance for mounting the file system (EFS) on an EC2 instance.
An Amazon Elastic File System (EFS) is mounted on an EC2 instance and used by the Lambda function. EFS provides a cost-effective way to use Lambda with heavy packages that require storage space to load models and other dependencies.
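The EFS-backed Lambda pattern can be sketched as below. The mount path `/mnt/ml`, the `lib` subdirectory, and the handler structure are assumptions for illustration, not the repository's actual configuration:

```python
import os
import sys

# Assumed EFS mount point inside the Lambda execution environment.
EFS_PATH = os.environ.get("EFS_MOUNT_PATH", "/mnt/ml")

def add_efs_packages(efs_path: str = EFS_PATH) -> str:
    """Make Python packages installed on the EFS volume importable.

    Heavy dependencies (e.g. the model libraries and weights) live on
    EFS so they do not have to fit inside the Lambda deployment package.
    """
    lib_dir = os.path.join(efs_path, "lib")
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir

def handler(event, context):
    add_efs_packages()
    # From here on, the function could import the model libraries, load
    # the GFPGAN weights from EFS, process the S3 object referenced in
    # `event`, and write the result to the output bucket.
    ...
```

The EC2 instance from the blog post is only needed to populate the file system (installing packages and copying model files onto EFS); at inference time, Lambda mounts the same file system via an EFS access point.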
- Install Git LFS. Git LFS is used for committing the GFPGAN models. These models are over 100MB in size.
- Install Docker. Docker enables separating the application from your infrastructure.
- Install Docker Compose. Docker Compose is a tool for defining and running multi-container Docker applications.
- Install Hadolint. Hadolint enables linting of Dockerfiles:
sudo wget -O /bin/hadolint https://github.com/hadolint/hadolint/releases/download/v2.8.0/hadolint-Linux-x86_64
sudo chmod +x /bin/hadolint
- Install Minikube. Minikube is local Kubernetes, focusing on making it easy to develop for Kubernetes.
- Install kubectl. The kubectl command line tool lets you control Kubernetes clusters. For more information about Kubernetes, see this link.
- Install eksctl. The eksctl command line utility provides the fastest and easiest way to create a new cluster with nodes for Amazon EKS.
- Create a database (e.g., an AWS RDS Postgres database).
- Create a Kubernetes cluster. This repository provides instructions for deploying an AWS EKS cluster; refer to the Repository content section. The workflow here assumes that the user who created the cluster is the same user who connects to it. When this is not the case, AWS will raise authorization errors.
Create an EC2 Key Pair and write down the key pair name. You will need it later when setting up environment variables (under KEYNAME).
Copy the contents of the .pem file and use it to create an SSH key in CircleCI. To add an SSH key, navigate to Project Settings > SSH Keys. Under Additional SSH Keys, click Add SSH Key and paste the contents of the .pem file. Copy the resulting fingerprint.
From the CircleCI main menu, navigate to Organization Settings. Create a context and name it capstone_env_variables. If you choose another name, make sure to reflect this change in the CircleCI config file.
After creating the context, click Add Environment Variable and create a new variable with Environment Variable Name KEYNAME. Paste the previously acquired fingerprint in the value field.
Also add the following variables (values masked):

| Variable | Value |
| --- | --- |
| AMI_TO_USE | ****431e |
| AWS_ACCESS_KEY_ID | ****44XE |
| AWS_DEFAULT_REGION | ****st-2 |
| AWS_SECRET_ACCESS_KEY | ****YXD9 |
| DJANGO_ADMIN_URL | ****sZp/ |
| DJANGO_AWS_ACCESS_KEY_ID | ****44XE |
| DJANGO_AWS_SECRET_ACCESS_KEY | ****YXD9 |
| DJANGO_SECRET_KEY | ****ldhW |
| DOMAIN | ****.com |
| DOMAIN_EMAIL | ****.com |
| EKS_CLUSTER_NAME | ****tone |
| KEYNAME | ****inja |
| NAMESPACE | ****tone |
| POSTGRES_PASSWORD | ****PbB5 |
| S3_BUCKET_IN_NAME | ****nput |
| S3_BUCKET_OUT_NAME | ****tput |
| S3_BUCKET_SOURCE_NAME | ****urce |
| STACK_NAME_INFRASTRUCTURE | ****ture |
| STACK_NAME_SERVERLESS | ****mbda |
| WWWDOMAIN | ****.com |
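Before triggering the pipeline, you may want to verify locally that the expected variables are set. This checker is purely illustrative and not part of the repository; the variable names are taken from the context above:

```python
import os

# Variable names from the CircleCI context described above.
REQUIRED_VARS = [
    "AMI_TO_USE", "AWS_ACCESS_KEY_ID", "AWS_DEFAULT_REGION",
    "AWS_SECRET_ACCESS_KEY", "EKS_CLUSTER_NAME", "KEYNAME",
    "NAMESPACE", "S3_BUCKET_IN_NAME", "S3_BUCKET_OUT_NAME",
]

def missing_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    for name in missing_vars():
        print(f"missing: {name}")
```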
In this project the following deployment options are available:

- Local deployment, Python tests and code linting: The workflow comprises a local run of the Docker applications (using docker-compose), Python tests, and a code linter to catch errors in the Dockerfiles. The local test reflects the production application. These three methods catch errors early in the pipeline. The workflow jobs take approximately 5 minutes to complete and block any AWS resources from being created when there are errors in the code.
- Application-ready deployment: This repository can be run using the sitwolf Docker files, or developers can dockerize their own application. The images used must be entered in the accompanying env file (`./capstone/django-k8-deployment/configs/.envs/.env`), or the `image` references must be changed directly in three files: the django and postgres yml files in `./django-k8-deployment/configs/deployments` and the traefik yml file in `./capstone/django-k8-deployment/configs/ingress-p2`. When using the sitwolf Docker files you can use the following images: `sitwolf/django_k8`, `sitwolf/postgres_k8`, `sitwolf/traefik_k8`.
- Deployment: This project uses Kubernetes and Docker Compose. Kubernetes is an open-source system for automating the management of containerized applications. In the context of this project, Docker Compose provides an easy way to deploy: one can simply run `docker-compose -f production.yml up`. However, Docker Compose does not provide the scaling and replica possibilities of Kubernetes. To increase or decrease the number of pods per deployment, change `replicas` under spec in the deployment yml file.
- Blue-green deployment: A blue-green deployment strategy is adopted, achieved using Amazon Route 53. After a successful deployment, run `kubectl get svc -n capstone`, copy the external IP (load balancer DNS), and update your DNS records accordingly. This AWS Whitepaper describes implementation techniques for updating DNS routing with Amazon Route 53.
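To script the replica change mentioned above, a small helper could rewrite the `replicas:` line of a deployment manifest. This is an illustrative stdlib-only sketch (the helper is an assumption, not repository code); it edits the text directly to avoid a YAML library dependency:

```python
import re

def set_replicas(manifest_text: str, count: int) -> str:
    """Rewrite the `replicas:` field of a Kubernetes manifest string.

    Matches the line at any indentation and replaces only the number,
    preserving the original whitespace.
    """
    return re.sub(r"(?m)^(\s*replicas:\s*)\d+", rf"\g<1>{count}", manifest_text)
```

On a live cluster the same effect can be achieved with `kubectl scale`, e.g. `kubectl scale deployment/django --replicas=3 -n capstone` (deployment name assumed).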
This section provides a brief overview of commands that can help with troubleshooting.
- Get all resources and check that everything is running. Note that a Running status does not always mean the application is performing as it should (see the next command).
kubectl get all -n capstone
- Get the logs. From the output of the above command, copy the pod id and substitute it in the following command. Make sure to check all replicas, since the load balancer may route requests to any of them.
kubectl logs pod/django-12345678-123qr
- Check what the events have to tell.
kubectl get events --sort-by=.metadata.creationTimestamp -n capstone
- Check if the ACME certificate store is empty.
kubectl exec -n capstone --stdin --tty traefik-123456789-123nl -- /bin/sh
- Check the TLS certificate.
kubectl get secret -o yaml
- Automatic certificate generation (Traefik) fails due to the Kubernetes read-only file system. A thread was opened for this issue.
- When the above is resolved, integrate `deploy-django-k8-ingress.sh` in the script, or modify certificate management to use a user-provided certificate rather than Traefik's auto-generated certificates.
- Set up the `livenessProbe` and `readinessProbe` for the Django Kubernetes deployment (currently commented out).
- Add a Lambda function that destroys user-uploaded content after x minutes.
- Define least-privilege policies for this project (AWS IAM).
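For the cleanup item above, the selection logic of such a Lambda could look like this sketch (hypothetical; not present in the repository). A scheduled function would list the bucket's objects, pass `(key, last_modified)` pairs to this helper, and delete the returned keys:

```python
from datetime import datetime, timedelta, timezone

def expired_keys(objects, max_age_minutes, now=None):
    """Return the keys of objects older than `max_age_minutes`.

    `objects` is an iterable of (key, last_modified) pairs, where
    last_modified is a timezone-aware datetime (as boto3 returns for
    S3 objects).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=max_age_minutes)
    return [key for key, last_modified in objects if last_modified < cutoff]
```

The surrounding Lambda would use the S3 list and delete APIs around this function; only the pure selection logic is shown here.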