This repository provides the code to deploy a face reconstruction web application. It showcases the development of a cloud native application built in a DevOps environment, with deployment configured through a CircleCI pipeline. The repository was created as the capstone project of the Udacity Cloud DevOps Engineer Nanodegree Program.
Users can upload a picture they wish to reconstruct. After uploading, users are shown a loading screen while the request is processed. In the backend, the file is renamed to a randomly generated hex string and uploaded to an S3 bucket. The upload to the S3 bucket triggers an AWS Lambda function, which runs the picture through an inference model that attempts to reconstruct faces in the image (e.g. sharpen contours). The reconstructed picture is then returned to the user. The deployment of the code and the required resources is described in the next sections.
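The rename step described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation; the function name and the use of `secrets.token_hex` are assumptions:

```python
import secrets
from pathlib import Path

def random_object_key(filename: str) -> str:
    """Return a randomly generated hex name that keeps the original
    file extension, e.g. 'photo.jpg' -> '9f86d081...a3.jpg'.

    Hypothetical helper: the real backend would pass a key like this
    to boto3 when uploading the file to the input S3 bucket.
    """
    suffix = Path(filename).suffix  # e.g. '.jpg'
    return secrets.token_hex(16) + suffix
```

Because the object key is random, two users uploading files with the same name cannot overwrite each other's pictures in the bucket.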
INFO: Make sure to install Git LFS before cloning this repository. Failing to do so will lead to errors due to missing files. Refer to the requirements section for more information regarding Git LFS.
- The `.circleci` directory holds the CircleCI configuration file that defines the pipeline.
- The `django-k8-app` directory holds the Django app source files used to build the Docker image.
- The `django-k8-deployment` directory holds the configuration files for deploying the Django application with Kubernetes.
- The `eks-cluster` directory holds instructions for deploying a Kubernetes cluster on AWS EKS.
- The `face-restoration-ml` directory holds the code used for deploying GFPGAN with AWS Lambda using an EC2-mounted EFS.
- The `.gitattributes` file defines the (large) model files that are pushed and pulled using Git LFS.
Project keywords: Kubernetes, CircleCI, Docker, Docker Compose, Makefile, AWS Infrastructure as Code, Django, Django Cookiecutter, GitHub, Slack, Ansible

Tags: GFPGAN, AWS Machine Learning, AWS Lambda, AWS EFS mount EC2, Pipeline, DevOps
WARNING: Carefully review the costs of all resources used before deploying, and avoid unexpected costs by destroying all resources when finished. This project uses AWS resources that are not cheap! Furthermore, the author is not responsible for the use of the information contained in or linked from this repository.
This project operationalizes a Deep Learning Microservice API. In general, it comprises three applications:
- A Django web application that allows users to submit a picture they wish to reconstruct.
- An inference application that processes the submitted images.
- A Lambda function that operationalizes the inference application.
The web application is built using Django and Django Cookiecutter. Although not part of the assignment, it was developed to deliver a usable application. Django supports the development of web applications and the configuration of both the frontend and the backend. The project includes provisions for an RDS database to store account information; for this project, however, no functionality that requires a database is used.
GFPGAN is used as the face reconstruction model. Some modifications have been made to read from and write to AWS S3 (i.e. in-memory processing), and unused features were removed. AWS Lambda is used together with AWS EFS: the EFS file system is configured so that the Lambda function can import the required libraries and load the model. The AWS blog post "Pay as you go machine learning inference with AWS Lambda" was used as guidance for mounting the file system (EFS) on an EC2 instance.
An Amazon Elastic File System (EFS) is mounted on an EC2 instance and used by the Lambda function. EFS provides a cost-effective way to use Lambda with heavy packages that require storage space to load models and other dependencies.
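The EFS-backed Lambda pattern can be sketched as below. The mount path `/mnt/ml`, the `lib` subdirectory, and the handler structure are assumptions for illustration, not the repository's actual configuration:

```python
import os
import sys

# Assumed EFS mount point inside the Lambda execution environment.
EFS_PATH = os.environ.get("EFS_MOUNT_PATH", "/mnt/ml")

def add_efs_packages(efs_path: str = EFS_PATH) -> str:
    """Make Python packages installed on the EFS volume importable.

    Heavy dependencies (e.g. the model libraries and weights) live on
    EFS so they do not have to fit inside the Lambda deployment package.
    """
    lib_dir = os.path.join(efs_path, "lib")
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir

def handler(event, context):
    add_efs_packages()
    # From here on, the function could import the model libraries, load
    # the GFPGAN weights from EFS, process the S3 object referenced in
    # `event`, and write the result to the output bucket.
    ...
```

The EC2 instance from the blog post is only needed to populate the file system (installing packages and copying model files onto EFS); at inference time, Lambda mounts the same file system via an EFS access point.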
- Install Git LFS. Git LFS is used for committing the GFPGAN models. These models are over 100MB in size.
- Install Docker. Docker enables separating the application from your infrastructure.
- Install Docker Compose. Docker Compose is a tool for defining and running multi-container Docker applications.
- Install Hadolint. Hadolint enables linting of Dockerfiles:
sudo wget -O /bin/hadolint https://github.com/hadolint/hadolint/releases/download/v2.8.0/hadolint-Linux-x86_64
sudo chmod +x /bin/hadolint
- Install Minikube. Minikube is local Kubernetes, focusing on making it easy to develop for Kubernetes.
- Install kubectl. The kubectl command line tool lets you control Kubernetes clusters. For more information about Kubernetes, see this link.
- Install eksctl. The eksctl command line utility provides the fastest and easiest way to create a new cluster with nodes for Amazon EKS.
- Create a database (e.g., an AWS RDS Postgres database).
- Create a Kubernetes cluster. This repository provides instructions for deploying an AWS EKS cluster; refer to the Repository content section. The workflow here assumes that the user who created the cluster is the same user who connects to it. When this is not the case, AWS will raise authorization errors.
Create an EC2 Key Pair and write down the key pair name. You will need it later when setting up environment variables (under KEYNAME).
Copy the contents of the .pem file and use it to create an SSH key in CircleCI. To add an SSH key, navigate to Project Settings > SSH Keys. Under Additional SSH Keys, click Add SSH Key and paste the contents of the .pem file. Copy the resulting fingerprint.
From the CircleCI main menu, navigate to Organization Settings. Create a context and name it capstone_env_variables. If you choose another name, make sure to reflect this change in the CircleCI config file.
After creating the context, click Add Environment Variable and create a new variable with Environment Variable Name KEYNAME. Paste the previously acquired fingerprint in the value field.
Also add the following variables (values masked):

| Variable | Value |
| --- | --- |
| AMI_TO_USE | ****431e |
| AWS_ACCESS_KEY_ID | ****44XE |
| AWS_DEFAULT_REGION | ****st-2 |
| AWS_SECRET_ACCESS_KEY | ****YXD9 |
| DJANGO_ADMIN_URL | ****sZp/ |
| DJANGO_AWS_ACCESS_KEY_ID | ****44XE |
| DJANGO_AWS_SECRET_ACCESS_KEY | ****YXD9 |
| DJANGO_SECRET_KEY | ****ldhW |
| DOMAIN | ****.com |
| DOMAIN_EMAIL | ****.com |
| EKS_CLUSTER_NAME | ****tone |
| KEYNAME | ****inja |
| NAMESPACE | ****tone |
| POSTGRES_PASSWORD | ****PbB5 |
| S3_BUCKET_IN_NAME | ****nput |
| S3_BUCKET_OUT_NAME | ****tput |
| S3_BUCKET_SOURCE_NAME | ****urce |
| STACK_NAME_INFRASTRUCTURE | ****ture |
| STACK_NAME_SERVERLESS | ****mbda |
| WWWDOMAIN | ****.com |
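Before triggering the pipeline, you may want to verify locally that the expected variables are set. This checker is purely illustrative and not part of the repository; the variable names are taken from the context above:

```python
import os

# Variable names from the CircleCI context described above.
REQUIRED_VARS = [
    "AMI_TO_USE", "AWS_ACCESS_KEY_ID", "AWS_DEFAULT_REGION",
    "AWS_SECRET_ACCESS_KEY", "EKS_CLUSTER_NAME", "KEYNAME",
    "NAMESPACE", "S3_BUCKET_IN_NAME", "S3_BUCKET_OUT_NAME",
]

def missing_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    for name in missing_vars():
        print(f"missing: {name}")
```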
In this project the following deployment options are available:

- Local deployment, Python tests and code linting: The workflow comprises a local run of the Docker applications (using docker-compose), Python tests, and a code linter to catch errors in the Dockerfiles. The local test reflects the production application. These three methods catch errors early in the pipeline. The workflow jobs take approximately 5 minutes to complete and block any AWS resources from being created when there are errors in the code.
- Application-ready deployment: This repository can be run using the sitwolf Docker files, or developers can dockerize their own application. The images used must be entered in the accompanying env file (`./capstone/django-k8-deployment/configs/.envs/.env`), or the `image` references must be changed directly in three files: the django and postgres yml files in `./django-k8-deployment/configs/deployments` and the traefik yml file in `./capstone/django-k8-deployment/configs/ingress-p2`. When using the sitwolf Docker files you can use the following images: `sitwolf/django_k8`, `sitwolf/postgres_k8`, `sitwolf/traefik_k8`.
- Deployment: This project uses Kubernetes and Docker Compose. Kubernetes is an open-source system for automating the management of containerized applications. In the context of this project, Docker Compose provides an easy way to deploy: one can simply run `docker-compose -f production.yml up`. However, Docker Compose does not provide the scaling and replica possibilities of Kubernetes. To increase or decrease the number of pods per deployment, change `replicas` under spec in the deployment yml file.
- Blue-green deployment: A blue-green deployment strategy is adopted, achieved using Amazon Route 53. After a successful deployment, run `kubectl get svc -n capstone`, copy the external IP (load balancer DNS), and update your DNS records accordingly. This AWS Whitepaper describes implementation techniques for updating DNS routing with Amazon Route 53.
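To script the replica change mentioned above, a small helper could rewrite the `replicas:` line of a deployment manifest. This is an illustrative stdlib-only sketch (the helper is an assumption, not repository code); it edits the text directly to avoid a YAML library dependency:

```python
import re

def set_replicas(manifest_text: str, count: int) -> str:
    """Rewrite the `replicas:` field of a Kubernetes manifest string.

    Matches the line at any indentation and replaces only the number,
    preserving the original whitespace.
    """
    return re.sub(r"(?m)^(\s*replicas:\s*)\d+", rf"\g<1>{count}", manifest_text)
```

On a live cluster the same effect can be achieved with `kubectl scale`, e.g. `kubectl scale deployment/django --replicas=3 -n capstone` (deployment name assumed).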
This section provides a brief overview of commands that can help with troubleshooting.
- Get all resources and check that everything is running. Note that a Running status does not always mean the application is performing as it should (see the next command).
kubectl get all -n capstone
- Get the logs. From the output of the above command, copy the pod id and substitute it in the following command. Make sure to check all replicas, since the load balancer may route requests to any of them.
kubectl logs pod/django-12345678-123qr
- Check what the events have to tell.
kubectl get events --sort-by=.metadata.creationTimestamp -n capstone
- Check if the ACME certificate store is empty.
kubectl exec -n capstone --stdin --tty traefik-123456789-123nl -- /bin/sh
- Check the TLS certificate.
kubectl get secret -o yaml
- Automatic certificate generation (Traefik) fails due to the Kubernetes read-only file system. A thread was opened for this issue.
- When the above is resolved, integrate `deploy-django-k8-ingress.sh` in the script, or modify certificate management to use a user-provided certificate rather than Traefik's auto-generated certificates.
- Set up the `livenessProbe` and `readinessProbe` for the Django Kubernetes deployment (currently commented out).
- Add a Lambda function that destroys user-uploaded content after x minutes.
- Define least-privilege policies for this project (AWS IAM).
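For the cleanup item above, the selection logic of such a Lambda could look like this sketch (hypothetical; not present in the repository). A scheduled function would list the bucket's objects, pass `(key, last_modified)` pairs to this helper, and delete the returned keys:

```python
from datetime import datetime, timedelta, timezone

def expired_keys(objects, max_age_minutes, now=None):
    """Return the keys of objects older than `max_age_minutes`.

    `objects` is an iterable of (key, last_modified) pairs, where
    last_modified is a timezone-aware datetime (as boto3 returns for
    S3 objects).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=max_age_minutes)
    return [key for key, last_modified in objects if last_modified < cutoff]
```

The surrounding Lambda would use the S3 list and delete APIs around this function; only the pure selection logic is shown here.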