Project Description:

  1. This project aims to build and deploy a binary-classification model that flags individuals who either exhibit early signs of diabetes or are at risk of developing it, based on their signs and symptoms. The dataset was downloaded from Kaggle and covers both demographic details and specific symptoms associated with diabetes:
     * Age (20 to 65) - age range of the individuals
     * Sex (1. Male, 0. Female) - gender information
     * Polyuria (1. Yes, 0. No) - presence of excessive urination
     * Polydipsia (1. Yes, 0. No) - excessive thirst
     * Sudden Weight Loss (1. Yes, 0. No)
     * Weakness (1. Yes, 0. No)
     * Polyphagia (1. Yes, 0. No) - excessive hunger
     * Genital Thrush (1. Yes, 0. No)
     * Visual Blurring (1. Yes, 0. No)
     * Itching (1. Yes, 0. No)
     * Irritability (1. Yes, 0. No)
     * Delayed Wound Healing (1. Yes, 0. No)
     * Partial Loss of Voluntary Movement (1. Yes, 0. No)
     * Muscle Stiffness (1. Yes, 0. No)
     * Alopecia (1. Yes, 0. No) - hair loss
     * Obesity (1. Yes, 0. No)
     * Diabetes Classification (1. Positive, 0. Negative) - the target
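
In code, the Yes/No, Male/Female and Positive/Negative values can be mapped to this 1/0 scheme with pandas. A minimal sketch with hypothetical sample rows (the real data is the Kaggle CSV, whose binary columns arrive as strings):

```python
import pandas as pd

# Hypothetical sample rows mirroring the schema above; the real data is the
# CSV downloaded from Kaggle, read with pd.read_csv(...).
df = pd.DataFrame({
    "Age": [45, 60, 33],
    "Sex": ["Male", "Female", "Male"],
    "Polyuria": ["Yes", "No", "Yes"],
    "Polydipsia": ["Yes", "No", "No"],
    "class": ["Positive", "Negative", "Positive"],
})

# Encode every non-numeric column to the 1/0 scheme listed above
binary_map = {"Yes": 1, "No": 0, "Male": 1, "Female": 0,
              "Positive": 1, "Negative": 0}
for col in df.columns.drop("Age"):
    df[col] = df[col].map(binary_map)

print(df)
```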

  2. The instructions for executing this project are as follows: (a) I downloaded the diabetes dataset from Kaggle. (b) I read the dataset in a Jupyter notebook, cleaned the data, visualized selected features with Seaborn, and performed exploratory data analysis and feature engineering. I then built and trained a binary classification model (logistic regression) with Scikit-Learn on my training data and evaluated it on the validation dataset.
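
The train/validate workflow above can be sketched as follows; the synthetic binary features and the 60/20/20 split are assumptions standing in for the real encoded dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the encoded Kaggle features (X) and labels (y):
# 16 binary symptom columns and a synthetic target driven by two of them.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(520, 16)).astype(float)
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 520) > 1).astype(int)

# 60/20/20 train/validation/test split, mirroring the workflow above
X_full, X_test, y_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_full, y_full, test_size=0.25, random_state=1)

# Train the binary classifier on the training data, evaluate on validation
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```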

  3. Additionally, I used the confusion matrix and classification report from Scikit-Learn to obtain the accuracy, precision, recall, F1 score, and the counts of true positives, false positives, true negatives and false negatives. Finally, I retrained the model on the combined training and validation datasets and ran K-fold cross-validation to find the best values of the regularization strength and the number of splits, judged by the area-under-the-curve score (roc_auc_score). I saved and loaded the model with pickle (opening the file in 'wb' and 'rb' modes) before moving it to the working directory in Visual Studio Code.
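
The evaluation, tuning and persistence steps above might look like this sketch; the synthetic data, the candidate C values, the fixed n_splits=5 and the file name model.bin are assumptions:

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
from sklearn.model_selection import KFold

# Hypothetical encoded features/labels standing in for the real dataset
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(520, 16)).astype(float)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# K-fold cross-validation over the regularization strength C,
# scored by area under the ROC curve
best_c, best_auc = None, -1.0
for c in [0.01, 0.1, 1, 10]:
    aucs = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                    random_state=1).split(X):
        m = LogisticRegression(C=c, max_iter=1000).fit(X[train_idx], y[train_idx])
        aucs.append(roc_auc_score(y[val_idx],
                                  m.predict_proba(X[val_idx])[:, 1]))
    if np.mean(aucs) > best_auc:
        best_c, best_auc = c, float(np.mean(aucs))

# Retrain on all the data with the best C, inspect metrics, persist with pickle
model = LogisticRegression(C=best_c, max_iter=1000).fit(X, y)
pred = model.predict(X)
print(confusion_matrix(y, pred))        # rows: [TN, FP] / [FN, TP]
print(classification_report(y, pred))   # precision, recall, f1, accuracy
with open("model.bin", "wb") as f_out:  # 'wb' = write binary
    pickle.dump(model, f_out)
with open("model.bin", "rb") as f_in:   # 'rb' = read binary
    loaded = pickle.load(f_in)
```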

  4. I exported my .ipynb notebook as a Python script in Visual Studio Code, named it train.py, and after some editing created a web service with Flask (predict.py) that classifies patients with diabetic symptoms, so they can be directed to seek medical attention (such as insulin) or not. I verified that the model classified correctly with good predictions (predict-test.py).
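
A hedged sketch of what a Flask service like predict.py could look like; the route name, the three feature names and the inline stand-in model are assumptions (the real service loads the pickled model from disk instead):

```python
# Hypothetical sketch of predict.py; not the exact original code.
import numpy as np
from flask import Flask, request, jsonify
from sklearn.linear_model import LogisticRegression

# Stand-in model trained on synthetic data; the real predict.py would load
# the pickled model instead: with open("model.bin", "rb") as f_in: ...
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3)).astype(float)
y = (X.sum(axis=1) >= 2).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask("diabetes-risk")

@app.route("/predict", methods=["POST"])
def predict():
    # e.g. {"polyuria": 1, "polydipsia": 1, "obesity": 0}
    patient = request.get_json()
    features = [[patient["polyuria"], patient["polydipsia"], patient["obesity"]]]
    prob = model.predict_proba(features)[0, 1]
    return jsonify({"diabetes_probability": float(prob),
                    "diabetes": bool(prob >= 0.5)})

# Serve with gunicorn: gunicorn --bind localhost:9696 predict:app
# or, for local debugging: app.run(debug=True, host="0.0.0.0", port=9696)
```

predict-test.py would then POST a patient's symptom JSON to the service (e.g. http://localhost:9696/predict) with the requests library and print the returned classification.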

  5. Because I use the Windows operating system, I installed the Windows Subsystem for Linux (WSL) so that Linux commands work on my Windows computer. I installed gunicorn to serve the Flask web service (predict.py) without development-server warnings and get predictions via predict-test.py. An example of such a command is: gunicorn --bind localhost:9696 predict:app.

  6. I created a virtual environment with pipenv so the model and all its dependencies run in isolation from everything else on my laptop. All I did was: pip install pipenv, then pipenv install numpy scikit-learn==1.3.0 flask. This installed the packages and generated the Pipfile and Pipfile.lock, which record all the necessary dependencies.
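
The generated Pipfile might look roughly like this sketch; only the scikit-learn pin comes from the text, everything else is illustrative:

```toml
# Hypothetical Pipfile produced by the pipenv install command above
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
numpy = "*"
scikit-learn = "==1.3.0"
flask = "*"

[requires]
python_version = "3.10"
```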

  7. From the Docker Python images on Docker Hub, I pulled a Python image with the tag python:3.10.13-slim, using the command docker run -it --rm python:3.10.13-slim while making sure Docker Desktop was up and running. I created a Dockerfile that builds on the downloaded Python image and built it with: docker build -t <image-name> . I explored the image as a container with: docker run -it --rm --entrypoint=bash <image-name>. Lastly, I ran the container with: docker run -it --rm -p 9696:9696 <image-name>.
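
A Dockerfile along the lines described might look like this sketch; the copied file names and the gunicorn entrypoint are assumptions based on the earlier steps:

```dockerfile
# Hypothetical Dockerfile building on the slim Python image pulled above
FROM python:3.10.13-slim

RUN pip install pipenv

WORKDIR /app
COPY ["Pipfile", "Pipfile.lock", "./"]

# --system --deploy installs the locked dependencies into the image's
# Python instead of creating a nested virtual environment
RUN pipenv install --system --deploy

COPY ["predict.py", "model.bin", "./"]

EXPOSE 9696
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
```

It would then be built and run with, for example, docker build -t diabetes-risk . and docker run -it --rm -p 9696:9696 diabetes-risk (diabetes-risk being a hypothetical image name).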

  8. I deployed my Docker container on AWS Elastic Beanstalk using the command-line interface: pipenv install awsebcli --dev, pipenv shell, eb init --help, eb init -p docker -r us-east-1, ls -a, less .elasticbeanstalk/config.yml, eb local run --port 9696, eb create house-price-env, and finally terminated the environment with eb terminate.

  9. Finally, I ran the Docker container with Kubernetes, first locally on my laptop by creating a cluster with kind, before deploying to AWS Elastic Kubernetes Service (EKS). I created deployment and service YAML files for this. The steps and commands are:
     * docker build -t <image-name> .
     * docker run -it --rm -p 9696:9696 <image-name>
     * Create a model-deployment.yaml file in VS Code: type deployment and select the Kubernetes deployment template that pops up.
     * In the YAML file, replace the template's app name (tf-serving-clothing-model) with the model's name, and set image = <Docker image name and tag of the model>, cpu = 1, memory = 512Mi, replicas = 1 and containerPort = 9696 (or any specified port).
     * kind load docker-image <image-name>
     * kubectl apply -f model-deployment.yaml
     * kubectl get deployment
     * kubectl get pod
     * kubectl describe pod <pod-name> | less
     * kubectl port-forward <pod-name> 9696:9696
     * python3 predict-test.py
     * Create a model-service.yaml file in VS Code: type service and select the Kubernetes service template that pops up.
     * In the YAML file, set app to the same name used in model-deployment.yaml, port = 80 and targetPort = 9696; under spec, write type: LoadBalancer.
     * kubectl apply -f model-service.yaml
     * kubectl get service (or kubectl get svc)
     * kubectl port-forward <service created above> 9696:80
     * python3 predict-test.py
     * Create an eks-config.yaml (see eksctl.io) in the working directory.
     * eksctl create cluster -f eks-config.yaml
     * aws ecr create-repository --repository-name <diabetes-risk-images, or any registry name>
     * Assign shell variables (e.g. MODEL_REMOTE) built from the URI of the created ECR repository.
     * $(aws ecr get-login --no-include-email)
     * docker push ${MODEL_REMOTE}
     * echo ${MODEL_REMOTE} and put the URI of the pushed image into the image section of model-deployment.yaml.
     * kubectl get nodes
     * docker ps
     * kubectl apply -f model-deployment.yaml
     * kubectl apply -f model-service.yaml
     * kubectl get pod
     * kubectl get service
     * kubectl port-forward service/diabetes-risk-model 9696:80
     * python3 predict-test.py (in a new terminal)
     * telnet aaf59f3c33cdc4ccb970de4fd0f68401-348694010.us-east-1.elb.amazonaws.com 80
     * Put the external DNS name from the AWS load balancer into predict-test.py as the new URL.
     * python3 predict-test.py
     * Log in to AWS and check EKS to view the created cluster, EC2 instances and load balancer.
     * eksctl delete cluster --name <cluster-name>
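
Put together, the deployment and service files described above might look like this sketch; the diabetes-risk-model name (taken from the kubectl port-forward command above) and the image tag are assumptions:

```yaml
# Hypothetical model-deployment.yaml and model-service.yaml sketch
apiVersion: apps/v1
kind: Deployment
metadata:
  name: diabetes-risk-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: diabetes-risk-model
  template:
    metadata:
      labels:
        app: diabetes-risk-model
    spec:
      containers:
        - name: diabetes-risk-model
          image: diabetes-risk:latest   # replace with the ECR URI for EKS
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
          ports:
            - containerPort: 9696
---
apiVersion: v1
kind: Service
metadata:
  name: diabetes-risk-model
spec:
  type: LoadBalancer
  selector:
    app: diabetes-risk-model        # must match the deployment's app label
  ports:
    - port: 80                      # external port
      targetPort: 9696              # container port serving predictions
```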