diabetes_pred_app

To run the project, follow the steps below. NOTE: PyCharm was used as the IDE.

Clone the repository
Set up a virtual environment in the project's root directory
Open a terminal in the project directory and run:
1. python3 -m virtualenv venv
2. source venv/bin/activate
3. pip3 install -r requirements.txt

Install the Kaggle command-line tool: pip install kaggle
Create an API token from your Kaggle account settings (this downloads a kaggle.json file to your computer)
Make sure that kaggle.json is in the .kaggle directory under your user home directory (for example ~/.kaggle/kaggle.json on Linux or macOS) so that the kaggle command can find your credentials
Search for the desired dataset using this command: kaggle datasets list -s [KEYWORD]
Download the desired dataset using this command: kaggle datasets download -d [DATASET]
You will now have a zip file of the dataset; unzip it and the dataset is ready to use.
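
The unzip step can also be done from Python. The following is a minimal sketch using only the standard library; the function name extract_dataset and the paths in the usage comment are hypothetical, and the real archive name depends on the dataset that was downloaded.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir):
    """Extract a downloaded dataset archive and return the extracted file names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        # namelist() also contains directory entries (ending in "/"); keep files only.
        return sorted(name for name in archive.namelist() if not name.endswith("/"))


# Hypothetical usage; replace the names with the archive the kaggle command produced:
# extract_dataset("dataset.zip", "data")
```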

Explore the data for possible issues and pre-processing steps
Issue found: the dataset is imbalanced. Around 92% of the cases are without diabetes and only about 8% are with diabetes.
Used SMOTE from the imbalanced-learn library (which works alongside scikit-learn) to balance the data.
Checked the correlations and removed unwanted features
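
To illustrate the balancing step, here is a minimal pure-Python sketch of the interpolation idea behind SMOTE: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is not the imbalanced-learn implementation and not the project's own code; the helper names, the toy points and the choice of k are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples that belongs to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Expects at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data with the same 92% / 8% split described above (values are made up).
labels = [0] * 92 + [1] * 8
print(class_shares(labels))

minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
print(smote_like_oversample(minority_points, n_new=3))
```

With imbalanced-learn installed, the corresponding library call is usually SMOTE().fit_resample(X, y) from imblearn.over_sampling.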

Used scikit-learn's RandomForestClassifier to detect diabetes
Instead of predicting a hard label of 0 (no diabetes) or 1 (diabetes), the prediction reports the probability of diabetes.
Validation results give a precision, recall and accuracy of around 0.94. (Add visualizations)
Save the validated model for prediction
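
The link between predicted probabilities and the reported precision, recall and accuracy can be sketched as follows. The example is plain Python with made-up labels and probabilities and an assumed decision threshold of 0.5; it does not reproduce the project's validation numbers.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold the probabilities, then compute precision, recall and accuracy."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Made-up example values, not project results.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.91, 0.12, 0.67, 0.40, 0.08, 0.55, 0.83, 0.30]
metrics = classification_metrics(y_true, y_prob)
print(metrics)  # {'precision': 0.75, 'recall': 0.75, 'accuracy': 0.75}
```

For a fitted scikit-learn RandomForestClassifier with classes 0 and 1, the probabilities of the positive class would typically come from model.predict_proba(X)[:, 1].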

Make a UI using the Streamlit library that takes the input features and uses the model to predict the diabetes probability
Added result rendering along with an explanation of each input feature
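
A minimal sketch of what such a Streamlit page could look like is shown below. It is not the project's actual app: the two input features, the file name model.pkl and the assumption that the model was saved with pickle are all hypothetical and should be replaced with the features and file the trained model really uses.

```python
import pickle


def describe_risk(probability):
    """Turn a probability in [0, 1] into a short message for display."""
    return f"Estimated probability of diabetes: {probability:.1%}"


def main():
    # Imported here so describe_risk can be reused without Streamlit installed.
    import streamlit as st

    st.title("Diabetes probability")

    # Hypothetical features; use the columns the model was trained on.
    age = st.number_input("Age (years)", min_value=0, max_value=120, value=40,
                          help="Age of the person in years")
    bmi = st.number_input("BMI", min_value=10.0, max_value=70.0, value=25.0,
                          help="Body mass index")

    if st.button("Predict"):
        # Hypothetical file name for the model saved after validation.
        with open("model.pkl", "rb") as f:
            model = pickle.load(f)
        # Column 1 of predict_proba is the probability of class 1 (diabetes).
        probability = model.predict_proba([[age, bmi]])[0][1]
        st.write(describe_risk(float(probability)))


# main()  # uncomment when running the script with: streamlit run app.py
```

The help text on each input is one way to show the explanation of a feature next to its field.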

Install Docker on the system
Create a Dockerfile in the repository to containerize the Streamlit app.
The Dockerfile holds a set of commands that are executed sequentially to build a Docker image for the Streamlit application; Docker then uses this image to create a container for the application
Once the Dockerfile is written, run the command: docker image build -t streamlit-app .

Launch an EC2 instance, generate a key pair and note the Public DNS address
Install Docker on the EC2 instance
Install git on the EC2 instance
Clone the repo and change to its directory
Build the Docker image of the Streamlit app: sudo docker build -t streamlit-app .
Run the Docker container: start a container from the built image and map the desired port using: docker container run -p 8501:8501 -d streamlit-app
Use the following command: sudo docker run -P streamlit-app This will give you details of the address where you can find your app.
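
Because the container is started with -p 8501:8501, the app should be reachable on port 8501 of the instance's Public DNS address. This assumes plain HTTP and that the instance accepts inbound traffic on that port, neither of which is stated in the steps above. A minimal sketch for building the URL and checking the port follows; the helper names and the host name are placeholders.

```python
import socket


def app_url(public_dns, port=8501):
    """Build the address of the app from the instance's Public DNS name."""
    return f"http://{public_dns}:{port}"


def is_port_open(host, port=8501, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder host name; substitute the instance's Public DNS address.
print(app_url("ec2-host.example.com"))
```

If is_port_open returns False for the real host, the container may not be running or the port may not be open to inbound traffic.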

Start the EC2 instance and click on Connect
Copy the ssh command and paste it into the project terminal, then change to the project directory using: cd diabetes_pred_app
Start a Docker container from the built image and map the desired port using: docker container run -p 8501:8501 -d streamlit-app (If successful, this prints the ID of the new container)
To get the app link, run: sudo docker run -P streamlit-app

farazrahman/diabetes_pred_app