To run the project, follow the steps below (PyCharm was used as the IDE):
- Clone the repository
- Set up a virtual environment in the project's root directory
- Open a terminal in the project directory and run:
- python3 -m venv venv
- source venv/bin/activate
- pip install -r requirements.txt
- pip install kaggle
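- Create an API token in your Kaggle account settings. This downloads a kaggle.json file containing your Kaggle username and API key.
- Move kaggle.json into the .kaggle folder in your home directory (~/.kaggle/kaggle.json on Linux and macOS, C:\Users\[USER_NAME]\.kaggle\kaggle.json on Windows). Also make sure the directory where pip installs Python scripts is on your PATH, so that the kaggle command can be found.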
- Search for the desired dataset with this command:
kaggle datasets list -s [KEYWORD]
- Download the dataset with this command, where [DATASET] is the dataset's owner/dataset-name reference shown in the search results:
kaggle datasets download -d [DATASET]
- This downloads the dataset as a zip file. Unzip it, and the dataset is ready to use.
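The credential and unzip steps above can also be done from Python. The following is a minimal standard-library sketch. The helper names `kaggle_json_path` and `extract_dataset` are hypothetical, and the zip file is assumed to have already been downloaded with `kaggle datasets download`.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def kaggle_json_path() -> Path:
    """Default location where the Kaggle CLI looks for its API token (~/.kaggle/kaggle.json)."""
    return Path.home() / ".kaggle" / "kaggle.json"


def extract_dataset(zip_path: str | Path, dest_dir: str | Path) -> list[Path]:
    """Unzip a downloaded Kaggle archive into dest_dir and return the CSV files it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Tabular Kaggle datasets are usually shipped as one or more CSV files.
    return sorted(dest.rglob("*.csv"))
```

For example, `extract_dataset("[DATASET_FILE].zip", "data")` returns the CSV paths to load with pandas. The zip file name depends on the dataset you downloaded.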
- Explore the data to find issues and decide which preprocessing steps are needed.
- Issue found: the dataset is imbalanced. About 92% of the cases are non-diabetic and only about 8% are diabetic.
- Used SMOTE (Synthetic Minority Over-sampling Technique) to balance the data. SMOTE comes from the imbalanced-learn library, which builds on scikit-learn.
- Checked feature correlations and removed unwanted features.
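The imbalance check and correlation-based feature selection above might look like the following pandas sketch. Several details are assumptions, because the README does not say which features were removed or why: the target column name, the 0.05 threshold, and the rule of dropping features whose absolute correlation with the target is low.

```python
from __future__ import annotations

import pandas as pd


def class_distribution(y: pd.Series) -> pd.Series:
    """Share of rows per class, e.g. roughly 0.92 (no diabetes) and 0.08 (diabetes) here."""
    return y.value_counts(normalize=True).sort_index()


def weakly_correlated_features(df: pd.DataFrame, target: str, threshold: float = 0.05) -> list[str]:
    """Numeric features whose absolute Pearson correlation with `target` is below `threshold`.

    The threshold is a hypothetical choice; inspect the full correlation matrix
    before deciding what to drop.
    """
    # Non-numeric columns (e.g. text categories) are skipped; encode them first if needed.
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return sorted(corr[corr.abs() < threshold].index)
```

For example, `df = df.drop(columns=weakly_correlated_features(df, "diabetes"))`, where `"diabetes"` is an assumed target column name.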
- Used scikit-learn's random forest classifier (RandomForestClassifier) to detect diabetes.
- Instead of predicting a hard label, 0 (no diabetes) or 1 (diabetes), the model outputs the probability of diabetes.
- On the validation set, precision, recall, and accuracy were each around 0.94. (TODO: add visualizations.)
- Saved the validated model for use in prediction.
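The modelling steps above (balancing, random forest, probability output, validation metrics, saving) can be sketched as follows. This sketch uses numpy and scikit-learn only. To make that possible, `smote` below is a simplified from-scratch version of the technique, not the imbalanced-learn implementation. In the project itself, the library call would be `imblearn.over_sampling.SMOTE(random_state=42).fit_resample(X_train, y_train)`.

The sketch resamples only the training split, which is the usual recommendation. The README does not say whether SMOTE was applied before or after splitting. The split ratio, `n_estimators=200`, the 0.5 threshold, and the file name are all hypothetical choices.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE: add synthetic minority rows until both classes have equal counts.

    Each synthetic row lies on the segment between a minority sample and one of its
    k nearest minority-class neighbours. Needs at least 2 minority samples.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)
    if n_new <= 0:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each point's nearest neighbour is itself.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    y_new = np.full(n_new, minority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def train_and_evaluate(X, y, seed=42):
    """Split, oversample the training part only, fit a random forest, report validation metrics."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Resample after the split so synthetic rows never end up in the validation set.
    X_bal, y_bal = smote(X_train, y_train, seed=seed)
    model = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X_bal, y_bal)
    # classes_ is sorted ([0, 1]), so column 1 is the probability of diabetes.
    proba = model.predict_proba(np.asarray(X_val, dtype=float))[:, 1]
    pred = (proba >= 0.5).astype(int)
    metrics = {
        "precision": precision_score(y_val, pred, zero_division=0),
        "recall": recall_score(y_val, pred),
        "accuracy": accuracy_score(y_val, pred),
    }
    return model, metrics


def save_model(model, path="diabetes_model.joblib"):
    """Persist the fitted model so the UI can load it later with joblib.load(path)."""
    joblib.dump(model, path)
```

A typical call is `model, metrics = train_and_evaluate(df.drop(columns="diabetes"), df["diabetes"])` followed by `save_model(model)`. The column name is an assumption. On an imbalanced dataset, precision and recall on the diabetic class are more informative than accuracy alone.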
- Built a UI with the Streamlit library that takes the input features and uses the saved model to predict the probability of diabetes.
- Added rendering of the result, along with an explanation of each input feature.
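The following is a minimal sketch of such a Streamlit page. It assumes the model was saved with joblib to `diabetes_model.joblib`, which is a hypothetical file name. The feature names in `FEATURE_HELP` are also hypothetical and must be replaced with the columns the model was actually trained on, in the same order.

Each feature's explanation is passed to `st.number_input` through its `help` argument, which Streamlit shows as a tooltip. Streamlit is imported inside `render_app` so that the prediction and formatting helpers can be reused or tested without it.

```python
from __future__ import annotations

import joblib

# Hypothetical feature names and explanations; must match the training columns and their order.
FEATURE_HELP = {
    "age": "Age in years.",
    "bmi": "Body mass index: weight in kg divided by the square of height in m.",
    "HbA1c_level": "HbA1c, reflecting average blood sugar over roughly the past 2-3 months, in %.",
    "blood_glucose_level": "Current blood glucose level.",
}


def describe_risk(probability: float) -> str:
    """Turn the model's probability of diabetes into a short message for the UI."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return f"Estimated probability of diabetes: {probability:.0%}"


def predict_probability(model, values: dict[str, float]) -> float:
    """Order the inputs as in FEATURE_HELP and return P(diabetes) from the saved model."""
    row = [[values[name] for name in FEATURE_HELP]]
    return float(model.predict_proba(row)[0][1])


def render_app(model_path: str = "diabetes_model.joblib") -> None:
    """Draw the input form and the result; call this at the bottom of the Streamlit script."""
    import streamlit as st  # imported here so the helpers above work without Streamlit

    model = joblib.load(model_path)
    st.title("Diabetes probability")
    values = {
        name: st.number_input(name, min_value=0.0, value=0.0, help=text)
        for name, text in FEATURE_HELP.items()
    }
    if st.button("Predict"):
        probability = predict_probability(model, values)
        st.progress(probability)  # accepts a float between 0.0 and 1.0
        st.write(describe_risk(probability))
```

To use it, call `render_app()` at the end of the app script (for example `app.py`) and start it with `streamlit run app.py`.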
- Install Docker on your system.
- Create a Dockerfile in the repository to containerize the Streamlit app.
- The Dockerfile is a set of instructions that Docker executes in order to build an image of the Streamlit application. Docker then runs the application in a container created from this image.
- Once the Dockerfile is written, build the image from the repository root with:
docker image build -t streamlit-app .
- Launch an EC2 instance, generate a key pair, and note the instance's Public DNS address.
- In the instance's security group, allow inbound TCP traffic on port 22 (SSH) and port 8501 (Streamlit). Otherwise the app cannot be reached from a browser.
- Start the EC2 instance, click Connect, copy the SSH command, and run it in a terminal on your machine.
- Install Docker and git on the EC2 instance.
- Clone the repository and change to the project directory:
cd diabetes_pred_app
- Build the Docker image of the Streamlit app:
sudo docker build -t streamlit-app .
- Run a container from the built image in the background, mapping port 8501 on the instance to port 8501 in the container:
sudo docker container run -p 8501:8501 -d streamlit-app
If successful, this prints the ID of the new container.
- To confirm that the app started, view the container's logs, where Streamlit reports the URL it is serving on:
sudo docker logs [CONTAINER_ID]
- Open http://[PUBLIC_DNS]:8501 in a browser to use the app. It is served over plain HTTP, not HTTPS.
- To bring the app back up later, start the EC2 instance, connect over SSH as above, run cd diabetes_pred_app, and repeat the docker container run command.
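Once the container is running, you can check from your own machine that the app is reachable on port 8501 of the instance's Public DNS address. The following is a minimal standard-library sketch. The helper names are hypothetical, and the check only confirms that the page answers with HTTP 200, not that the model loaded correctly.

```python
import urllib.request


def app_url(public_dns: str, port: int = 8501) -> str:
    """Address of the Streamlit app when the container is started with -p 8501:8501."""
    return f"http://{public_dns}:{port}"


def app_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False
```

For example, `app_is_up(app_url("[PUBLIC_DNS]"))` returns False if the security group blocks port 8501 or the container is not running.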