With a wealth of medical data available and the rise of data science, many startups are building indicators for diseases a patient may contract in the future. Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs. People with cardiovascular disease, or who are at high cardiovascular risk (due to one or more risk factors such as hypertension, diabetes, hyperlipidaemia, or already established disease), need early detection and management, and a machine learning model can be of great help here. In this way, we try to automate the solution to one more naturally occurring problem with AI techniques, so that attention can move on to the next one.
The goal is to classify whether a patient is prone to heart failure based on multiple attributes. It is a binary classification problem with both numerical and categorical features.
- Age : age of the patient [years]
- Sex : sex of the patient [M: Male, F: Female]
- ChestPainType : chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]
- RestingBP : resting blood pressure [mm Hg]
- Cholesterol : serum cholesterol [mg/dl]
- FastingBS : fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
- RestingECG : resting electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]
- MaxHR : maximum heart rate achieved [Numeric value between 60 and 202]
- ExerciseAngina : exercise-induced angina [Y: Yes, N: No]
- Oldpeak : oldpeak = ST [Numeric value measured in depression]
- ST_Slope : the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]
- HeartDisease : output class [1: heart disease, 0: Normal]
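The categorical codes in the attribute list above can be captured as a small lookup and used to sanity-check a record. This is a minimal sketch for illustration only: `CATEGORICAL_LEVELS` and `invalid_fields` are hypothetical names that do not come from the project, and the example row is made up.

```python
# Allowed codes for each categorical attribute, copied from the attribute list.
CATEGORICAL_LEVELS = {
    "Sex": {"M", "F"},
    "ChestPainType": {"TA", "ATA", "NAP", "ASY"},
    "FastingBS": {0, 1},
    "RestingECG": {"Normal", "ST", "LVH"},
    "ExerciseAngina": {"Y", "N"},
    "ST_Slope": {"Up", "Flat", "Down"},
    "HeartDisease": {0, 1},
}


def invalid_fields(record):
    """Return the sorted names of categorical fields holding a code that is not allowed."""
    return sorted(
        name
        for name, allowed in CATEGORICAL_LEVELS.items()
        if name in record and record[name] not in allowed
    )


# Hypothetical row: "Sideways" is not a valid ST_Slope code.
example = {"Age": 54, "Sex": "M", "ChestPainType": "ASY", "ST_Slope": "Sideways"}
print(invalid_fields(example))
```

Numeric attributes such as Age or MaxHR are ignored here; only fields with a fixed set of codes are checked.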
I used the Python standard-library modules os, urllib.request, and zipfile for data handling. The dataset is downloaded from a specified URL with urllib.request.urlretrieve, but only if the file doesn't already exist locally. Relevant information is logged with a custom logger. After downloading, the data is extracted with zipfile.ZipFile into a designated directory. Finally, I split the dataset into training and testing sets using train_test_split from scikit-learn and saved them as CSV files.
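The download and extraction steps described above might look roughly like the sketch below. It is not the project's code: the function names, the logger name, and the directory layout are assumptions, and the train/test split is left out to keep the example standard-library only.

```python
import logging
import os
import urllib.request
import zipfile

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("data_ingestion")  # stand-in for the custom logger


def download_file(source_url, local_path):
    """Download source_url to local_path unless the file is already there."""
    if os.path.exists(local_path):
        logger.info("File already exists: %s", local_path)
        return local_path
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    filename, _headers = urllib.request.urlretrieve(source_url, local_path)
    logger.info("Downloaded %s to %s", source_url, filename)
    return filename


def extract_zip(zip_path, unzip_dir):
    """Extract every member of the archive and return the directory listing."""
    os.makedirs(unzip_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(unzip_dir)
    return sorted(os.listdir(unzip_dir))
```

Calling `download_file` a second time with the same `local_path` returns immediately, which avoids repeated downloads when the pipeline is re-run.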
For data validation, I created a DataValidation class. The validate_all_columns method checks whether all the columns in the dataset match those specified in the schema defined in the configuration file. If the validation fails, it logs the status as false; otherwise, it logs true. The status is written to a text file for reference.
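The column check described above can be sketched as follows. The constructor arguments, the exact-match rule (the set of CSV columns must equal the set of schema columns), and the status line format are assumptions made for this example; the project's own class may differ.

```python
import csv


class DataValidation:
    """Compare a CSV header against the column names listed in a schema."""

    def __init__(self, data_path, schema_columns, status_file):
        self.data_path = data_path
        self.schema_columns = set(schema_columns)
        self.status_file = status_file

    def validate_all_columns(self):
        # Only the header row is needed to compare column names.
        with open(self.data_path, newline="") as handle:
            header = next(csv.reader(handle), [])
        status = set(header) == self.schema_columns
        # Persist the outcome so later stages can read it.
        with open(self.status_file, "w") as handle:
            handle.write(f"Validation status: {status}")
        return status
```

Writing the status to a file lets a later pipeline stage decide whether to proceed without re-reading the dataset.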
The data transformation stage involves preprocessing the dataset for model training. I employed various techniques such as ordinal encoding for categorical features and feature scaling using MinMaxScaler and StandardScaler. These transformations were encapsulated within a pipeline for streamlined processing. After transformation, the modified datasets were saved as CSV files.
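One way to combine these transformations in a single scikit-learn pipeline is sketched below. The text does not say which columns receive which scaler, so the grouping into `CATEGORICAL`, `MINMAX`, and `STANDARD` is an assumption, and the four-row frame is made-up data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder, StandardScaler

# Assumed column grouping; the source does not specify it.
CATEGORICAL = ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]
MINMAX = ["Oldpeak"]
STANDARD = ["Age", "RestingBP", "Cholesterol", "MaxHR"]

preprocessor = ColumnTransformer(
    transformers=[
        ("ordinal", OrdinalEncoder(), CATEGORICAL),
        ("minmax", MinMaxScaler(), MINMAX),
        ("standard", StandardScaler(), STANDARD),
    ],
    remainder="passthrough",  # FastingBS is already 0/1 and is passed through last
)
pipeline = Pipeline(steps=[("preprocess", preprocessor)])

# Tiny made-up frame, for illustration only.
frame = pd.DataFrame({
    "Age": [40, 49, 37, 58],
    "Sex": ["M", "F", "M", "F"],
    "ChestPainType": ["ATA", "NAP", "ASY", "TA"],
    "RestingBP": [140, 160, 130, 136],
    "Cholesterol": [289, 180, 283, 164],
    "FastingBS": [0, 0, 0, 1],
    "RestingECG": ["Normal", "Normal", "ST", "LVH"],
    "MaxHR": [172, 156, 98, 99],
    "ExerciseAngina": ["N", "N", "N", "Y"],
    "Oldpeak": [0.0, 1.0, 0.0, 2.0],
    "ST_Slope": ["Up", "Flat", "Up", "Down"],
})

transformed = pipeline.fit_transform(frame)
# Output columns follow the transformer order, with passthrough columns last.
result = pd.DataFrame(transformed, columns=CATEGORICAL + MINMAX + STANDARD + ["FastingBS"])
```

In a real run the pipeline would be fitted on the training split only and then applied to the test split with `pipeline.transform`, so that no test statistics leak into the scalers.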
Feature selection aims to improve model efficiency by selecting the most relevant features. I dropped certain columns ('RestingBP' and 'RestingECG') from the dataset as part of feature selection. The modified datasets were then saved for further processing.
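Dropping the two columns named above is a one-liner in pandas. The helper name `select_features` and the sample frame are hypothetical.

```python
import pandas as pd

DROPPED_COLUMNS = ["RestingBP", "RestingECG"]  # the columns named in the text


def select_features(frame):
    """Return a new frame without the dropped columns; the input is left unchanged."""
    return frame.drop(columns=DROPPED_COLUMNS)


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "Age": [40, 49],
    "RestingBP": [140, 160],
    "RestingECG": ["Normal", "ST"],
    "MaxHR": [172, 156],
})
selected = select_features(sample)
print(list(selected.columns))
```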
In the model training phase, I loaded the preprocessed data and utilized scikit-learn pipelines for seamless integration of preprocessing and model training. I used logistic regression as the classification algorithm. The model was trained on the training data and evaluated using accuracy, cross-validation score, and ROC-AUC score. The trained model was saved using joblib.dump.
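A minimal sketch of this training step follows. It uses synthetic data from `make_classification` as a stand-in for the preprocessed heart-disease data, and the split ratio, the scaler inside the pipeline, the fold count, and the output path are all assumptions rather than the project's settings.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed dataset (assumption).
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# The three metrics named in the text.
accuracy = accuracy_score(y_test, model.predict(X_test))
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"accuracy={accuracy:.3f} cv_auc={cv_auc:.3f} test_auc={test_auc:.3f}")

# Persist the fitted pipeline; the path is a placeholder.
model_path = os.path.join(tempfile.gettempdir(), "model.joblib")
joblib.dump(model, model_path)
```

Saving the whole pipeline, not just the classifier, keeps preprocessing and model together so that `joblib.load(model_path).predict(...)` can be called on unscaled inputs.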
In the model evaluation stage, I loaded the test data and the trained model, then evaluated the model with a confusion matrix and a classification report. A heatmap was used for visualization. The evaluation results were logged for further analysis.
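The two evaluation metrics mentioned above can be computed as in the sketch below. The data is synthetic and the model is trained inline so that the example runs on its own; in the project, the test data and the model would instead be loaded from disk. Plotting is left out to avoid extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the saved test split and trained model (assumption).
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

predictions = model.predict(X_test)
# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, predictions)
report = classification_report(
    y_test, predictions, target_names=["Normal", "HeartDisease"]
)
print(matrix)
print(report)
```

The `matrix` array is what a heatmap would visualize, for example by passing it to a plotting function such as `seaborn.heatmap`.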
- Update config.yaml
- Update schema.yaml
- Update params.yaml
- Update the entity
- Update the configuration manager in src config
- Update the components
- Update the pipeline
- Update the main.py
- Update the app.py
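To illustrate how the entity and configuration manager steps in this workflow relate to each other, here is a hedged sketch. The class names, field names, and config keys are hypothetical, and a plain dict stands in for the parsed config.yaml.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataIngestionConfig:
    """Entity: typed, read-only settings for one pipeline stage."""
    root_dir: Path
    source_url: str
    local_data_file: Path


class ConfigurationManager:
    """Turn a raw config mapping (e.g. parsed from config.yaml) into entities."""

    def __init__(self, config):
        self.config = config

    def get_data_ingestion_config(self):
        section = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(section["root_dir"]),
            source_url=section["source_url"],
            local_data_file=Path(section["local_data_file"]),
        )


# Hypothetical values standing in for a parsed config.yaml.
raw_config = {
    "data_ingestion": {
        "root_dir": "artifacts/data_ingestion",
        "source_url": "https://example.com/data.zip",
        "local_data_file": "artifacts/data_ingestion/data.zip",
    }
}
ingestion_config = ConfigurationManager(raw_config).get_data_ingestion_config()
print(ingestion_config.root_dir)
```

A component would then receive `ingestion_config` in its constructor, and the pipeline and main.py would wire the manager and component together.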
Clone the repository
https://github.com/mahendra867/End-to-End-Heart-Disease-Application-.git
conda create -n mlproj python=3.8 -y
conda activate mlproj
pip install -r requirements.txt
# Finally run the following command
python app.py
Now, open up your local host and port.
- mlflow ui
MLFLOW_TRACKING_URI=https://dagshub.com/mahendra867/ProjectML_with_MLFlow.mlflow
MLFLOW_TRACKING_USERNAME=mahendra867
MLFLOW_TRACKING_PASSWORD=<your-dagshub-access-token>
python script.py
Run this to export as env variables:
export MLFLOW_TRACKING_URI=https://dagshub.com/mahendra867/ProjectML_with_MLFlow.mlflow
export MLFLOW_TRACKING_USERNAME=mahendra867
export MLFLOW_TRACKING_PASSWORD=<your-dagshub-access-token>
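MLflow reads these three variables from the environment, so a script can fail early with a clear message when one is missing. The helper below is a hypothetical convenience, not part of MLflow or of this project.

```python
import os

REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_tracking_vars(environ=os.environ):
    """Return the required MLflow variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_tracking_vars()
if missing:
    print("Set these variables before running the script:", ", ".join(missing))
```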
#with specific access
1. EC2 access: EC2 is a virtual machine
2. ECR: Elastic Container Registry, used to store your Docker image in AWS
#Description: About the deployment
1. Build a Docker image of the source code
2. Push your Docker image to ECR
3. Launch your EC2 instance
4. Pull your image from ECR in EC2
5. Launch your Docker image in EC2
#Policy:
1. AmazonEC2ContainerRegistryFullAccess
2. AmazonEC2FullAccess
- Save the URI: 683781347713.dkr.ecr.us-east-1.amazonaws.com/projml
#optional
sudo apt-get update -y
sudo apt-get upgrade
#required
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
Settings > Actions > Runners > New self-hosted runner > choose the OS > then run the commands one by one
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION = us-east-1
AWS_ECR_LOGIN_URI = demo>> 566373416292.dkr.ecr.ap-south-1.amazonaws.com
ECR_REPOSITORY_NAME = simple-app
MLflow
- It's production grade
- Trace all of your experiments
- Logging & tagging your model