Heart_Disease_Prediction: Classification Problem

  • The data was imported from Kaggle.
  • The dependent feature is named 'target'.
  • The dataset has 303 rows and 14 columns.
  • The dataset is balanced.
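The shape and balance checks above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `class_balance` helper and the toy DataFrame are assumptions made for the example, and only the column name 'target' comes from this README.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "target") -> pd.Series:
    """Return the fraction of rows that fall into each class of `target`."""
    return df[target].value_counts(normalize=True).sort_index()


# Toy stand-in for the Kaggle data (values are made up for illustration).
toy = pd.DataFrame(
    {
        "age": [63, 37, 41, 56, 57, 44],
        "target": [1, 1, 0, 0, 1, 0],
    }
)

print(toy.shape)           # (rows, columns)
print(class_balance(toy))  # fractions close to each other suggest a balanced dataset
```

On the real data, the same two calls would be applied to the DataFrame loaded from the Kaggle CSV.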

Data Visualisation

  • We visualise the data using seaborn.

Data Cleaning

  • For each feature, we check its range and look for ways to optimise the dataset.
  • One method is removing outliers identified with a boxplot.
  • Using the correlation method, we find the least correlated feature and drop it.
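The two cleaning steps above could look something like the sketch below. It makes two assumptions that this README does not state: outliers are defined by the usual boxplot whisker rule (1.5 times the interquartile range), and "least correlated" is taken to mean the lowest absolute correlation with 'target'. The helper names and the toy values are hypothetical.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies inside the boxplot whiskers."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    inside = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[inside]


def least_correlated_feature(df: pd.DataFrame, target: str = "target") -> str:
    """Name of the feature with the smallest absolute correlation to `target`."""
    corr = df.corr()[target].drop(target).abs()
    return corr.idxmin()


# Toy data (made up): one extreme 'chol' value and one uninformative column.
toy = pd.DataFrame(
    {
        "chol": [200, 210, 220, 230, 240, 900],
        "age": [40, 45, 50, 55, 60, 65],
        "noise": [3, 1, 2, 2, 2, 5],
        "target": [0, 0, 0, 1, 1, 1],
    }
)

cleaned = remove_outliers_iqr(toy, "chol")
to_drop = least_correlated_feature(cleaned)
reduced = cleaned.drop(columns=[to_drop])
print(len(cleaned), to_drop, list(reduced.columns))
```

Filtering one column at a time keeps the effect of each rule visible; applying it to every feature at once can discard many rows from a dataset of this size.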

Data Modeling

  • We use XGBClassifier from the xgboost package for classification.
  • We use hyperparameter tuning to get the best parameters for our model.
  • We use cross-validation with StratifiedKFold (5 folds) to confirm that our model does not overfit.
  • We get an accuracy_score of 90.0% for our model.

Note: accuracy_score is given by (TP + TN) / (TP + TN + FP + FN)

TP: True Positive | TN: True Negative | FP: False Positive | FN: False Negative
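As a small worked example of the formula in the note, the function below computes accuracy from the four confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not results from this project.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Made-up counts: 85 correct predictions out of 100.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
print(f"{acc:.2%}")
```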