- Data has been imported from Kaggle.
- The dependent feature (the label we predict) is named 'target'.
- The shape is 303 rows and 14 columns.
- The dataset is balanced: the classes of 'target' occur in roughly equal proportions.
- We visualise the data using seaborn.
- For each feature, we check its range and look for ways to clean up the dataset.
- One such method is removing outliers identified with box plots.
- Using the correlation matrix, we find the feature that is least correlated with 'target' and drop it.
- We use XGBClassifier from the xgboost package for classification.
- We tune the hyperparameters to find the best parameters for our model.
- We cross-validate using StratifiedKFold with 5 folds to confirm that our model does not overfit.
- We get an accuracy_score of 90.0% for our model.

Note: accuracy_score is given by (TP + TN) / (TP + TN + FP + FN), where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
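The accuracy formula in the note can be checked by hand. The sketch below is a plain-Python illustration; the function name `accuracy` and the confusion-matrix counts are hypothetical and are not this project's actual results.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix has no samples")
    return (tp + tn) / total


# Hypothetical counts chosen only to illustrate the arithmetic.
example = accuracy(tp=40, tn=45, fp=8, fn=7)
print(f"{example:.1%}")  # 85 correct out of 100 samples
```

Because the dataset is balanced, accuracy is a reasonable headline metric here; on a heavily imbalanced dataset the same formula can look good while the minority class is mostly misclassified.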
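The box-plot outlier step can be sketched as follows. The list above does not say which rule was applied, so this assumes the usual box-plot whisker convention of 1.5 × IQR beyond the quartiles; the helper names `iqr_bounds` and `drop_outliers` and the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, whisker=1.5):
    """Return the (low, high) whisker limits a box plot would draw."""
    # Assumption: quartiles computed with the "inclusive" method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def drop_outliers(values, whisker=1.5):
    """Keep only the values that fall inside the whisker limits."""
    low, high = iqr_bounds(values, whisker)
    return [v for v in values if low <= v <= high]


# Made-up measurements for one numeric feature; 300 is the obvious outlier.
sample = [110, 120, 125, 130, 135, 140, 145, 150, 300]
cleaned = drop_outliers(sample)
print(cleaned)
```

With only 303 rows, it is worth checking how many rows each feature's filter removes before applying it, since dropping outliers across several columns can shrink the dataset noticeably.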