Diabetes-Prediction-using-ensemble-techniques

The main aim of this study is to improve the accuracy of diabetes mellitus prediction using machine learning, in particular the ensemble methods Stacking, Hard Voting, and Soft Voting built on base classifiers such as AdaBoost, Logistic Regression, Random Forest, Gradient Boosting, Linear Discriminant Analysis, Extra Trees, and CatBoost. We use the Pima Indians Diabetes dataset, which contains diagnostic measurements for patients with and without diabetes, to train and evaluate each model and then select the best-performing ensemble. The best-performing model was the soft-voting ensemble. However, this model showed high bias and low variance, which was addressed by calibration. The final model achieved an accuracy of 93.75%, a precision of 95.24%, a recall of 86.96%, and an F1 score of 90.91%. This study highlights the potential of machine learning for diabetes prediction and the value of calibration for improving model performance.
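To make the difference between hard and soft voting concrete, here is a minimal, dependency-free sketch. It is not the project's own code. The per-model probabilities are hypothetical values for a single patient, not outputs of the study's models. The 0.5 threshold, the optional `weights`, and the tie-breaking rule (ties go to class 0) are assumptions of this sketch. The last lines check that the reported precision and recall are consistent with the reported F1 score, since F1 is their harmonic mean.

```python
def hard_vote(labels):
    """Majority vote over per-model class labels (0 or 1) for one sample.

    Assumption of this sketch: ties are broken toward class 0.
    """
    positives = sum(labels)
    return 1 if positives * 2 > len(labels) else 0


def soft_vote(probas, weights=None, threshold=0.5):
    """Average per-model P(diabetic) for one sample, then apply a threshold.

    `weights` and `threshold` are hypothetical parameters for illustration.
    Returns (averaged probability, predicted label).
    """
    if weights is None:
        weights = [1.0] * len(probas)
    avg = sum(w * p for w, p in zip(weights, probas)) / sum(weights)
    return avg, (1 if avg > threshold else 0)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical P(diabetic) from three base classifiers for one patient.
probas = [0.9, 0.4, 0.45]

# Hard voting: each model first commits to a label, then the majority wins.
labels = [1 if p > 0.5 else 0 for p in probas]
print(hard_vote(labels))        # 0 (only one of three models says diabetic)

# Soft voting: average the probabilities, then threshold once.
avg, label = soft_vote(probas)
print(round(avg, 3), label)     # 0.583 1

# Consistency check on the reported precision (95.24%) and recall (86.96%).
print(round(f1_from(0.9524, 0.8696), 4))  # 0.9091
```

The example shows why soft voting can outperform hard voting. One confident model (0.9) outweighs two models that lean only slightly negative (0.4 and 0.45), whereas hard voting discards that confidence information before combining.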