Introduction to Data Science Course, supervised by Professor Anthony Christidis
November 2023
Welcome to the repository of the project "Development of a Diabetes Diagnosis Algorithm". This project was developed as part of the Introduction to Data Science course under the supervision of Professor Anthony Christidis. The primary goal of this project is to create a predictive model for diagnosing diabetes using random forest algorithms.
- Random Forest Algorithms: Implemented to develop a robust predictive model for diabetes diagnosis.
- Variable Selection: Conducted analysis to identify and select important variables affecting the diagnosis.
- PCA (Principal Component Analysis): Applied PCA to reduce the dimensionality of the dataset, enhancing the model's performance.
- Hyperparameters Explored: Focused on finding optimal values for `mtry` (number of variables randomly sampled as candidates at each split) and `min_n` (minimum size of terminal nodes) to improve model accuracy.
- Decision Tree Models: Visualized decision trees to identify key variables significantly impacting the outcomes of diabetes diagnosis.
- Data Source: Utilized a diabetes dataset from a reputable medical data repository.
- Preprocessing: Cleaned and preprocessed the data to ensure quality inputs for the model.
- Random Forest Implementation: Developed the predictive model using random forest algorithms.
- Variable Selection and PCA: Selected important variables and applied PCA for dimensionality reduction.
- Optimal Values for `mtry` and `min_n`: Explored various hyperparameters to find the optimal values for improving model performance.
- Decision Tree Visualization: Visualized decision trees to identify and understand key variables affecting the diagnosis.
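
The search over `mtry` and `min_n` described above can be pictured as a plain grid search. The following is a minimal sketch in Python, not the project's actual code: the helper names `candidate_mtry` and `grid_search` and the `evaluate` callback are hypothetical, and the choice of candidates around the square root of the number of predictors is an assumption based on a common default for classification forests.

```python
from itertools import product
from math import isqrt
from typing import Callable, Iterable, List, Optional, Tuple


def candidate_mtry(n_predictors: int) -> List[int]:
    """Return a few mtry candidates around floor(sqrt(p)), plus p itself.

    Assumption: floor(sqrt(p)) is used as the centre because it is a
    common default for classification forests; the project may have
    searched a different range.
    """
    centre = max(1, isqrt(n_predictors))
    values = {max(1, centre - 1), centre, min(n_predictors, centre + 1), n_predictors}
    return sorted(values)


def grid_search(
    mtry_values: Iterable[int],
    min_n_values: Iterable[int],
    evaluate: Callable[[int, int], float],
) -> Optional[Tuple[int, int, float]]:
    """Try every (mtry, min_n) pair and keep the highest-scoring one.

    `evaluate` is a hypothetical callback: it should fit a random forest
    with the given settings and return a score where higher is better,
    for example mean cross-validated accuracy.
    """
    best: Optional[Tuple[int, int, float]] = None
    for mtry, min_n in product(mtry_values, min_n_values):
        score = evaluate(mtry, min_n)
        # Strict comparison: on ties, the first pair encountered is kept.
        if best is None or score > best[2]:
            best = (mtry, min_n, score)
    return best
```

In practice `evaluate` would wrap the model-fitting and resampling step, so the same loop works regardless of which random forest implementation is behind it.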
This project is licensed under the MIT License - see the LICENSE file for details.
- Professor Anthony Christidis for his guidance and supervision.
- The Introduction to Data Science course for the opportunity to develop this project.