This project identifies factors that predict student dropout or completion in higher education. Using a dataset from an undergraduate institution, the analysis applies exploratory data analysis, dimensionality reduction, clustering, and a decision tree classifier to uncover insights into student success and retention.
The dataset includes demographic information, application and enrollment details, parental information, student financial and special needs status, and academic performance data. Key variables include gender, nationality, course selection, financial status, and academic history.
Which factors predict student dropout or completion in higher education?
- Data Preparation and Cleaning: Initial exploration, renaming columns, handling missing values, removing duplicates and unnecessary values.
- Exploratory Data Analysis (EDA): Visualizing relationships between various features and the target variable (dropout or graduate).
- Dimensionality Reduction: Employing PCA to simplify the model while retaining significant variance.
- Clustering Analysis: Applying hierarchical and K-means clustering to explore data structure.
- Machine Learning Prediction: Training a decision tree to predict student outcomes, evaluated with accuracy, Kappa, sensitivity, specificity, and positive and negative predictive values.
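
The evaluation metrics named in the last step can all be derived from a 2x2 confusion matrix. The following is a minimal sketch, not the project's actual code. It assumes that "Dropout" is treated as the positive class, that the labels are the strings "Dropout" and "Graduate", and that Kappa means Cohen's kappa. The names `ConfusionMatrix`, `confusion_from_labels`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted Dropout, actually Dropout
    fp: int  # predicted Dropout, actually Graduate
    fn: int  # predicted Graduate, actually Dropout
    tn: int  # predicted Graduate, actually Graduate


def confusion_from_labels(actual, predicted, positive="Dropout"):
    """Count the four cells; the label names are assumptions."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def _ratio(num, den):
    # Return NaN instead of raising when a class is empty.
    return num / den if den else float("nan")


def evaluate(cm):
    """Return the metrics listed above as a dict of floats."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = _ratio(cm.tp + cm.tn, n)
    # Agreement expected by chance, from the row and column totals.
    expected = _ratio(
        (cm.tp + cm.fp) * (cm.tp + cm.fn) + (cm.fn + cm.tn) * (cm.fp + cm.tn),
        n * n,
    )
    return {
        "accuracy": accuracy,
        "kappa": _ratio(accuracy - expected, 1 - expected),
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "pos_pred_value": _ratio(cm.tp, cm.tp + cm.fp),
        "neg_pred_value": _ratio(cm.tn, cm.tn + cm.fn),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from this project.
    for name, value in evaluate(ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)).items():
        print(f"{name}: {value:.3f}")
```

Kappa corrects accuracy for the agreement expected by chance, which makes it a useful complement to accuracy when the dropout and graduate classes are imbalanced.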
- Programming Languages: Python, R
- Libraries: skimr (`skim`), validate (`validator`), corrplot, `princomp` from R's stats package (for PCA), and various clustering packages
- Machine Learning Techniques: Decision Tree, PCA, Hierarchical Clustering, K-means Clustering
- Certain demographic and financial factors significantly influence student outcomes.
- The decision tree model provided moderately accurate predictions, indicating potential areas for educational strategy enhancement.
- Clone the Repository: Clone this repository to get the dataset and code.
- Install Dependencies: Make sure Python and R are installed, along with the required libraries.
- Run the Analysis: Execute the code scripts to replicate the analysis. Modify parameters or methods as needed for further exploration.
- Interpret Results: Review the generated output and visualizations for insights.
Thank you to my professors and colleagues for their contributions.