
This project analyzes the key factors that influence student dropout and completion rates in higher education. It uses a blend of Python and R to examine educational data and offer insights into student success and retention.

Primary language: Jupyter Notebook. License: MIT.

Predictive Analysis of Student Outcomes in Higher Education

Overview

This project focuses on identifying predictive factors linked with student dropout or completion in higher education. Using a dataset from an undergraduate institution, the analysis employs various data analysis and machine learning techniques to uncover insights into student success and retention rates.

Dataset Description

The dataset includes demographic information, application and enrollment details, parental information, student financial and special needs status, and academic performance data. Key variables include gender, nationality, course selection, financial status, and academic history.

Research Question

What are the predictive factors linked with student dropout or completion in higher education?

Methodology

  1. Data Preparation and Cleaning: Initial exploration, renaming columns, handling missing values, removing duplicates and unnecessary values.
  2. Exploratory Data Analysis (EDA): Visualizing relationships between various features and the target variable (dropout or graduate).
  3. Dimensionality Reduction: Employing PCA to simplify the model while retaining significant variance.
  4. Clustering Analysis: Applying hierarchical and K-means clustering to explore data structure.
  5. Machine Learning Prediction: Implementing a decision tree algorithm to predict student outcomes, evaluated with metrics such as accuracy, Kappa, sensitivity, specificity, and positive and negative predictive values.
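
Steps 3 to 5 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it uses scikit-learn as a stand-in for whichever packages the notebooks rely on, and it runs on synthetic data rather than the project's dataset. The feature count, the 90% variance threshold, the number of clusters, and the tree depth are all hypothetical choices.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student data (NOT the project's dataset).
rng = np.random.default_rng(0)
n_students = 400
X = rng.normal(size=(n_students, 6))  # six made-up numeric features
# Hypothetical rule: the outcome depends on two features plus noise.
noise = rng.normal(scale=0.5, size=n_students)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)  # 1 = graduate, 0 = dropout

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize so that PCA is not dominated by large-scale features.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 3: keep enough principal components to explain 90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full")
Z_train = pca.fit_transform(X_train_s)
Z_test = pca.transform(X_test_s)

# Step 4: explore structure with K-means and hierarchical (Ward) clustering.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z_train)
hier_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(Z_train)

# Step 5: fit a decision tree on the reduced features and evaluate it.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Z_train, y_train)
y_pred = tree.predict(Z_test)

accuracy = accuracy_score(y_test, y_pred)
kappa = cohen_kappa_score(y_test, y_pred)
print(f"components kept: {pca.n_components_}")
print(f"accuracy: {accuracy:.3f}, kappa: {kappa:.3f}")
```

The scaler and PCA are fitted on the training split only and then applied to the test split, so the held-out evaluation is not influenced by test data.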

Tools and Technologies

  • Programming Languages: Python, R
  • Libraries (R): skimr (skim), validate (validator), corrplot, princomp from the stats package (for PCA), and various clustering packages
  • Machine Learning Techniques: Decision Tree, PCA, Hierarchical and K-means Clustering

Key Findings

  • Certain demographic and financial factors significantly influence student outcomes.
  • The decision tree model gave moderately accurate predictions, pointing to areas where educational strategies could be improved.
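
To help read the decision tree's performance, the evaluation metrics named under Methodology (accuracy, Kappa, sensitivity, specificity, and predictive values) can all be derived from a 2x2 confusion matrix. The sketch below shows the arithmetic in plain Python. The counts are made up for illustration and are not results from this project, and treating "dropout" as the positive class is an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute common metrics from 2x2 confusion matrix counts.

    tp/fn: positive cases predicted correctly / missed.
    fp/tn: negative cases predicted wrongly / correctly.
    Assumes every row and column of the matrix is non-empty.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value

    # Cohen's Kappa: agreement beyond what chance alone would produce.
    chance_pos = ((tp + fn) / total) * ((tp + fp) / total)
    chance_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = chance_pos + chance_neg
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Made-up counts, with "dropout" assumed to be the positive class.
metrics = binary_metrics(tp=80, fn=20, fp=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.75 and Kappa is 0.5: the model agrees with the true labels 75% of the time, while chance agreement alone would be 50%.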

Instructions for Use

  1. Clone the Repository: Clone this repository to get the dataset and code.
  2. Install Dependencies: Make sure Python and R are installed, along with the libraries listed under Tools and Technologies.
  3. Run the Analysis: Execute the code scripts to replicate the analysis. Modify parameters or methods as needed for further exploration.
  4. Interpret Results: Review the generated output and visualizations for insights.
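
Before running the analysis, a quick environment check along these lines may help with step 2. It is only a sketch: the Python package list is hypothetical and should be adjusted to whatever the notebooks actually import, and it only checks that an Rscript executable is on the PATH, not which R packages are installed.

```python
import importlib.util
import shutil

# Hypothetical package list; replace with the notebooks' actual imports.
PYTHON_PACKAGES = ["pandas", "numpy", "sklearn", "matplotlib"]


def check_environment(packages=PYTHON_PACKAGES):
    """Report Python packages that cannot be found and where Rscript lives."""
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return {
        "missing_python_packages": missing,
        "rscript_path": shutil.which("Rscript"),  # None if R is not on the PATH
    }


report = check_environment()
print("Missing Python packages:", report["missing_python_packages"] or "none")
print("Rscript found at:", report["rscript_path"] or "not found")
```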

Contribution and Acknowledgments

Thank you to my professors and colleagues for their contributions.