Using data on roughly 16K students, what can we say about the student experience at this university?
Can this data help us improve student success?
Number of students | Number of fields |
---|---|
16,786 | 137 |
- EDA: Exploratory Data Analysis, examining every field and its values
- Data Summarization:
- Features: extract features from the fields
- Dimensionality Reduction: PCA, t-SNE, UMAP
- Split the data: split the dataset into Train and Test sets
- Decision Tree: modelling a Decision Tree classifier
- Random Forest: modelling a Random Forest classifier and analysing the importance of its features
- Linear Models & Ablation studies: fitting a Linear Regression model and running ablation studies to measure how much of the variance each feature explains
- Mutual Information Exploration: measuring the mutual information (the mutual dependence between two variables) between the precision mark and each feature that the ablation study identified as important.
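For the dimensionality-reduction step, here is a minimal sketch of a 2-D PCA projection, assuming the fields have already been turned into a numeric matrix. The data is a synthetic stand-in (a hypothetical 500 × 20 matrix built from two latent factors), not the real 137-field student table. Standardising first keeps fields measured on large scales (such as points totals) from dominating the components.

```python
# Sketch: project numeric student fields to 2-D with PCA.
# ASSUMPTION: synthetic stand-in data; the real student table is not used.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 500 rows, 20 fields driven by 2 hidden factors plus noise
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 20))
X = latent @ loadings + 0.1 * rng.normal(size=(500, 20))

# Put every field on the same scale (mean 0, variance 1) before PCA
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_scaled)  # shape (500, 2), ready for a scatter plot

print("explained variance ratio:", pca.explained_variance_ratio_)
```

t-SNE (`sklearn.manifold.TSNE`) and UMAP (the separate `umap-learn` package) can be swapped in for the non-linear views; both also expose `fit_transform`.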
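The train/test split and Random Forest feature-importance steps can be sketched with scikit-learn as follows. This is a sketch under stated assumptions, not the project's actual pipeline: the data comes from `make_classification`, the binary outcome and the `field_i` names are hypothetical, and the hyperparameters are illustrative.

```python
# Sketch: train/test split + Random Forest, then rank features by importance.
# ASSUMPTION: synthetic stand-in data and hypothetical field names.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 1,000 "students", 10 numeric fields, binary outcome
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, random_state=0
)
feature_names = [f"field_{i}" for i in range(X.shape[1])]

# Hold out 20% of rows as a test set, keeping the class balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
test_accuracy = forest.score(X_test, y_test)

# Impurity-based importances are normalised to sum to 1; sort descending
order = np.argsort(forest.feature_importances_)[::-1]
ranking = [(feature_names[i], forest.feature_importances_[i]) for i in order]

print(f"test accuracy: {test_accuracy:.3f}")
for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favour high-cardinality fields; `sklearn.inspection.permutation_importance` on the test set is a common cross-check.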
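Mutual information can expose dependence that a linear fit misses. This minimal sketch assumes the precision mark is a continuous numeric target, so it uses `mutual_info_regression` (for a categorical target, `mutual_info_classif` is the counterpart). Both features and the target are synthetic and hypothetical: the target depends on `related` only through its square, so Pearson correlation is near zero while mutual information is clearly positive.

```python
# Sketch: mutual information between candidate features and a target.
# ASSUMPTION: synthetic data; "precision_mark" is a hypothetical continuous target.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
related = rng.normal(size=n)    # feature with a non-linear link to the target
unrelated = rng.normal(size=n)  # independent noise feature

precision_mark = related**2 + 0.1 * rng.normal(size=n)

X = np.column_stack([related, unrelated])
mi = mutual_info_regression(X, precision_mark, random_state=0)
pearson = np.corrcoef(related, precision_mark)[0, 1]

print(f"MI(related) = {mi[0]:.3f}, MI(unrelated) = {mi[1]:.3f}")
print(f"Pearson r(related, target) = {pearson:.3f}")
```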
You can always view a notebook using https://nbviewer.jupyter.org/
Scatter plot: CAO Points vs Leaving Cert Math Points
Decision Tree to predict the final RESULT:
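A shallow decision tree like the one above can be fitted and printed as text rules with scikit-learn. This sketch assumes a binary RESULT and uses synthetic data with hypothetical `field_i` names; `max_depth=3` is an illustrative choice that keeps the tree readable.

```python
# Sketch: fit a depth-limited Decision Tree and print its rules.
# ASSUMPTION: synthetic data, binary outcome, hypothetical field names.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"field_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Limit depth so the tree stays interpretable and is less prone to overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)

# Text rendering of the split rules, one indented line per node
rules = export_text(tree, feature_names=feature_names)

print(f"test accuracy: {test_accuracy:.3f}")
print(rules)
```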