/edm-dcu

Analysis on 16K student data


Analysis on Student Data

Leveraging data for 16K students, what can we say about the student experience at this university?

Can this data help us with student success?

Data

Number of students: 16,786
Number of fields: 137

List of fields

Notebooks

  1. EDA: Exploratory Data Analysis by looking at all the fields and their values
  2. Data Summarization:
  3. Features: extract features from the fields
  4. Dimensionality Reduction: PCA, t-SNE, UMAP
  5. Split the data: split the dataset into Train and Test sets
  6. Decision Tree: modelling a Decision Tree classifier
  7. Random Forest: modelling a Random Forest classifier and analysing the importance of its features
  8. Linear Models & Ablation studies: fitting a Linear Regression model and running ablation studies to measure the variance explained by the features
  9. Mutual Information Exploration: measuring the mutual information (the mutual dependence between two variables) between the precision mark and each feature that the ablation study identified as important.
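
As a rough illustration of the quantity measured in notebook 9, here is a minimal sketch of mutual information between two discrete variables, using only the Python standard library. The function name `mutual_information` and the toy lists are hypothetical and not taken from the notebooks. Continuous fields such as a mark would first need to be binned (as assumed here) or handled with a dedicated estimator.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y)
        mi += p_xy * math.log2(count * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data (assumption): a binned feature and a binned mark
# for eight students, plus a variable unrelated to the mark.
points_band = ["high", "high", "low", "low", "high", "low", "high", "low"]
mark_band = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail"]
unrelated = ["a", "b", "a", "b", "b", "a", "a", "b"]

print(mutual_information(points_band, mark_band))
print(mutual_information(unrelated, mark_band))
```

In this toy data the first pair is fully dependent and the second is independent, so the first score is the larger of the two. On real fields the scores fall somewhere in between and are best compared relative to one another.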

Tutorials

You can always view a notebook using https://nbviewer.jupyter.org/

Figures

EDA: Exploring CAO Points

Correlations:

Scatter plot: CAO Points vs Leaving Cert Math Points

Decision Tree to predict the final RESULT:

Random Forest: Top 10 Most Important Features

Linear Model: Ablation Study by Excluding Features
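
For readers unfamiliar with this kind of ablation study, the sketch below shows one common way to run it: fit a linear regression on all features, refit with each feature excluded in turn, and record the drop in R² (variance explained). It is an illustration under stated assumptions, not the notebook's actual code. The data is synthetic, and the helper names `r_squared` and `ablation_study` and the feature names are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def ablation_study(X, y, names):
    """Return the full-model R^2 and the R^2 drop when each feature is excluded."""
    full = r_squared(X, y)
    drops = {}
    for i, name in enumerate(names):
        reduced = np.delete(X, i, axis=1)  # remove column i only
        drops[name] = full - r_squared(reduced, y)
    return full, drops


# Synthetic data (assumption): feature_a matters most, feature_c not at all.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["feature_a", "feature_b", "feature_c"]

full_r2, drops = ablation_study(X, y, names)
print(f"full model R^2 = {full_r2:.3f}")
for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"excluding {name}: R^2 drop = {drop:.3f}")
```

A larger drop means the excluded feature accounted for more variance that the remaining features could not explain. Correlated features can mask one another, so a small drop does not by itself prove a feature is unimportant.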