Learner Performance Prediction

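Simple and performant implementations of learner performance prediction algorithms: logistic regression models (IRT, PFA, DAS3H, Best-LR), Deep Knowledge Tracing (DKT), and Self-Attentive Knowledge Tracing (SAKT).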

Setup

Create a new conda environment, install PyTorch and the remaining requirements:

conda create python==3.7 -n learner-performance-prediction
conda activate learner-performance-prediction
pip install -r requirements.txt
conda install pytorch==1.2.0 torchvision==0.4.0 -c pytorch

The code supports the following datasets:

| Dataset | # Users | # Items | # Skills | # Interactions | Mean # skills/item | Timestamps | Median length |
|---|---|---|---|---|---|---|---|
| assistments09 | 3,241 | 17,709 | 124 | 278,868 | 1.20 | No | 35 |
| assistments12 | 29,018 | 53,086 | 265 | 2,711,602 | 1.00 | Yes | 49 |
| assistments15 | 14,567 | 100 | 100 | 658,887 | 1.00 | No | 20 |
| assistments17 | 1,708 | 3,162 | 102 | 942,814 | 1.23 | Yes | 441 |
| bridge_algebra06 | 1,146 | 129,263 | 493 | 1,817,476 | 1.01 | Yes | 1,362 |
| algebra05 | 574 | 173,113 | 112 | 607,025 | 1.36 | Yes | 574 |
| spanish | 182 | 409 | 221 | 578,726 | 1.00 | No | 1,924 |
| statics | 282 | 1,223 | 98 | 189,297 | 1.00 | No | 635 |
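
The columns of this table can be derived from a raw interaction log. The following is a minimal sketch, assuming each interaction is a hypothetical `(user_id, item_id, skill_ids, correct)` tuple. This record layout and the function name `dataset_stats` are illustrative and are not the format of the repository's preprocessed files. It also assumes that "Median length" means the median number of interactions per user and that the skills-per-item mean is taken over distinct items.

```python
from collections import Counter
from statistics import mean, median


def dataset_stats(interactions):
    """Summarize a non-empty list of (user_id, item_id, skill_ids, correct) tuples."""
    # Number of interactions per user, used for the sequence-length statistic.
    users = Counter(user for user, _, _, _ in interactions)
    # Map each item to the set of skills it is tagged with.
    item_skills = {}
    for _, item, skills, _ in interactions:
        item_skills.setdefault(item, set()).update(skills)
    all_skills = set().union(*item_skills.values())
    return {
        "users": len(users),
        "items": len(item_skills),
        "skills": len(all_skills),
        "interactions": len(interactions),
        "mean_skills_per_item": mean(len(s) for s in item_skills.values()),
        "median_length": median(users.values()),
    }


log = [
    ("u1", "i1", ["add"], 1),
    ("u1", "i2", ["add", "sub"], 0),
    ("u1", "i1", ["add"], 1),
    ("u2", "i2", ["add", "sub"], 1),
]
print(dataset_stats(log))
```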

For your convenience, the preprocessed datasets are included in the data/ folder, so you do NOT need to preprocess them yourself.

If you want to reproduce the preprocessing, download the data from one of the links above, then:

  • place the main file under data/<dataset codename>/data.csv for an ASSISTments dataset
  • place the main file under data/<dataset codename>/data.txt for a KDDCup dataset
  • place the two data files under data/<dataset codename>/{filename} for the Spanish dataset

Then run:

python prepare_data.py --dataset <dataset codename> --remove_nan_skills

Training

Logistic Regression

To encode a sparse feature matrix, pass encode.py the feature flags for the model you want:

  • Item Response Theory (IRT): -i
  • Performance Factors Analysis (PFA): -s -sc -w -a
  • DAS3H: -i -s -sc -w -a -tw
  • Best logistic regression features (Best-LR): -i -s -ic -sc -tc -w -a

python encode.py --dataset <dataset codename> <feature flags>
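
The count-based flags can be pictured as running counters over a learner's history. The following is a minimal sketch, assuming `-a` and `-w` refer to a learner's prior attempts and prior correct answers ("wins") on a skill, as in Performance Factors Analysis. The function `pfa_counts` is hypothetical and does not reproduce encode.py, which may scale or window these counts differently.

```python
from collections import defaultdict


def pfa_counts(sequence):
    """For one learner, return (skill, prior_attempts, prior_wins) per step.

    `sequence` is a chronologically ordered list of (skill, correct) pairs.
    Counts cover only interactions strictly before the current one, so the
    features never leak the outcome being predicted.
    """
    attempts = defaultdict(int)
    wins = defaultdict(int)
    features = []
    for skill, correct in sequence:
        # Record the counters as they stood before this interaction.
        features.append((skill, attempts[skill], wins[skill]))
        attempts[skill] += 1
        wins[skill] += int(correct)
    return features


history = [("add", 1), ("add", 0), ("sub", 1), ("add", 1)]
for row in pfa_counts(history):
    print(row)
```

Recording the counters before updating them keeps each row restricted to information available at prediction time.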

To train a logistic regression model with a sparse feature matrix encoded through encode.py:

python train_lr.py --X_file data/<dataset codename>/X-<feature suffix>.npz --dataset <dataset codename>
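
For intuition, logistic regression over a sparse feature matrix reduces to a weighted sum over the active features of each row. The following is a minimal pure-Python sketch using plain stochastic gradient descent. The row format (a dict from column index to value), the function names, and the hyperparameters are illustrative assumptions; train_lr.py may use a different solver and regularization.

```python
import math


def predict(weights, bias, row):
    """Probability of a correct answer for one sparse row {column: value}."""
    z = bias + sum(weights[col] * val for col, val in row.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(rows, labels, n_features, lr=0.5, epochs=200):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for row, y in zip(rows, labels):
            # Gradient of the log loss with respect to the logit z.
            error = predict(weights, bias, row) - y
            bias -= lr * error
            # Only the active (non-zero) features of the row are updated.
            for col, val in row.items():
                weights[col] -= lr * error * val
    return weights, bias


# Toy data: columns 0 and 1 one-hot encode two items.
# Item 0 is always answered correctly, item 1 never.
rows = [{0: 1.0}, {0: 1.0}, {1: 1.0}, {1: 1.0}, {0: 1.0}, {1: 1.0}]
labels = [1, 1, 0, 0, 1, 0]
weights, bias = train_logreg(rows, labels, n_features=2)
print(predict(weights, bias, {0: 1.0}) > 0.5)
print(predict(weights, bias, {1: 1.0}) < 0.5)
```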

Deep Knowledge Tracing

To train a DKT model:

python train_dkt2.py --dataset <dataset codename> 
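
The following is a minimal sketch of how interaction sequences are commonly prepared for DKT. It follows the original DKT formulation rather than this repository's code: each (item, correct) pair is mapped to a single index, and the model at step t predicts the outcome at step t + 1. The function name `dkt_inputs_targets` and the index layout are assumptions; train_dkt2.py may encode its inputs differently.

```python
def dkt_inputs_targets(items, corrects, num_items):
    """Build next-step prediction pairs for one learner.

    Each past interaction becomes one index in [0, 2 * num_items):
    the item id for an incorrect answer, item id + num_items for a correct one.
    """
    encoded = [item + correct * num_items for item, correct in zip(items, corrects)]
    inputs = encoded[:-1]    # interactions 0 .. T-2 fed to the recurrent model
    query_items = items[1:]  # item asked at the following step
    targets = corrects[1:]   # outcome to predict at the following step
    return inputs, query_items, targets


inputs, query_items, targets = dkt_inputs_targets(
    items=[2, 0, 1], corrects=[1, 0, 1], num_items=3
)
print(inputs, query_items, targets)
```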

Self-Attentive Knowledge Tracing

To train a SAKT model:

python train_sakt.py --dataset <dataset codename>
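
Self-attention models for knowledge tracing rely on a causal mask so that the prediction at a given step cannot attend to later interactions. The following is a minimal pure-Python sketch of masked attention weights. The helper names are hypothetical, and this is not the repository's implementation; whether the current position itself is visible depends on how inputs are shifted.

```python
import math


def causal_mask(length):
    """mask[t][s] is True when position t may attend to position s (s <= t)."""
    return [[s <= t for s in range(length)] for t in range(length)]


def masked_softmax(scores, mask):
    """Row-wise softmax over allowed positions; masked positions get weight 0."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        allowed = [sc for sc, ok in zip(score_row, mask_row) if ok]
        top = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(sc - top) if ok else 0.0
                for sc, ok in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Large scores at future positions are ignored thanks to the mask.
scores = [[0.0, 5.0, 5.0], [1.0, 1.0, 5.0], [0.0, 0.0, 0.0]]
attention = masked_softmax(scores, causal_mask(3))
for row in attention:
    print([round(w, 3) for w in row])
```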

Results (AUC)

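| Algorithm | assist09 | assist12 | assist15 | assist17 | bridge06 | algebra05 | spanish | statics |
|---|---|---|---|---|---|---|---|---|
| IRT | 0.69 | 0.71 | 0.64 | 0.68 | 0.75 | 0.77 | 0.68 | 0.79 |
| PFA | 0.72 | 0.67 | 0.69 | 0.62 | 0.77 | 0.76 | 0.85 | 0.69 |
| DAS3H | - | 0.74 | - | 0.69 | 0.79 | 0.83 | - | - |
| Best-LR | 0.77 | 0.75 | 0.70 | 0.71 | 0.80 | 0.83 | 0.86 | 0.82 |
| DKT | 0.75 | 0.77 | 0.73 | 0.77 | 0.79 | 0.82 | 0.83 | 0.83 |
| SAKT | 0.75 | 0.73 | 0.73 | 0.72 | 0.78 | 0.80 | 0.83 | 0.81 |

A dash means no result is reported. The dashes in the DAS3H row fall on the four datasets listed without timestamps in the dataset table.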
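
AUC is the probability that a randomly chosen correct response receives a higher predicted score than a randomly chosen incorrect one, with ties counted as one half. The following is a minimal sketch of that definition. The quadratic pairwise loop is for illustration only, and the repository's evaluation code may rely on a library routine instead.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the table above.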