Imported from theophilee's repository: https://github.com/theophilee/learner-performance-prediction
Simple and performant implementations of learner performance prediction algorithms:
- Performance Factors Analysis (PFA)
- DAS3H
- Deep Knowledge Tracing (DKT)
- Self-Attentive Knowledge Tracing (SAKT)
Create a new conda environment, install PyTorch and the remaining requirements:
conda create python==3.7 -n learner-performance-prediction
conda activate learner-performance-prediction
pip install -r requirements.txt
conda install pytorch==1.2.0 torchvision==0.4.0 -c pytorch
The code supports the following datasets:
- ASSISTments 2009-2010 (assistments09)
- ASSISTments 2012-2013 (assistments12)
- ASSISTments 2015 (assistments15)
- ASSISTments Challenge 2017 (assistments17)
- Bridge to Algebra 2006-2007 (bridge_algebra06)
- Algebra I 2005-2006 (algebra05)
- Spanish (spanish)
- Statics (statics)
Dataset | # Users | # Items | # Skills | # Interactions | Mean # skills/item | Timestamps | Median length |
---|---|---|---|---|---|---|---|
assistments09 | 3,241 | 17,709 | 124 | 278,868 | 1.20 | No | 35 |
assistments12 | 29,018 | 53,086 | 265 | 2,711,602 | 1.00 | Yes | 49 |
assistments15 | 14,567 | 100 | 100 | 658,887 | 1.00 | No | 20 |
assistments17 | 1,708 | 3,162 | 102 | 942,814 | 1.23 | Yes | 441 |
bridge_algebra06 | 1,146 | 129,263 | 493 | 1,817,476 | 1.01 | Yes | 1,362 |
algebra05 | 574 | 173,113 | 112 | 607,025 | 1.36 | Yes | 574 |
spanish | 182 | 409 | 221 | 578,726 | 1.00 | No | 1,924 |
statics | 282 | 1,223 | 98 | 189,297 | 1.00 | No | 635 |
For your convenience, the preprocessed datasets are in the data/ folder. You do NOT need to preprocess the datasets yourself.
If you want to reproduce the preprocessing, download the data from one of the links above and:
- place the main file under data/<dataset codename>/data.csv for an ASSISTments dataset
- place the main file under data/<dataset codename>/data.txt for a KDDCup dataset
- place the two data files under data/<dataset codename>/{filename} for the Spanish dataset
python prepare_data.py --dataset <dataset codename> --remove_nan_skills
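The placement rules above can be sketched as a small helper that reports where the main raw file is expected before running prepare_data.py. This is a minimal sketch and not part of the repository: the function name `raw_data_path` and the grouping of codenames into families are assumptions made for illustration (the KDDCup codenames are taken to be bridge_algebra06 and algebra05). The Spanish dataset is left out because the names of its two files are not given here.

```python
from pathlib import Path

# Assumed grouping of dataset codenames into families (illustrative only).
ASSISTMENTS = {"assistments09", "assistments12", "assistments15", "assistments17"}
KDDCUP = {"bridge_algebra06", "algebra05"}


def raw_data_path(dataset, root="data"):
    """Return where the main raw file is expected for a dataset codename."""
    if dataset in ASSISTMENTS:
        return Path(root) / dataset / "data.csv"
    if dataset in KDDCUP:
        return Path(root) / dataset / "data.txt"
    # Other datasets have no single documented main file in this sketch.
    raise ValueError(f"no single main file is documented for {dataset!r}")


if __name__ == "__main__":
    for name in ("assistments09", "algebra05"):
        path = raw_data_path(name)
        status = "found" if path.exists() else "missing"
        print(f"{path}: {status}")
```

Running the script from the repository root prints one line per dataset, which makes a misplaced file easy to spot before preprocessing starts.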
To encode a sparse feature matrix with specified features:
- Item Response Theory (IRT): -i
- PFA: -s -sc -w -a
- DAS3H: -i -s -sc -w -a -tw
- Best logistic regression features (Best-LR): -i -s -ic -sc -tc -w -a
python encode.py --dataset <dataset codename> <feature flags>
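To avoid retyping the flag combinations, the mapping above can be kept in one place and turned into an encode.py command. The following is a minimal sketch: the flag sets are copied from the list above, while the dictionary `FEATURE_FLAGS` and the function `encode_command` are hypothetical names introduced here, not part of the repository.

```python
import shlex

# Flag sets copied from the list above; the dictionary itself is illustrative.
FEATURE_FLAGS = {
    "IRT": ["-i"],
    "PFA": ["-s", "-sc", "-w", "-a"],
    "DAS3H": ["-i", "-s", "-sc", "-w", "-a", "-tw"],
    "Best-LR": ["-i", "-s", "-ic", "-sc", "-tc", "-w", "-a"],
}


def encode_command(dataset, model):
    """Build the encode.py argument list for a dataset and a feature set."""
    return ["python", "encode.py", "--dataset", dataset] + FEATURE_FLAGS[model]


if __name__ == "__main__":
    # Print one shell-ready command per feature set.
    for model in FEATURE_FLAGS:
        command = encode_command("assistments12", model)
        print(" ".join(shlex.quote(arg) for arg in command))
```

The returned list can be passed directly to `subprocess.run(command, check=True)` from the repository root, which avoids shell quoting issues.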
To train a logistic regression model with a sparse feature matrix encoded through encode.py:
python train_lr.py --X_file data/<dataset codename>/X-<feature suffix>.npz --dataset <dataset codename>
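Since the training command needs the path of a matrix that encode.py has already written, it can help to list what is available first. The sketch below assumes only the X-<feature suffix>.npz naming shown in the command above; the helper names `find_feature_matrices` and `train_lr_command` are hypothetical and not part of the repository.

```python
from pathlib import Path


def find_feature_matrices(dataset, root="data"):
    """List the X-<feature suffix>.npz files present for a dataset."""
    return sorted((Path(root) / dataset).glob("X-*.npz"))


def train_lr_command(x_file, dataset):
    """Build the train_lr.py argument list for one encoded matrix."""
    return [
        "python", "train_lr.py",
        "--X_file", Path(x_file).as_posix(),
        "--dataset", dataset,
    ]


if __name__ == "__main__":
    dataset = "assistments09"
    # Print one training command per encoded matrix found on disk.
    for x_file in find_feature_matrices(dataset):
        print(" ".join(train_lr_command(x_file, dataset)))
```

If nothing is printed, no matrix has been encoded for that dataset yet and encode.py needs to be run first.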
To train a DKT model:
python train_dkt2.py --dataset <dataset codename>
To train a SAKT model:
python train_sakt.py --dataset <dataset codename>
Algorithm | assistments09 | assistments12 | assistments15 | assistments17 | bridge_algebra06 | algebra05 | spanish | statics |
---|---|---|---|---|---|---|---|---|
IRT | 0.69 | 0.71 | 0.64 | 0.68 | 0.75 | 0.77 | 0.68 | 0.79 |
PFA | 0.72 | 0.67 | 0.69 | 0.62 | 0.77 | 0.76 | 0.85 | 0.69 |
DAS3H | - | 0.74 | - | 0.69 | 0.79 | 0.83 | - | - |
Best-LR | 0.77 | 0.75 | 0.70 | 0.71 | 0.80 | 0.83 | 0.86 | 0.82 |
DKT | 0.75 | 0.77 | 0.73 | 0.77 | 0.79 | 0.82 | 0.83 | 0.83 |
SAKT | 0.75 | 0.73 | 0.73 | 0.72 | 0.78 | 0.80 | 0.83 | 0.81 |