A Python project for easy management of knowledge tracing data
- Easy conversion between commonly used data formats
- Data cleaning
- Convert non-binary correctness values into binary correctness
- Remove (csv) rows where critical values are missing
- Group non-grouped data by students to obtain attempt sequences per student
- Filter data
  - By maximum attempt count (by splitting or cutting)
  - By minimum attempt count
- Split into train and test sets
- Split into k-fold train and test sets
- Print data statistics in different formats
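The cleaning, grouping, filtering and splitting steps listed above can be illustrated with a small standard-library sketch. It is not this project's implementation: the function names, the `(student, skill, correct)` row layout, the rule that only full credit counts as correct, and the unshuffled round-robin fold assignment are all assumptions made for the example.

```python
from collections import defaultdict


def binarize(correct, threshold=1.0):
    """Map a possibly fractional score to 0 or 1 (assumed rule: full credit only)."""
    return 1 if float(correct) >= threshold else 0


def group_by_student(rows):
    """Group (student, skill, correct) rows into per-student attempt sequences.

    Rows where any critical value is missing (None or empty string) are dropped.
    """
    sequences = defaultdict(list)
    for student, skill, correct in rows:
        if any(v is None or v == "" for v in (student, skill, correct)):
            continue
        sequences[student].append((skill, binarize(correct)))
    return dict(sequences)


def limit_length(seq, max_len, split=True):
    """Enforce a maximum attempt count by splitting into chunks or by cutting."""
    if split:
        return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]
    return [seq[:max_len]]


def kfold_students(student_ids, k):
    """Yield (train_ids, test_ids) pairs; students go to folds round-robin, unshuffled."""
    ids = sorted(student_ids)
    for fold in range(k):
        test = ids[fold::k]
        held_out = set(test)
        train = [s for s in ids if s not in held_out]
        yield train, test


# Hypothetical ungrouped input: one row per attempt, in chronological order.
rows = [
    ("s1", 1, 0.5), ("s1", 2, 1), ("s2", 1, 1),
    ("s1", 3, 1), ("s2", None, 0), ("s1", 1, 0),
]
sequences = group_by_student(rows)
chunks = limit_length(sequences["s1"], max_len=3)
cut = limit_length(sequences["s1"], max_len=3, split=False)
folds = list(kfold_students(sequences, k=2))
```

Splitting keeps every attempt but turns one long student sequence into several shorter ones, while cutting discards everything after the first `max_len` attempts.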
Contains student attempt sequences in row triples that contain:
- Number of attempts
- Skill ids of attempts
- Attempt correctnesses
Example contents for two students:
3
1,2,3
0,0,1
2
1,1
0,1
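As a rough illustration of how the row triples above could be read, here is a minimal parser sketch. The function name `parse_triples` is hypothetical, and the sketch assumes comma-separated integer skill ids and 0/1 correctness values, as in the example; the project's own reader may behave differently.

```python
def parse_triples(text):
    """Parse row triples (count, skill ids, correctness) into per-student lists."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")

    students = []
    for i in range(0, len(lines), 3):
        count = int(lines[i])
        skills = [int(x) for x in lines[i + 1].split(",")]
        correct = [int(x) for x in lines[i + 2].split(",")]
        # The declared attempt count must match both sequence lengths.
        if not (count == len(skills) == len(correct)):
            raise ValueError(f"length mismatch in triple starting at line {i + 1}")
        students.append((skills, correct))
    return students


example = "3\n1,2,3\n0,0,1\n2\n1,1\n0,1\n"
parsed = parse_triples(example)
```

For the two-student example, `parsed` holds one `(skills, correct)` pair per student, with three attempts for the first student and two for the second.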
Python >3.6
conda install scikit-learn pandas
or (use pip3 if pip points to Python 2)
pip install scikit-learn pandas
From csv to asc format using default column names.
python converter.py my.csv my.asc --out-format asc
From csv to yudelson-bkt (hmm-scalable) format with specified column names.
python converter.py my.csv my.tsv --out-format yudelson-bkt --user-col student-id --exercise-col problem-id --skill-col problem-id --correct-col is-correct
Show all options
python converter.py -h
python -m unittest discover -s tests