Knowledge tracing data fiddler

A Python project that makes it easy to manage knowledge tracing data.

- Easy conversion between commonly used data formats
- Data cleaning
  - Convert non-binary correctness values into binary correctness
  - Remove CSV rows where critical values are missing
- Group ungrouped data by student to obtain an attempt sequence per student
- Filter data
  - By maximum attempt count (by splitting or cutting)
  - By minimum attempt count
- Split into train and test sets
- Split into k-fold train and test sets
- Print data statistics in different formats
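The attempt-count filters and the k-fold split can be sketched in plain Python. This is an illustration under assumptions, not the project's implementation: each student's sequence is taken to be a list of (skill_id, correct) pairs, "cutting" is assumed to keep only the first `max_len` attempts, "splitting" is assumed to break a long sequence into consecutive chunks of at most `max_len` attempts, and folds are formed over whole sequences. All function names are hypothetical.

```python
import random
from typing import Iterator, List, Tuple

# Hypothetical representation: one student's attempts as (skill_id, correct) pairs.
Sequence = List[Tuple[int, int]]


def cut(sequences: List[Sequence], max_len: int) -> List[Sequence]:
    """Assumed 'cutting': keep only the first max_len attempts of each sequence."""
    return [seq[:max_len] for seq in sequences]


def split(sequences: List[Sequence], max_len: int) -> List[Sequence]:
    """Assumed 'splitting': break each sequence into consecutive chunks of at
    most max_len attempts, so no attempts are discarded."""
    return [seq[i:i + max_len] for seq in sequences for i in range(0, len(seq), max_len)]


def filter_min(sequences: List[Sequence], min_len: int) -> List[Sequence]:
    """Drop sequences with fewer than min_len attempts."""
    return [seq for seq in sequences if len(seq) >= min_len]


def kfold(sequences: List[Sequence], k: int,
          seed: int = 0) -> Iterator[Tuple[List[Sequence], List[Sequence]]]:
    """Yield k (train, test) pairs; every sequence lands in exactly one test fold."""
    order = list(range(len(sequences)))
    random.Random(seed).shuffle(order)  # seeded so the folds are reproducible
    for fold in range(k):
        test_ids = set(order[fold::k])  # every k-th shuffled index goes to this fold
        train = [s for i, s in enumerate(sequences) if i not in test_ids]
        test = [s for i, s in enumerate(sequences) if i in test_ids]
        yield train, test


data = [[(1, 0), (2, 0), (3, 1)], [(1, 1)]]
print(cut(data, 2))         # [[(1, 0), (2, 0)], [(1, 1)]]
print(split(data, 2))       # [[(1, 0), (2, 0)], [(3, 1)], [(1, 1)]]
print(filter_min(data, 2))  # [[(1, 0), (2, 0), (3, 1)]]
```

Splitting keeps all attempts at the cost of breaking long histories into pieces, while cutting discards the tail. If `split` runs before `kfold`, chunks from one student become separate sequences and may fall into different folds. scikit-learn's `sklearn.model_selection.KFold` (with `shuffle=True`) produces the same kind of partition over indices.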

Attempts-skills-corrects (asc) format

Stores each student's attempt sequence as a triple of rows:

1. Number of attempts
2. Skill ids of the attempts, comma-separated
3. Correctness of each attempt (0 or 1), comma-separated
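A minimal sketch of producing this layout from per-student sequences, assuming each sequence is a list of (skill_id, correct) pairs and that the file ends with a newline; `write_asc` is a hypothetical helper, not part of the project's documented interface:

```python
from typing import Iterable, List, Tuple


def write_asc(sequences: Iterable[List[Tuple[int, int]]]) -> str:
    """Serialize attempt sequences to asc text: one row triple per student."""
    lines = []
    for seq in sequences:
        skills = ",".join(str(skill) for skill, _ in seq)
        corrects = ",".join(str(correct) for _, correct in seq)
        # Row triple: attempt count, skill ids, correctness values.
        lines.extend([str(len(seq)), skills, corrects])
    return "\n".join(lines) + "\n"


print(write_asc([[(1, 0), (2, 0), (3, 1)], [(1, 0), (1, 1)]]), end="")
```

For the two students used here, the output is exactly the two-student example contents of this section.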

Example contents for two students:

3
1,2,3
0,0,1
2
1,1
0,1
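Reading the format back is the mirror image: consume three non-blank rows at a time and check that the declared count matches both lists. A minimal sketch, assuming integer skill ids and at least one attempt per student (an empty skill row would be skipped as blank and break the triple alignment); `read_asc` is hypothetical:

```python
from typing import List, Tuple


def read_asc(text: str) -> List[List[Tuple[int, int]]]:
    """Parse asc text into per-student lists of (skill_id, correct) pairs."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("asc data must consist of row triples")
    sequences = []
    for i in range(0, len(rows), 3):
        count = int(rows[i])
        skills = [int(s) for s in rows[i + 1].split(",")]
        corrects = [int(c) for c in rows[i + 2].split(",")]
        # The declared attempt count must agree with both lists.
        if not (count == len(skills) == len(corrects)):
            raise ValueError(f"attempt count mismatch in triple starting at row {i + 1}")
        sequences.append(list(zip(skills, corrects)))
    return sequences


example = """3
1,2,3
0,0,1
2
1,1
0,1
"""
print(read_asc(example))
# [[(1, 0), (2, 0), (3, 1)], [(1, 0), (1, 1)]]
```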

Requirements

Python >3.6

conda install scikit-learn pandas

or, with pip (use pip3 if pip points to Python 2):

pip install scikit-learn pandas

Usage

Convert from CSV to asc format using the default column names:

python converter.py my.csv my.asc --out-format asc
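Conceptually, this conversion cleans the rows, groups them by student and writes one row triple per student. A standard-library sketch, assuming the CSV rows are already in chronological order, hypothetical column names `user_id`, `skill_id` and `correct`, and a hypothetical binarization rule (only a value of exactly 1 counts as correct); the converter's real defaults and rules may differ:

```python
import csv
import io
from collections import OrderedDict

# Hypothetical column names; the converter's defaults may differ.
USER_COL, SKILL_COL, CORRECT_COL = "user_id", "skill_id", "correct"


def csv_to_asc(csv_text: str) -> str:
    """Group CSV attempt rows by student and emit one asc row triple per student."""
    groups = OrderedDict()  # student id -> [(skill id, correct)], in first-seen order
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Data cleaning: skip rows where a critical value is missing or empty.
        if not all(row.get(col) for col in (USER_COL, SKILL_COL, CORRECT_COL)):
            continue
        # Assumption: only a correctness of exactly 1 counts as correct,
        # so partial credit such as 0.5 becomes 0.
        correct = 1 if float(row[CORRECT_COL]) == 1.0 else 0
        groups.setdefault(row[USER_COL], []).append((row[SKILL_COL], correct))
    lines = []
    for attempts in groups.values():
        lines.append(str(len(attempts)))
        lines.append(",".join(skill for skill, _ in attempts))
        lines.append(",".join(str(c) for _, c in attempts))
    return "\n".join(lines) + "\n"


sample = """user_id,skill_id,correct
s1,1,0
s2,1,0
s1,2,0.5
s1,3,1
s2,1,1
s2,,1
"""
print(csv_to_asc(sample), end="")
```

The sample reproduces the two-student asc example: the partial-credit row becomes 0 and the row with a missing skill id is dropped.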

Convert from CSV to yudelson-bkt (hmm-scalable) format with specified column names:

python converter.py my.csv my.tsv --out-format yudelson-bkt --user-col student-id --exercise-col problem-id --skill-col problem-id --correct-col is-correct
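For reference, hmm-scalable reads a headerless, tab-separated file whose columns are observation, student, problem/step and skill(s), with observation 1 for a correct and 2 for an incorrect attempt (per the hmm-scalable documentation; multiple skills are joined with `~`, and `.` marks a missing skill). A minimal sketch of formatting rows that way, with a hypothetical helper name; passing `problem-id` as both `--exercise-col` and `--skill-col`, as in the command above, corresponds to treating each problem as its own skill:

```python
from typing import List, Tuple


def to_hmm_scalable(rows: List[Tuple[str, str, str, int]]) -> str:
    """Format (student, problem, skill, correct) tuples as hmm-scalable input lines.

    Assumed layout: tab-separated observation, student, problem/step, skill,
    where observation 1 means correct and 2 means incorrect; no header row.
    """
    lines = []
    for student, problem, skill, correct in rows:
        observation = 1 if correct == 1 else 2
        # An empty skill is written as '.', hmm-scalable's missing-skill marker.
        lines.append("\t".join([str(observation), student, problem, skill or "."]))
    return "\n".join(lines) + "\n"


print(to_hmm_scalable([("s1", "p7", "p7", 1), ("s1", "p9", "p9", 0)]), end="")
```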

Show all options:

python converter.py -h

Running tests

python -m unittest discover -s tests

sjsarsa/kt-data-fiddler
