evaluation: consider a version of evaluation that uses Item Response Theory
jeisner opened this issue · 0 comments
jeisner commented
Possibly run a comparative evaluation that gives more credit for test questions that most systems got wrong. (Item response theory is a statistical framework that jointly estimates each test-taker's ability and each question's difficulty from the pattern of right and wrong answers; here the "students" would be the systems being compared.)
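
To make the idea concrete, here is a minimal sketch of what this could look like, assuming the simplest IRT variant (the one-parameter Rasch model, where P(correct) = sigmoid(ability - difficulty)) fit by plain gradient ascent with a small L2 penalty. The names `fit_rasch` and `weighted_score`, the toy response matrix, and the choice of `sigmoid(difficulty)` as a per-question credit are all hypothetical and not taken from any existing code in this repo.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_rasch(responses, steps=2000, lr=0.05, l2=0.01):
    """Fit a Rasch (1PL) model by gradient ascent on the penalized log-likelihood.

    responses[i][j] is 1 if system i answered question j correctly, else 0.
    Returns (ability per system, difficulty per question).
    The L2 penalty keeps estimates finite when a question is answered
    correctly (or incorrectly) by every system.
    """
    n_systems = len(responses)
    n_items = len(responses[0])
    theta = [0.0] * n_systems  # abilities
    b = [0.0] * n_items        # difficulties
    for _ in range(steps):
        g_theta = [-l2 * t for t in theta]
        g_b = [-l2 * d for d in b]
        for i, row in enumerate(responses):
            for j, y in enumerate(row):
                err = y - sigmoid(theta[i] - b[j])  # observed minus predicted
                g_theta[i] += err
                g_b[j] -= err
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        b = [d + lr * g for d, g in zip(b, g_b)]
    return theta, b


def weighted_score(row, difficulties):
    """Hypothetical scoring rule: each correct answer earns sigmoid(difficulty),
    so harder questions are worth more credit."""
    return sum(sigmoid(d) for y, d in zip(row, difficulties) if y)


# Toy data (made up): 4 systems x 5 questions.
responses = [
    [1, 1, 1, 1, 0],  # system A
    [1, 1, 1, 0, 0],  # system B
    [1, 1, 0, 0, 1],  # system C: same accuracy as B, but got a harder question
    [1, 0, 0, 0, 0],  # system D
]

theta, b = fit_rasch(responses)
for name, row, t in zip("ABCD", responses, theta):
    print(name, "ability=%.2f" % t, "weighted=%.2f" % weighted_score(row, b))
```

One caveat: when every system answers every question, the Rasch ability ranking is the same as the raw-accuracy ranking, because the number of correct answers is a sufficient statistic for ability. So to actually reward hard questions we would need either a difficulty-weighted score like the hypothetical one above (which separates B and C), or a richer model such as 2PL with per-question discrimination.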