Line 144 of cli.py's `evaluate` function throws an error when tested with scored predictions on squad data

Question

pk1130 opened this issue 3 years ago · 4 comments

Hey @jplalor @EntilZha! I tested all the functions in cli.py with the scored predictions for the squad data. `train` and `train_and_evaluate` work well and produce the expected training and evaluation results. But when I test the `evaluate` function on its own, passing the squad.jsonlines file as TEST_PAIRS_PATH, it throws the following error on line 144:

KeyError: item_id is not a key in the dict

Am I passing the wrong file as TEST_PAIRS_PATH? I have also opened issue #18 to clarify what TEST_PAIRS_PATH refers to. Please respond at your earliest convenience. Thanks a lot!

Answer 1 · 2021-07-14T19:28:04.000Z

I can't debug right now, but we should probably make sure new features come with unit/integration tests, to avoid issues like this, as well as some usage examples.

@pk1130 if you have time, it would make for a great PR to debug the issue, create a unit/integration test that reproduces the issue, and create a fix for it.

Answer 2 · 2021-07-14T19:33:17.000Z

Would love to help out! I'll take a look at it when I wake up tomorrow morning :) Just to confirm @EntilZha, for the TEST_PAIRS_PATH argument I have to pass in the squad.jsonlines file path, right?

Answer 3 · 2021-07-16T13:55:34.000Z

The data format for `evaluate` is slightly different from the squad example as implemented. The expectation is that you have:

A set of learned subject and item parameters
(subject_id, item_id) pairs for which you want to estimate p(correct)

TEST_PAIRS_PATH refers to a separate JSON Lines file in which each line has the format:

{"subject_id": "s1", "item_id": "q1"}
{"subject_id": "s2", "item_id": "q1"}
{"subject_id": "s1", "item_id": "q3"}

This way you only get predictions for the subject-item pairs that are specified (e.g., a previously held-out test set).
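As a rough sketch of how such a file could be produced and checked before calling `evaluate`, something like the following might help. None of these helpers are part of cli.py; `write_test_pairs`, `pairs_from_responses`, and `find_bad_lines` are hypothetical names. The `responses` layout used in `pairs_from_responses` is an assumption about what the scored-predictions file looks like and is not confirmed in this thread. If it holds, it would explain the KeyError, since such records have no top-level `item_id`.

```python
import json

# Keys that each line of the test-pairs file is expected to contain,
# per the format described above.
REQUIRED_KEYS = ("subject_id", "item_id")


def write_test_pairs(pairs, path):
    """Write (subject_id, item_id) tuples as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for subject_id, item_id in pairs:
            f.write(json.dumps({"subject_id": subject_id, "item_id": item_id}) + "\n")


def pairs_from_responses(record):
    """Flatten one training-style record into (subject_id, item_id) tuples.

    ASSUMPTION (not confirmed in this thread): the record looks like
    {"subject_id": "s1", "responses": {"q1": 1, "q3": 0}}.
    """
    return [(record["subject_id"], item_id) for item_id in record["responses"]]


def find_bad_lines(path):
    """Return 1-based line numbers whose JSON object lacks a required key."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            obj = json.loads(line)
            if any(key not in obj for key in REQUIRED_KEYS):
                bad.append(lineno)
    return bad
```

Running `find_bad_lines` on the file you intend to pass as TEST_PAIRS_PATH should return an empty list; any line numbers it reports are records without a top-level `subject_id` or `item_id`.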

Answer 4 · 2021-08-11T19:21:10.000Z

Closing this as I think we're all set.