sheffieldnlp/naacl2018-fever

KeyError when running DrQA

Closed this issue · 1 comment

When running the following command:

PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/dev.jsonl --out-file data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5

With the new version of the code, this command raises a KeyError.

The exception is raised at the following line in the function count in drqascripts/retriever/build_tfidf.py:
col.extend([DOC2IDX[doc_id]] * len(counts))

I do not get this error if I set the --parallel flag to false. I am not sure, but I suspect the issue lies in the multiprocessing part of the code.

Thanks for your help!

j6mes commented

Have you done a pip upgrade since updating to the latest version?

The original version of DrQA is not thread-safe. We made it thread-safe by avoiding the use of global variables and wrapping DrQA in a class. In the updated version, that line looks more like col.extend([self.DOC2IDX[doc_id]] * len(counts)).
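The difference between a global DOC2IDX and a per-instance self.DOC2IDX can be illustrated with a minimal sketch. This is not the DrQA or FEVER code: the names set_corpus, count_global, CountBuilder and count_all are hypothetical, plain token counts stand in for hashed n-gram counts, and a thread pool is used instead of the process pool that build_tfidf.py uses, so the sketch only shows the shared-state hazard, not the exact failure path in the issue. It shows how one caller replacing a module-level mapping makes another caller's lookup raise KeyError, while per-instance state keeps the two independent.

```python
from collections import Counter
from multiprocessing.pool import ThreadPool

# Pattern 1 (hypothetical names): a module-level mapping shared by every caller.
DOC2IDX = None


def set_corpus(doc_ids):
    """Map each doc_id to its column index, overwriting any previous mapping."""
    global DOC2IDX
    DOC2IDX = {doc_id: i for i, doc_id in enumerate(doc_ids)}


def count_global(doc_id, tokens):
    """Return (col, data) for one document, looked up via the global map."""
    counts = Counter(tokens)
    data = [counts[tok] for tok in sorted(counts)]
    col = []
    # Raises KeyError if another caller replaced DOC2IDX with a different corpus.
    col.extend([DOC2IDX[doc_id]] * len(counts))
    return col, data


# Pattern 2 (hypothetical class): the mapping is an instance attribute,
# so each builder owns its own and cannot be clobbered by another one.
class CountBuilder:
    def __init__(self, doc_ids):
        self.DOC2IDX = {doc_id: i for i, doc_id in enumerate(doc_ids)}

    def count(self, item):
        """item is a (doc_id, tokens) pair; returns (col, data)."""
        doc_id, tokens = item
        counts = Counter(tokens)
        data = [counts[tok] for tok in sorted(counts)]
        col = []
        col.extend([self.DOC2IDX[doc_id]] * len(counts))
        return col, data

    def count_all(self, docs, workers=4):
        """Count every document in docs (doc_id -> tokens) on a thread pool."""
        with ThreadPool(workers) as pool:
            return pool.map(self.count, list(docs.items()))


corpus_a = {"a1": ["x", "y", "x"], "a2": ["z"]}
corpus_b = {"b1": ["y"]}

# Simulate a second caller replacing the global before the first one counts.
set_corpus(list(corpus_a))
set_corpus(list(corpus_b))
try:
    count_global("a1", corpus_a["a1"])
except KeyError as err:
    print("KeyError:", err)  # prints KeyError: 'a1'

# With per-instance state, two builders do not interfere with each other.
builder_a = CountBuilder(list(corpus_a))
builder_b = CountBuilder(list(corpus_b))
print(builder_a.count_all(corpus_a))  # [([0, 0], [2, 1]), ([1], [1])]
print(builder_b.count_all(corpus_b))  # [([0], [1])]
```

In this sketch the KeyError is forced by calling set_corpus twice in sequence; in a genuinely concurrent run the same overwrite would depend on timing, which is one plausible reason such errors only show up in parallel runs.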

Running pip install --upgrade -r requirements.txt should fix it.

James