KeyError when running DrQA
When running the following command:
PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/dev.jsonl --out-file data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5
The new version of the code gives me a KeyError.
The exception comes from this line (in drqascripts/retriever/build_tfidf.py, in the function count):
col.extend([DOC2IDX[doc_id]] * len(counts))
I do not get this error if I set the --parallel flag to false. I am not sure, but I am guessing that the issue lies in the multiprocessing part of the code.
Thanks for your help!
Have you done a pip upgrade since updating to the latest version?
The original version of DrQA is not thread-safe. The updated version makes it thread-safe by avoiding global variables and wrapping DrQA in a class. In the updated version, that line looks more like col.extend([self.DOC2IDX[doc_id]] * len(counts)), with the mapping read from the class instance rather than from a global.
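As a rough illustration of that change, here is a minimal sketch of the class-wrapped pattern. The class name TfidfCounter, its constructor, and the use of raw tokens as row keys are assumptions made for the example and are not the actual DrQA code, which hashes n-grams; only the DOC2IDX lookup line mirrors the one quoted above.

```python
from collections import Counter


class TfidfCounter:
    """Hypothetical stand-in for the class-wrapped builder (not DrQA's real API)."""

    def __init__(self, doc_ids):
        # The mapping lives on the instance instead of in a module-level global,
        # so every caller that holds this object sees a fully populated DOC2IDX.
        self.DOC2IDX = {doc_id: i for i, doc_id in enumerate(doc_ids)}

    def count(self, doc_id, tokens):
        # Simplification: raw tokens are used as row keys here,
        # whereas the real code works with hashed n-grams.
        counts = Counter(tokens)
        row, col, data = [], [], []
        row.extend(counts.keys())
        # Same shape as the line from the traceback, but read from self.
        col.extend([self.DOC2IDX[doc_id]] * len(counts))
        data.extend(counts.values())
        return row, col, data


counter = TfidfCounter(["doc_a", "doc_b"])
row, col, data = counter.count("doc_b", ["fever", "claim", "fever"])
```

With a module-level DOC2IDX, a worker that never ran the initialisation code would look up doc_id in an empty or missing mapping, which is one plausible way to end up with the KeyError described in the question.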
pip install --upgrade -r requirements.txt
should fix it.
James