VT-NLP/Mocheg

Error in train(args) of the retrieve_train.py script


Hi, and thank you for sharing Mocheg. I'm getting an error in the train(args) function when I try to reproduce the evidence retrieval part. The problem is that, in the train set, some DOCUMENT# values in text_evidence_qrels_sentence_level.csv are not present in Corpus3_sentence_level.csv. Consequently, there is no text for them, which causes "KeyError: '12843-67364-1'", where "12843-67364-1" is the DOCUMENT#. Please note that I'm using the complete version of Corpus3_sentence_level.csv, which also includes tweets.

Thanks for the reminder. I cleaned some noisy documents out of Corpus3_sentence_level.csv but did not update text_evidence_qrels_sentence_level.csv to match. I will update text_evidence_qrels_sentence_level.csv by Sunday.

In the meantime, you can remove the rows in text_evidence_qrels_sentence_level.csv whose DOCUMENT# does not exist in Corpus3_sentence_level.csv. The "DOCUMENT#" column in text_evidence_qrels_sentence_level.csv should match the corpus_id column in Corpus3_sentence_level.csv. Thanks!
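A minimal sketch of that workaround using Python's standard `csv` module. It assumes both files are comma-separated with a header row containing the `DOCUMENT#` and `corpus_id` columns mentioned above. The function name `filter_qrels` and the output file name are hypothetical. It writes a filtered copy instead of overwriting the original qrels file, and it returns the dropped IDs so you can check which ones were missing from the corpus.

```python
import csv


def filter_qrels(qrels_path, corpus_path, out_path,
                 doc_col="DOCUMENT#", corpus_col="corpus_id"):
    """Copy qrels rows whose DOCUMENT# appears as a corpus_id in the corpus.

    Hypothetical helper: column names default to the ones described in the
    issue thread. Returns the sorted list of DOCUMENT# values that were dropped.
    """
    # Collect every corpus_id present in the (cleaned) corpus file.
    with open(corpus_path, newline="", encoding="utf-8") as f:
        known_ids = {row[corpus_col] for row in csv.DictReader(f)}

    dropped = set()
    with open(qrels_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # Keep the original qrels columns and their order in the output.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[doc_col] in known_ids:
                writer.writerow(row)
            else:
                # No text exists for this document, so training would raise KeyError.
                dropped.add(row[doc_col])
    return sorted(dropped)
```

Example call (paths are assumptions; adjust to wherever the dataset files live):

```python
dropped = filter_qrels("text_evidence_qrels_sentence_level.csv",
                       "Corpus3_sentence_level.csv",
                       "text_evidence_qrels_sentence_level.filtered.csv")
print(len(dropped), "DOCUMENT# values not found in the corpus")
```

Loading only the corpus IDs into a set keeps memory use modest even for the complete corpus with tweets, and each qrels row is then checked in constant time.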

Already updated the dataset files.