facebookresearch/atlas

Poor quality of outputs from large model on 1/10th of Wikipedia

ArthurConmy opened this issue · 0 comments

Hey! I quickly set up the large model with 1/10th of corpora/wiki/enwiki-dec2018 (and otherwise default settings), and the quality of the outputs is very low:

question: who got the first nobel prize in physics answer: <extra_id_0> <pad><extra_id_0> mr. </s>
question: when is the next deadpool movie being released answer: <extra_id_0> <pad><extra_id_0> november 2020</s>
question: which mode is used for short wave broadcast service answer: <extra_id_0> <pad><extra_id_0> sms</s>
question: the south west wind blows across nigeria between answer: <extra_id_0> <pad><extra_id_0> a. </s>
question: what does hp mean in war and order answer: <extra_id_0> <pad><extra_id_0> hp </s>

In the first example, the answer is far too short ("mr. "), even though manual inspection shows that the retrieved Wikipedia articles included Nobel Prize winners in physics. Any idea what I'm doing wrong?
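
To make the generated answers easier to compare, the special tokens can be stripped from each logged line. This is a minimal sketch, assuming every line follows the "question: ... answer: ..." format shown above. `extract_answer` is a hypothetical helper written for illustration, not part of the Atlas codebase.

```python
import re

# T5-style special tokens seen in the outputs above:
# sentinel tokens, padding, and end-of-sequence.
SPECIAL_TOKENS = re.compile(r"<extra_id_\d+>|<pad>|</s>")


def extract_answer(line: str) -> tuple[str, str]:
    """Split one logged line into (question, generated answer).

    Assumes the line looks like "question: <text> answer: <generation>".
    """
    question, _, generation = line.partition(" answer: ")
    question = question.removeprefix("question: ").strip()
    # Replace special tokens with spaces, then collapse runs of whitespace.
    answer = " ".join(SPECIAL_TOKENS.sub(" ", generation).split())
    return question, answer


line = (
    "question: who got the first nobel prize in physics "
    "answer: <extra_id_0> <pad><extra_id_0> mr. </s>"
)
print(extract_answer(line))
```

With the tokens removed, the first generation reduces to just "mr.", which makes it easier to see that the answer is truncated and not merely badly formatted.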