How many of the top TF-IDF paragraphs need to be retained?
ditingdapeng opened this issue · 5 comments
Hello! I would like to ask how many tf-idf need to be kept at the beginning. Is it fixed?
Thank you!
Hi @ditingdapeng, thanks for your interest in our work!
Sorry, I'm not sure which issue (or email?) you are referring to. Could you give me more information about "how many tf-idf need to be kept at the beginning"? Is it about the document filtering process at inference time or the number of negative examples during training?
Thank you for your reply. What I mean is: in your paper, the first hop from the question to the relevant facts is computed with the TF-IDF method. So when using TF-IDF to rank supporting documents, how many top-ranked paragraphs are finally selected as the initial nodes for multi-hop retrieval? I hope I have expressed my question clearly.
Thanks for the clarification!
For our best models, we set the initial retrieval number (F in the paper) to 500, 100, and 100 paragraphs for HotpotQA full wiki, SQuAD Open, and Natural Questions Open, respectively ("Implementation details" section in our paper).
Please see the detailed discussion of the relationship between the number of initial TF-IDF paragraphs and performance in Section C.1 and Figure 5 in the Appendix.
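To illustrate what the initial retrieval number F controls, here is a minimal sketch of keeping the top F paragraphs by TF-IDF score. It is not the retriever from the paper or the repository: the function names (`tokenize`, `top_f_paragraphs`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def top_f_paragraphs(question, paragraphs, f):
    """Return the indices of the f highest-scoring paragraphs.

    Scores are a simple TF-IDF overlap between question terms and each
    paragraph. This is an illustrative scoring function, not the one
    used in the paper.
    """
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n_docs = len(docs)

    # Document frequency: number of paragraphs containing each term.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    def idf(term):
        # Smoothed inverse document frequency (an assumed variant).
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0

    question_terms = set(tokenize(question))
    scored = []
    for index, doc in enumerate(docs):
        length = sum(doc.values()) or 1
        score = sum(
            (doc[term] / length) * idf(term)
            for term in question_terms
            if term in doc
        )
        scored.append((score, index))

    # Highest score first; ties broken by original paragraph order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:f]]


if __name__ == "__main__":
    paragraphs = [
        "Paris is the capital of France",
        "Berlin is the capital of Germany",
        "Bananas are yellow",
    ]
    # F = 2 here for the toy corpus; the thread reports F = 500 or 100.
    print(top_f_paragraphs("capital of France", paragraphs, f=2))
```

Only the paragraphs returned by this step would serve as starting nodes for the later hops, so a larger F trades more computation for a better chance that a relevant paragraph is among the candidates.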
Thank you for your kind reply!
You're welcome! Feel free to start another issue or reach me via email if you have follow-up questions.