negative documents construction for graph retriever of hotpotQA fullwiki
dsl-light opened this issue · 2 comments
Hello AkariAsai, thank you for the great work! After going through the code of the graph retriever, I found that the principle of negative document construction for the graph retriever seems to be: TF-IDF documents first, then the hyperlink negative ones? My question is: hyperlink negative docs are considered by appending docs of all_linked_paras_dic, but keys of all_linked_paras_dic are all TF-IDF retrieved titles, so the most important part, hyperlink negative doc of gold path, may not be included for training?
Hi, @dsl-light
Thank you for going through our code!
TF-IDF documents first, then the hyperlink negative ones?
This is totally correct, based on the logic of our code.
My question is: hyperlink negative docs are considered by appending docs of all_linked_paras_dic, but keys of all_linked_paras_dic are all TF-IDF retrieved titles, so the most important part, hyperlink negative doc of gold path, may not be included for training?
For this, let us explain the logic in detail.
- Appending gold paragraph titles (only during the training phase)
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L495
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L502
- Appending TF-IDF-based negative examples
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L502
  We can control how many TF-IDF-based negative examples we use for the model training, and also please refer to https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L502 for the use of the `--tfidf_limit` option.
- Appending hyperlink-based negative examples
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L526
  https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/blob/master/graph_retriever/utils.py#L540
  Here we add hyperlink-based negative examples, and we can see that the hyperlinked titles are used. `l` is a hyperlinked paragraph's title from a paragraph `p_` (`example.all_linked_paras_dic[p_]`).
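For readers following along, the three steps above can be sketched roughly as below. This is a simplified illustration and not the code in `graph_retriever/utils.py`: the function name `build_candidate_titles` and its parameters are hypothetical, and it assumes that `tfidf_limit` caps the number of TF-IDF negatives and that `all_linked_paras_dic` maps a paragraph title `p_` to a list of hyperlinked titles `l`.

```python
from typing import Dict, List


def build_candidate_titles(
    gold_titles: List[str],
    tfidf_titles: List[str],
    all_linked_paras_dic: Dict[str, List[str]],
    tfidf_limit: int,
) -> List[str]:
    """Collect candidate titles in order: gold, TF-IDF negatives, hyperlink negatives.

    Hypothetical helper for illustration only; names and the exact capping
    behaviour are assumptions, not the repository's implementation.
    """
    candidates: List[str] = []
    seen = set()

    def add(title: str) -> bool:
        # Keep the first occurrence of each title, skip duplicates.
        if title in seen:
            return False
        seen.add(title)
        candidates.append(title)
        return True

    # Step 1: gold paragraph titles (training only).
    for title in gold_titles:
        add(title)

    # Step 2: TF-IDF-based negatives, capped at tfidf_limit new titles.
    n_tfidf = 0
    for title in tfidf_titles:
        if n_tfidf >= tfidf_limit:
            break
        if add(title):
            n_tfidf += 1

    # Step 3: hyperlink-based negatives; l is a title linked from paragraph p_.
    for p_, linked_titles in all_linked_paras_dic.items():
        for l in linked_titles:
            add(l)

    return candidates
```

In this sketch, a title that appears in more than one source keeps the position of its first occurrence, so gold titles always come first and hyperlinked titles are only appended if they were not already added as gold or TF-IDF paragraphs.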
Let us know if you have further questions.
Thank you!