How to train and evaluate the models in HotpotQA distractor setting?
solomon-han opened this issue · 2 comments
Hi, thanks for your great work!
I'm currently trying to reproduce your results in the HotpotQA distractor setting, but I am facing some technical difficulties.
I apologize in advance if these are dumb questions, but it would be very helpful if you could answer them:
- 'hotpot_train_order_sensitive.json' file
The README in the graph_retriever folder says that 'hotpot_train_order_sensitive.json' is used for training in the HotpotQA distractor setting, but I can't find this file in the train_data folder you released. Can I download this file somewhere, or is there a way to create a file in this format from the original HotpotQA training set?
- sentence selector
I read in your paper that the graph retriever in the HotpotQA distractor setting is different from the one in the full-wiki setting, but both settings share the same reader model. I'm curious whether the sentence selector model is separate (like the graph retriever) or shared (like the reader) across the distractor and full-wiki settings. Also, if the sentence selector for the distractor setting is different from that of the full-wiki setting, how can I get the training data for the distractor setting? (It seems that the training data you released contains only one train/dev pair for the sentence selector.)
- preprocessing of hotpot distractor dataset
It seems that, in order to run your model(for evaluation), the user needs a preprocessed dataset.
I checked that the preprocessed hotpot full-wiki data is available, but I am not sure I have access to the hotpot distractor dataset. Is there any way for me to get preprocessed hotpot distractor data (by downloading it or by preprocessing it myself)?
- Evaluation on distractor setting
It seems that the evaluation code for the QA/SP task basically assumes the open-domain scenario.
How can I evaluate the model in the closed scenario of the distractor setting, as you did in your paper?
Thanks for your reading and attention :) @hassyGo @AkariAsai
Hi, thank you for the questions!
- 'hotpot_train_order_sensitive.json' file
I've found some of the data files were no longer publicly visible due to the sharing setting of the company account. I'm re-uploading those files and the HotpotQA distractor data will be uploaded here. Sorry about that!
- sentence selector
I think we use the same training data for the sentence selector training (the data available here). We use the same sentence selector model for the distractor and full wiki setting. @hassyGo please feel free to add followups if you have any :)
- preprocessing of hotpot distractor dataset
> It seems that, in order to run your model(for evaluation), the user needs a preprocessed dataset.
> I checked that the preprocessed hotpot full-wiki data is available, but I am not sure I have access to the hotpot distractor dataset.
Not sure if we understand your questions, but what kind of preprocessed data do you need? (e.g., train data with the negative examples for the retriever/reader training)
For the distractor setting, we use the publicly available data from the official HotpotQA website.
- Evaluation on distractor setting
We first feed the given 10 paragraphs to the graph retriever, retrieve reasoning paths from them, and feed the selected top reasoning paths to the reader model. We do not release a script to run the pipeline, but you can run the evaluation in the distractor setting by following the steps below:
- Convert the original HotpotQA input data to the TF-IDF output format.
- Set the `--saved_tfidf_retrieval_outputs_path` option of `eval_odqa.py` to the file path and run the `eval_odqa.py` script as you do in the full wiki setting. When `saved_tfidf_retrieval_outputs_path` is specified, our graph retriever loads the file instead of running TF-IDF based initial article retrieval.
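The conversion step above could be sketched roughly as follows. This is not a script from the repository. It reads the public HotpotQA distractor JSON, where each example has `_id`, `question`, `answer`, and a `context` list of `[title, sentences]` pairs, and writes one record per question. The output keys (`q_id`, `question`, `answer`, `context`) and the title-to-text layout are assumptions made for illustration, and `convert_example` and `convert_file` are hypothetical helper names. Compare the result against a TF-IDF output file saved from a full wiki run and adjust the keys to match before passing it to `--saved_tfidf_retrieval_outputs_path`.

```python
import json


def convert_example(example):
    """Map one HotpotQA distractor example to a retrieval-output-style record.

    The output keys are ASSUMED for illustration; check them against a
    TF-IDF output file saved by the full wiki pipeline.
    """
    # HotpotQA stores the 10 given paragraphs as [title, [sentence, ...]] pairs.
    # Sentences after the first usually carry their own leading space.
    context = {
        title: "".join(sentences)
        for title, sentences in example["context"]
    }
    return {
        "q_id": example["_id"],           # assumed key name
        "question": example["question"],
        "answer": example.get("answer"),  # not present in hidden test data
        "context": context,               # assumed layout: title -> paragraph text
    }


def convert_file(in_path, out_path):
    """Convert a whole HotpotQA distractor file and return the record count."""
    with open(in_path, encoding="utf-8") as f:
        examples = json.load(f)
    records = [convert_example(ex) for ex in examples]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    return len(records)
```

Calling `convert_file` with the path to the downloaded distractor dev file and an output path of your choice writes the converted records. Because the distractor setting supplies the 10 paragraphs directly, every record carries exactly those paragraphs and no retrieval step runs.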
Let me know if you have any followup questions!
Thanks, it solved my problems!
I will close the issue :) @AkariAsai