How did you create the data?
Linda230 opened this issue · 2 comments
Your work is excellent.
I would like to know how you created your data. For example, how was "en_fact.json" created? I noticed that it contains positive and negative samples. How were these samples created: manually or automatically?
Looking forward to receiving your reply.
The queries and answers were generated by gpt-3.5-turbo and then manually filtered and adjusted. The documents were retrieved with the Google API (to obtain the websites) and a dense retriever (to select the top 30 passages across all of those websites).
In all data, negative documents are documents that do not contain the answer text, and positive documents are documents that do contain it.
For the counterfactual robustness data, such as zh_fact.json, we manually modified the answers and replaced the answer text in the retrieved positive documents to construct the "positive_wrong" key.
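The passage-selection step can be pictured as scoring every passage against the query and keeping the highest-scoring 30. This is a minimal sketch that assumes the query and passages have already been turned into vectors by some dense encoder. The encoder, the dot-product scoring, and the function name `top_k_passages` are illustrative assumptions, not details given in the reply.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_passages(
    query_vec: Sequence[float],
    passage_vecs: List[Sequence[float]],
    k: int = 30,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (passage index, score) for the k best passages.

    Assumes embeddings already exist; scoring by dot product is an assumption.
    """
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    # Highest similarity first; fewer than k passages simply returns them all.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = [1.0, 0.0]
    passages = [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5]]
    print(top_k_passages(query, passages, k=2))  # [(1, 0.9), (2, 0.5)]
```

In practice the passages would come from all retrieved websites pooled together, so the top 30 are chosen across websites rather than per website.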
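The positive/negative rule above is a containment test on the answer text. A minimal sketch, assuming a case-sensitive exact substring match and that a question may have several acceptable answer strings (both are assumptions; `split_by_answer` is a hypothetical name):

```python
from typing import Dict, List


def split_by_answer(documents: List[str], answers: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: label each document by whether it contains an answer.

    A document is positive if any answer string appears in it verbatim
    (case-sensitive substring match, an assumption), otherwise negative.
    """
    positive: List[str] = []
    negative: List[str] = []
    for doc in documents:
        if any(ans in doc for ans in answers):
            positive.append(doc)
        else:
            negative.append(doc)
    return {"positive": positive, "negative": negative}


if __name__ == "__main__":
    docs = ["Paris is the capital of France.", "Lyon is a large French city."]
    print(split_by_answer(docs, ["Paris"]))
```

A real pipeline might normalize case or punctuation before matching. Plain substring matching is shown only because it is the simplest reading of "contain the answer text".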
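The reply says the counterfactual edits were made manually. The mechanical part, swapping the original answer text for a modified answer inside the positive documents, can be sketched as a string replacement. In this sketch, `make_positive_wrong` and the example `record` dictionary are hypothetical. Only the "positive_wrong" key name comes from the reply.

```python
from typing import List


def make_positive_wrong(positive_docs: List[str], answer: str, wrong_answer: str) -> List[str]:
    """Hypothetical helper: replace every occurrence of the true answer text
    with the counterfactual answer in each positive document."""
    return [doc.replace(answer, wrong_answer) for doc in positive_docs]


if __name__ == "__main__":
    # Illustrative record; field names other than "positive_wrong" are assumptions.
    record = {"positive": ["Paris is the capital of France."]}
    record["positive_wrong"] = make_positive_wrong(record["positive"], "Paris", "Marseille")
    print(record["positive_wrong"])  # ['Marseille is the capital of France.']
```

Blind replacement can produce inconsistent text (for example, other mentions that imply the true answer), which is one reason a manual pass is still needed.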
Thank you so much for your help.