There are too many errors in the dataset!
CalmCapK opened this issue · 8 comments
Your dataset does not appear to have been manually verified, especially the data for the noise robustness and information integration experiments. More than half of the data is wrong:
- Negative reference documents contain positive information;
- Positive reference documents contain the answer label but are not necessarily related to the question;
- When the script processes the data, negative documents are substituted whenever the positive ones are insufficient, so the noise ratio loses its meaning (see the sketch below);
.....
Please carefully confirm the accuracy of your data.
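For illustration, here is a minimal sketch (hypothetical, not the repository's actual script) of how a fixed noise ratio could be preserved when building each query's retrieval context; the field names `positive`/`negative` and the context size are assumptions:

```python
import random

def build_context(item, context_size=5, noise_ratio=0.4):
    """Sample documents so that exactly `noise_ratio` of the context is noise."""
    num_negative = int(context_size * noise_ratio)
    num_positive = context_size - num_negative

    if len(item["positive"]) < num_positive:
        # Skip (or flag) the item instead of silently padding with negatives,
        # which would change the effective noise ratio for this query.
        raise ValueError("not enough positive documents to keep the noise ratio")

    docs = random.sample(item["positive"], num_positive) \
         + random.sample(item["negative"], num_negative)
    random.shuffle(docs)
    return docs
```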
As mentioned in our paper, we did not manually review the retrieval documents. However, we have noticed several issues with their quality. We are therefore conducting manual and ChatGPT checks on the retrieval documents and will update the dataset soon.
@chen700564 Can you estimate how much time you need to conduct all the checks?
It can be finished this month, and I will release the new data as soon as possible (maybe this week or next week).
Wonderful, thank you!
Has it been confirmed that the dataset is updated? Has this data been cleaned?
Yes, most of the positive documents and some of the negative documents have been updated.
I see that there are two refined new files updated. Thanks for your work!
However, it looks like the data files for the information integration task still have some errors, including fake positive news, queries without supporting materials, etc.
Do you have plans to update those files as well? Thx!
While we do not have immediate plans to update en_int and zh_int, we will post a notification in the readme once the updates are complete (potentially within a few months).