There are too many errors in the dataset!
CalmCapK opened this issue · 8 comments
Your dataset does not appear to have been manually verified, especially the data for the noise robustness and information integration experiments. More than half of the data is wrong:
- Negative reference documents contain positive information;
- Positive reference documents contain the answer label but are not necessarily related to the question;
- When the script processes the data, negative documents are substituted whenever the positive ones are insufficient, so the noise ratio loses its meaning (see the sketch below);
.....
Please carefully confirm the accuracy of your data.
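For illustration, here is a minimal sketch (hypothetical, not the repository's actual script) of how a fixed noise ratio could be preserved when building each query's retrieval context; the field names `positive`/`negative` and the context size are assumptions:

```python
import random

def build_context(item, context_size=5, noise_ratio=0.4):
    """Sample documents so that exactly `noise_ratio` of the context is noise."""
    num_negative = int(context_size * noise_ratio)
    num_positive = context_size - num_negative

    if len(item["positive"]) < num_positive:
        # Skip (or flag) the item instead of silently padding with negatives,
        # which would change the effective noise ratio for this query.
        raise ValueError("not enough positive documents to keep the noise ratio")

    docs = random.sample(item["positive"], num_positive) \
         + random.sample(item["negative"], num_negative)
    random.shuffle(docs)
    return docs
```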
As mentioned in our paper, we did not manually review the retrieval documents. However, we have noticed several issues with their quality. We are therefore conducting manual and ChatGPT checks on the retrieval documents and will update the dataset soon.
@chen700564 Can you estimate how much time you need to conduct all the checks?
It can be finished this month, and I will release the new data as soon as possible (maybe this week or next week).
Wonderful, thank you!
Has it been confirmed that the dataset is updated? Has this data been cleaned?
Yes, most of the positive documents and some of the negative documents have been updated.
I see that there are two refined new files updated. Thanks for your work!
However, it looks like the data files for the information integration task still have some errors, including fake positive news, queries without supporting materials, etc.
Do you have plans to update those files as well? Thx!
While we do not have immediate plans to update en_int and zh_int, we will post a notification in the readme once the updates are complete (potentially within a few months).