apple/ml-qrecc

Explicit Annotations of the Source of the Conversations


Thank you for the great work, and for open-sourcing the dataset.

The paper mentions that QReCC contains 13,598 dialogues, of which 9.3K are based on questions from QuAC, 80 are from TREC CAsT, and 4.4K are from NQ.

Do you have annotations for the source of each conversation? Specifically, I would like to know whether a conversation with a given "Conversation_no" comes from QuAC, TREC CAsT, or NQ.

From my own exploration, the first 80 conversations are from TREC CAsT, and I could match ~5k conversations to QuAC by exact match. However, I think many conversations have been corrected or are shorter than the original QuAC conversations, so exact matching misses them. It would be very helpful to have an explicit mapping.

Hi - Yes, the first 80 are from TREC CAsT. We will add the source annotations soon. Thanks.

We added the conversation source information. You can use the field "Conversation_source".
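For anyone who wants the per-conversation mapping asked for above, here is a minimal sketch. It assumes the dataset is a list of per-turn records that each carry "Conversation_no" and "Conversation_source"; the sample records and the source labels ("trec", "quac", "nq") are illustrative assumptions, not values confirmed in this thread, and `source_by_conversation` is a hypothetical helper name.

```python
from collections import Counter


def source_by_conversation(records):
    """Map each Conversation_no to its Conversation_source.

    `records` is assumed to be a list of dicts, one per turn.
    Turns without a source field are labeled "unknown".
    """
    mapping = {}
    for rec in records:
        conv_no = rec["Conversation_no"]
        source = rec.get("Conversation_source", "unknown")
        # Keep the first source seen for each conversation.
        mapping.setdefault(conv_no, source)
    return mapping


# Illustrative records only; the label strings are assumptions.
sample = [
    {"Conversation_no": 1, "Conversation_source": "trec"},
    {"Conversation_no": 1, "Conversation_source": "trec"},
    {"Conversation_no": 2, "Conversation_source": "quac"},
    {"Conversation_no": 3, "Conversation_source": "nq"},
]

mapping = source_by_conversation(sample)

# Count conversations (not turns) per source.
counts = Counter(mapping.values())
```

With the real data, `sample` would be replaced by the records loaded from the released JSON file (for example via `json.load`), and `counts` could then be compared against the per-source figures quoted from the paper.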