google-research-datasets/Taskmaster

Correct number of Dialogues in TaskMaster 4 (Coffee)?

Hi all! The paper states that the Taskmaster Coffee dataset contains 6,500 dialogues:

The Taskmaster Coffee dataset consists of 6,500 multi-turn conversations, consisting of 20,000
training examples (conversation turns or API calls), and 3,000 reward examples.

However, after loading and merging the files in the /data folder, I only get 3,710 dialogues. Am I loading them incorrectly?

Here is a snippet of how I was loading them:

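import requests

# URL template for the files data_00.json through data_07.json
DATA_URL: str = "https://github.com/google-research-datasets/Taskmaster/raw/master/TM-4-2024/data/data_0{i}.json"

if __name__ == '__main__':
    all_data = []
    for i in range(8):
        response = requests.get(DATA_URL.format(i=i), timeout=30)
        response.raise_for_status()  # fail loudly if a file is missing
        all_data.extend(response.json())  # assumes each file is a JSON list of dialogues
    print(f"Total number of dialogues: {len(all_data)}")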
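
In case it helps narrow this down, here is a minimal sketch of a per-file breakdown that could be run on the loaded data. It assumes each file parses to a list of dialogue dicts and that each dialogue carries an ID under a key such as `conversation_id`; that key name is my guess based on earlier Taskmaster releases, and `summarize_files` is a hypothetical helper, not part of the repository.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_files(
    files: Dict[str, List[Dict[str, Any]]],
    id_key: str = "conversation_id",  # assumed key name; adjust to the actual schema
) -> Dict[str, Any]:
    """Count dialogues per file and report IDs that occur more than once."""
    per_file = {name: len(dialogues) for name, dialogues in files.items()}
    # Tally every ID across all files; entries without the key are skipped.
    id_counts = Counter(
        d[id_key]
        for dialogues in files.values()
        for d in dialogues
        if isinstance(d, dict) and id_key in d
    )
    return {
        "per_file": per_file,
        "total": sum(per_file.values()),
        "unique_ids": len(id_counts),
        "duplicated_ids": sorted(i for i, n in id_counts.items() if n > 1),
    }


if __name__ == "__main__":
    # Tiny in-memory stand-in for the downloaded files.
    demo = {
        "data_00.json": [{"conversation_id": "a"}, {"conversation_id": "b"}],
        "data_01.json": [{"conversation_id": "b"}],
    }
    print(summarize_files(demo))
```

To use it with the loop above, each file's parsed JSON would be stored under its file name in a dict instead of being extended into a single list. If `total` and `unique_ids` disagree, or one file is much smaller than the rest, that would at least show where the count diverges from the paper's figure.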

Thank you! If you happen to have an apis.json and evaluation scripts for Response Generation/API Argument Prediction, that would be helpful as well, though I can also write my own.