Datasets: Datasets from Wei2022 repository (aqua, asdiv, commonsenseqa, date_understanding, gsm, mawps, sports_understanding, strategy_qa, svamp)
matthias-samwald opened this issue · 7 comments
Note that we are already creating StrategyQA data from the original source file. Need to check how CoT-relevant data in the source dataset was used by Wei et al.
I've looked into asdiv, svamp and mawps (they all have basically the same structure) and I think we can include them.
Wei et al. manually crafted the same prompts (CoTs) for all three datasets:
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted. So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah's sister had 42. That means there were originally 32 + 42 = 74 chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent 5 * $3 = $15 on the bagels. She had $23 in beginning, so now she has $23 - $15 = $8. The answer is 8.
Q: Seven red apples and two green apples are in the basket. How many apples are in the basket?
A:
However, I think we can automatically create CoTs from the dataset records. This is a basic example of a simple addition:
{
"Question": "number0 red apples and number1 green apples are in the basket . how many apples are in the basket ?",
"Numbers": "7 2",
"Equation": "+ number0 number1",
"Answer": 9,
"group_nums": [0, 1, 2, 3, 4, 5, 6, 10, 16, 17, 18],
"Grade": 1,
"Type": "Addition",
"Body": "number0 red apples and number1 green apples are in the basket .",
"Ques_Statement": "how many apples are in the basket ?"
}
We could parse the "Equation" field and then automatically generate CoTs such as "We start with the numbers 7 and 2. The answer has to be the sum of both numbers. The answer is 7 + 2 = 9." Unfortunately, there is no field for the unit or object being counted (red apples, green apples), so the generated text is somewhat generic and can only refer to "numbers".
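A minimal sketch of that idea, assuming the "Equation" field is always a space-separated prefix expression over binary operators and "numberN" placeholders, as in the example record. The operator set, the wording of the steps, and the helper names (`eval_prefix`, `generate_cot`) are hypothetical choices for illustration, not something taken from the dataset or from Wei et al.:

```python
import operator

# Assumed operator set; the example record above only shows "+".
OPS = {
    "+": (operator.add, "sum"),
    "-": (operator.sub, "difference"),
    "*": (operator.mul, "product"),
    "/": (operator.truediv, "quotient"),
}


def fmt(x):
    """Print whole numbers without a trailing '.0'."""
    return str(int(x)) if float(x).is_integer() else str(x)


def eval_prefix(tokens, numbers, steps):
    """Evaluate a prefix expression, appending one sentence per operator to steps."""
    tok = tokens.pop(0)
    if tok in OPS:
        left = eval_prefix(tokens, numbers, steps)
        right = eval_prefix(tokens, numbers, steps)
        fn, name = OPS[tok]
        result = fn(left, right)
        steps.append(
            f"The {name} of {fmt(left)} and {fmt(right)} is "
            f"{fmt(left)} {tok} {fmt(right)} = {fmt(result)}."
        )
        return result
    if tok.startswith("number"):
        # "number0" refers to the first entry of the "Numbers" field.
        return numbers[int(tok[len("number"):])]
    return float(tok)  # assumption: any other token is a literal constant


def generate_cot(record):
    """Build a generic chain of thought from the 'Numbers' and 'Equation' fields."""
    numbers = [float(n) for n in record["Numbers"].split()]
    shown = [fmt(n) for n in numbers]
    listed = shown[0] if len(shown) == 1 else ", ".join(shown[:-1]) + " and " + shown[-1]
    steps = []
    result = eval_prefix(record["Equation"].split(), numbers, steps)
    return " ".join(
        [f"We start with the numbers {listed}.", *steps, f"The answer is {fmt(result)}."]
    )


record = {"Numbers": "7 2", "Equation": "+ number0 number1"}
print(generate_cot(record))
# We start with the numbers 7 and 2. The sum of 7 and 2 is 7 + 2 = 9. The answer is 9.
```

Nested equations produce one sentence per operator, innermost first, so the intermediate results appear in the order they are computed. The text stays generic because the records carry no unit information.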
More (and more complicated) examples can be found here, for instance: https://github.com/arkilpatel/SVAMP/blob/main/data/cv_asdiv-a/fold0/dev.csv
CommonsenseQA, date_understanding and sports_understanding do not have explanations.
Observations for asdiv:
- The formatting of the question text could be improved through post-processing (there are many superfluous spaces; capitalization). I guess this was carried over from the source?
Observations for asdiv, svamp and mawps:
- For CoTs with a single line, the "First, [...]" prefix could be removed.
The superfluous spaces come from tokenization. I am not sure it is a good idea to detokenize the text only to have it tokenized again. On the other hand, having uniform datasets (none of them pre-tokenized) would be very good.
I think we should have "natural" text. Each NLP model potentially has its own built-in tokenizer, and we should not present pre-tokenized input to models. Models were pre-trained on natural/unprocessed text, so the inputs should closely resemble that.
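As a rough sketch of what such post-processing could look like, assuming the only tokenization artifacts are spaces before punctuation and lowercased sentence starts, and that placeholders follow the "numberN" pattern from the example record. The function names `fill_numbers` and `naturalize` are hypothetical:

```python
import re


def fill_numbers(text, numbers):
    """Replace placeholders such as number0 with values from the 'Numbers' field."""
    values = numbers.split()
    return re.sub(r"number(\d+)", lambda m: values[int(m.group(1))], text)


def naturalize(text):
    """Undo simple tokenization artifacts: spacing and sentence capitalization."""
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)  # no space before punctuation
    text = re.sub(r"\s{2,}", " ", text).strip()   # collapse repeated spaces
    # Capitalize the first letter of the text and of each following sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


question = (
    "number0 red apples and number1 green apples are in the basket . "
    "how many apples are in the basket ?"
)
print(naturalize(fill_numbers(question, "7 2")))
# 7 red apples and 2 green apples are in the basket. How many apples are in the basket?
```

This cannot restore information the source files no longer contain, such as the capitalization of proper nouns or numbers that were originally spelled out ("Seven"), so the result only approximates the original text.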