Does anyone know which 282 tasks are used?
ntubiolin opened this issue · 1 comment
Hello!
I just read the paper Scaling Instruction-Finetuned Language Models, and I am wondering which 282 tasks are being referred to in this paragraph:
Second, increasing the number of finetuning tasks improves performance, although the majority of the
improvement comes from using up to 282 tasks. There are two potential explanations for the small gain after
282 tasks. One is that the additional tasks are not particularly diverse, and so they are not providing the model
with new knowledge. Another explanation is that most of the gains from multi-task instruction finetuning
come from the model learning to better express knowledge that it already knows from pretraining, and more
than 282 tasks does not help too much. This second explanation could make sense since the pre-training data
consists of 780B tokens, while instruction finetuning only uses 1.4B tokens (0.2% of the pre-training tokens).
Does anyone know the answer?
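As a side note, here is a quick sanity check of the token-budget comparison in the quoted paragraph. The 780B pretraining and 1.4B finetuning token counts are taken directly from the quote above; nothing else is assumed.

```python
# Sanity check of the token-budget ratio quoted from the paper.
pretraining_tokens = 780e9  # 780B pretraining tokens (from the quoted paragraph)
finetuning_tokens = 1.4e9   # 1.4B instruction-finetuning tokens (from the quoted paragraph)

ratio = finetuning_tokens / pretraining_tokens
# Prints roughly 0.18%, which the paper rounds to 0.2% of the pretraining tokens.
print(f"Finetuning uses {ratio:.2%} of the pretraining token budget")
```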
I think I found the answer: the 282 tasks come from the CoT, Muffin, and T0-SF mixtures.
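If I read Figure 2 of the paper correctly, the per-mixture task counts are 9 (CoT), 80 (Muffin), and 193 (T0-SF). A quick check that these add up to 282 (treat the counts as my reading of the figure, not as official numbers):

```python
# Task counts per mixture, as I read them from Figure 2 of the paper.
task_mixtures = {"CoT": 9, "Muffin": 80, "T0-SF": 193}

total = sum(task_mixtures.values())
print(total)         # 282
assert total == 282  # matches the 282 tasks discussed in the quoted paragraph
```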
Closing the issue now.