Question about the number of multi-task learning steps for T0?
StevenTang1998 opened this issue · 2 comments
StevenTang1998 commented
Hi, I also want to conduct multi-task learning on T5 using several datasets, and I noticed you use a batch size of 1024 to fine-tune T5. How many steps did you fine-tune it for during multi-task learning?
VictorSanh commented
Hi @StevenTang1998 ,
All the results we reported are for checkpoints fine-tuned for 12'000 steps from the T5+LM checkpoints.
We initially fine-tuned for 25'000 steps and performed checkpoint selection based on the training sets.
12'000 was the value we obtained for T0, and we simply carried it over to the other models.
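For reference, a quick back-of-the-envelope sketch of the training budget these numbers imply, using only the batch size and step counts mentioned in this thread (variable names are illustrative, not from the T0 codebase):

```python
# Training budget implied by the hyperparameters discussed above.
batch_size = 1024        # examples per step (from the question)
selected_steps = 12_000  # steps of the reported, selected checkpoints
max_steps = 25_000       # total steps run before checkpoint selection

examples_at_selection = batch_size * selected_steps
examples_total = batch_size * max_steps

print(examples_at_selection)  # examples seen by the reported checkpoint: 12288000
print(examples_total)         # examples seen over the full 25'000-step run: 25600000
```

So the reported checkpoints have seen roughly 12.3M training examples, out of the ~25.6M covered by the full fine-tuning run.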
StevenTang1998 commented
Hi @VictorSanh, thanks for your answer!