LeMei/UniMSE

How would this apply to the Chinese dataset

IKIUJK opened this issue · 2 comments

Hello, I am very interested in this project.
1) When using T5, do you initialize it with the pre-trained weights and then fine-tune those weights during multi-modal training, or do you keep the T5 parameters frozen the whole time?
2) If I want to apply it to a Chinese dataset, can I use mT5 instead of T5 for training?
3) How long did the training in the paper take?
Thank you very much!

LeMei commented

Thanks for your attention!

Yes, we initialize T5 with the pre-trained weights and then fine-tune it during multi-modal training; only the parameters of several Transformer layers are fine-tuned.
I suggest you check the architecture of mT5; I may not be able to give you an answer on that.
Almost two days, though I don't remember the exact time.
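For reference, here is a minimal sketch of what "initialize from pre-trained weights and fine-tune only a few Transformer layers" could look like with Hugging Face Transformers. This is not the exact UniMSE training code; the checkpoint name, the number of unfrozen blocks, and the mT5 swap for a Chinese dataset are all assumptions.

```python
# Minimal sketch, not the UniMSE implementation: load a pre-trained T5 and
# unfreeze only the last N encoder/decoder blocks for fine-tuning.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-base"  # for Chinese, one could try "google/mt5-base" with MT5ForConditionalGeneration/MT5Tokenizer
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)  # pre-trained weights

# Freeze everything first, then unfreeze the last N Transformer blocks.
N = 2  # assumed value, not taken from the paper
for p in model.parameters():
    p.requires_grad = False
for block in list(model.encoder.block[-N:]) + list(model.decoder.block[-N:]):
    for p in block.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

Only the unfrozen blocks would then receive gradient updates during the multi-modal fine-tuning loop.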


Thanks for your answer. Maybe I can try it.