Fine-tuning specific areas
peng06051126 opened this issue · 3 comments
peng06051126 commented
First of all, thank you for your great contribution. I would like to fine-tune GALACTICA to generate articles from topics. Can you provide sample training data, or do you have any suggestions?
mkardas commented
peng06051126 commented
Thank you for your reply. How does the model perform on non-English data? Has it been tested on any? And what proportion of the pre-training dataset is non-English, for example Chinese?
mkardas commented
By design, the models are not multi-lingual, and most of the natural-language documents in the pretraining corpus are written in English. For more, see the "Introduction to GALACTICA Models" notebook (look for "multi-lingual").