alt113/CS505-Spring-MultiLingual-Twitter-Classification
Training multilingual models for sentiment analysis: fine-tune pre-trained multilingual language models such as multilingual BERT (mBERT) and XLM-RoBERTa on the sentiment-prediction task, and compare their performance against (1) a monolingual English model trained on English tweets and tested on Arabic tweets translated to English (with a pre-trained machine-translation model or Google Translate), and (2) a multilingual model trained on English tweets and tested directly on Arabic tweets (i.e., zero-shot cross-lingual classification). The guiding question: do multilingual models benefit from being trained on multilingual data? Minimal sketches of the fine-tune-then-zero-shot setup and the translate-then-classify baseline follow below.
Jupyter Notebook · MIT License
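
A minimal sketch of setup (2), assuming the Hugging Face `transformers` and `torch` packages. The two-label scheme, the English training tweets, and the Arabic test tweets are toy placeholders, not the repository's actual data or preprocessing; the real project would load its tweet corpora here.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # or "bert-base-multilingual-cased" for mBERT

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# --- Fine-tune on English tweets (toy stand-ins for the real corpus) ---
train_texts = ["I love this!", "Worst day ever."]
train_labels = torch.tensor([1, 0])  # 0 = negative, 1 = positive

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    batch = tokenizer(train_texts, padding=True, truncation=True, return_tensors="pt")
    loss = model(**batch, labels=train_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# --- Zero-shot evaluation on Arabic tweets (no Arabic seen in fine-tuning) ---
test_texts = ["أحب هذا كثيرا", "هذا سيء للغاية"]  # "I love this a lot", "This is very bad"
model.eval()
with torch.no_grad():
    batch = tokenizer(test_texts, padding=True, truncation=True, return_tensors="pt")
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())  # predicted sentiment labels for the Arabic tweets
```

Because XLM-RoBERTa was pre-trained on text in many languages, the classifier head fine-tuned on English alone can still produce sentiment predictions for Arabic input; comparing those predictions against the translation baseline is the core of the experiment.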
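For baseline (1), a hedged sketch of the translate-then-classify step. `Helsinki-NLP/opus-mt-ar-en` is one publicly available Arabic-to-English MT model chosen here as an assumption; the repository may instead use a different pre-trained translator or Google Translate. The translated tweets would then be fed to the monolingual English classifier.

```python
from transformers import pipeline

# Translate Arabic tweets to English before classification with an
# English-only sentiment model (assumed MT model: opus-mt-ar-en).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ar-en")

arabic_tweets = ["أحب هذا كثيرا", "هذا سيء للغاية"]
english_tweets = [out["translation_text"] for out in translator(arabic_tweets)]
print(english_tweets)  # inputs for the monolingual English classifier
```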