Given the rapid growth of multilingual NLP research, we have established this repository to gather the relevant literature in this domain. (It is a contribution of the survey paper "A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers".)
This is also a tutorial on multilingual pre-trained models maintained by the Beijing Jiaotong University (BJTU) NLP Group (continually updated).
The past five years have witnessed the rapid development of multilingual pre-trained models, especially data-driven large language models (LLMs). Given the current prominence of multilingual NLP, we prioritize collecting important, up-to-date papers on multilingual pre-trained models together with their performance. As one of the contributions of the survey, we continuously update and expand the content following the chapters of the survey. Our list is still incomplete, and the categorization may be imperfect. We will keep adding papers and improving the list. Any suggestions are welcome!
We present only an overview of representative LLMs (most with more than 7B trainable parameters) that have some multilingual capability, including their release dates and details. Newer models that perform well on the leaderboard will be added in a timely manner; you can also contact us for updates and promotion.
We evaluate LLMs with multilingual capabilities on our reconstructed benchmarks. (If a model has several versions, we choose only the version that performs best.)
In this leaderboard, we use a unified prompt for each task to probe the multilingual capabilities of each model. Strategies that may further improve model performance are explored in the next chapter, "Multilingual Inference Strategies".
🎈 A suite for calling LLMs is coming soon! The benchmark is under construction.
All models are publicly available online. A link to the paper or GitHub repository is given for each.
- LLaMA, Meta AI
- LLaMA-1, LLaMA: Open and Efficient Foundation Language Models, 2023.02.27
- LLaMA-2, Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023.07.18
- LLaMA-3, Meta Llama 3, 2024.04.18
- GLM, ZHIPU, ChatGLM-6B, 2023.05.13. Other versions are also released on their GitHub.
- Baichuan, Baichuan AI
- Baichuan-1, Technical Report, 2023.06.15
- Baichuan-2, Baichuan 2: Open Large-scale Language Models, 2023.09.06
- Baichuan-3, Chat Platform, 2024.01.29
- Qwen, Alibaba, Qwen Technical Report
- Qwen, Technical Report, 2023.08.03
- Qwen-1.5, Technical Report, 2024.02.05
- Phi, Microsoft
- Phi-1, Textbooks Are All You Need, 2023.06.20
- Phi-1.5, Textbooks Are All You Need II: phi-1.5 technical report, 2023.09.11
- Phi-2, Phi-2: The surprising power of small language models, 2023.12.12
- Phi-3, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, 2024.04.23
- Mistral, Mistral 7B
- Mistral 7B, 2023.10.10
- Mixtral 8x7B, 2023.12.11
- Mixtral 8x22B, 2024.04.17
- OpenChat, Tsinghua University, OpenChat: Advancing Open-source Language Models with Mixed-Quality Data, 2023.09.20
- DeepSeek, DeepSeek AI, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, 2024.01.05
- InternLM, Shanghai AI Laboratory
- InternLM, Repo, 2023.09.20
- InternLM2, InternLM2 Technical Report, 2024.03.26
- BLOOM, Big Science, BLOOM: A 176B-Parameter Open-Access Multilingual Language Model, 2022.11.09
- BLOOMZ-7b1, Hugging Face, Crosslingual Generalization through Multitask Finetuning, 2023.05.29
- BayLing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS), BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models, 2023.06.19
We investigate only a few representative closed-source LLMs because most of these commercial systems are expensive to invoke. We hope to obtain sponsorship or voluntary cooperation from enterprises in order to compare closed-source systems; otherwise, our goal is to discuss the future potential of LLMs within the open-source community.
- GPT, OpenAI
- ChatGPT (GPT-3.5-turbo)
- GPT-4, GPT-4 Technical Report, 2023.05.15
- PaLM, Google
- Claude, Anthropic
- Gemini, DeepMind, Chat with Gemini
We investigate several inference strategies for LLMs to explore how far they can improve multilingual performance on the related benchmarks. (The multilingual inference strategies act on the prompt, with external knowledge, while the LLMs themselves stay frozen.)
Model | Method | MGSM | XCOPA | XNLI | PAWS-X | MKQA | Avg |
---|---|---|---|---|---|---|---|
GPT-3.5 | Basic | 34.4 | 72.3 | 52.2 | 49.7 | 35.4 | 48.8 |
GPT-3.5 | En-Basic | 41.1 | 76.1 | 63.0 | 62.0 | 36.5 | 55.7 |
GPT-3.5 | CoT | 49.9 | 72.4 | 50.6 | 50.3 | 35.4 | 51.7 |
GPT-3.5 | En-CoT | 61.1 | 78.6 | 56.4 | 61.6 | 42.9 | 60.1 |
GPT-3.5 | XLT | 62.3 | 79.3 | 59.2 | 59.4 | 37.6 | 59.6 |
GPT-3.5 | Trans-Google | 73.9 | 84.5 | 60.5 | 67.2 | 43.8 | 66.0 |
GPT-3.5 | Trans-NLLB | 61.0 | 79.7 | 59.2 | 67.5 | 37.2 | 60.9 |
BLOOMZ-7b1 | Basic | 1.3 | 21.4 | 8.3 | - | 7.8 | 9.7 |
BLOOMZ-7b1 | En-Basic | 2.0 | 56.5 | 43.9 | - | 10.6 | 28.3 |
BLOOMZ-7b1 | CoT | 1.2 | 20.9 | 8.2 | - | 6.5 | 9.2 |
BLOOMZ-7b1 | En-CoT | 1.7 | 53.9 | 35.9 | - | 9.3 | 25.2 |
BLOOMZ-7b1 | XLT | 1.7 | 50.5 | 35.4 | - | 8.0 | 23.9 |
BLOOMZ-7b1 | Trans-Google | 2.7 | 63.7 | 44.3 | - | 17.2 | 32.0 |
BLOOMZ-7b1 | Trans-NLLB | 2.4 | 61.8 | 43.8 | - | 14.7 | 30.7 |
Mistral-7B-Instruct | Basic | 11.2 | 54.2 | 42.8 | 44.6 | 7.8 | 32.1 |
Mistral-7B-Instruct | En-Basic | 23.8 | 34.9 | 50.2 | 46.9 | 7.0 | 32.6 |
Mistral-7B-Instruct | CoT | 17.0 | 53.8 | 43.4 | 44.3 | 7.8 | 33.3 |
Mistral-7B-Instruct | En-CoT | 27.6 | 40.8 | 50.0 | 46.6 | 11.5 | 35.3 |
Mistral-7B-Instruct | XLT | 31.8 | 61.5 | 46.0 | 47.8 | 9.6 | 39.3 |
Mistral-7B-Instruct | Trans-Google | 41.3 | 59.2 | 55.0 | 51.5 | 17.0 | 44.8 |
Mistral-7B-Instruct | Trans-NLLB | 31.7 | 54.4 | 53.0 | 52.4 | 15.5 | 41.4 |
Llama-2-7B-Chat | Basic | 8.4 | 46.5 | 34.6 | 48.1 | 14.4 | 30.4 |
Llama-2-7B-Chat | En-Basic | 9.3 | 49.7 | 39.0 | 48.8 | 16.1 | 32.6 |
Llama-2-7B-Chat | CoT | 10.9 | 46.3 | 35.6 | 48.3 | 13.3 | 30.9 |
Llama-2-7B-Chat | En-CoT | 13.6 | 54.9 | 41.0 | 48.7 | 13.8 | 34.4 |
Llama-2-7B-Chat | XLT | 10.4 | 50.8 | 44.8 | 44.5 | 14.6 | 33.0 |
Llama-2-7B-Chat | Trans-Google | 28.6 | 67.7 | 45.5 | 57.5 | 19.8 | 43.8 |
Llama-2-7B-Chat | Trans-NLLB | 24.8 | 64.6 | 44.1 | 56.2 | 17.4 | 41.4 |
Llama-2-13B-Chat | Basic | 15.6 | 50.1 | 36.4 | 54.0 | 18.3 | 34.9 |
Llama-2-13B-Chat | En-Basic | 19.0 | 54.3 | 43.4 | 59.1 | 20.2 | 39.2 |
Llama-2-13B-Chat | CoT | 18.1 | 50.9 | 35.7 | 54.8 | 15.7 | 35.0 |
Llama-2-13B-Chat | En-CoT | 19.9 | 54.5 | 43.7 | 57.6 | 19.8 | 39.1 |
Llama-2-13B-Chat | XLT | 22.3 | 56.0 | 51.4 | 55.7 | 19.0 | 40.9 |
Llama-2-13B-Chat | Trans-Google | 39.1 | 71.9 | 46.1 | 58.4 | 33.8 | 49.9 |
Llama-2-13B-Chat | Trans-NLLB | 31.8 | 68.2 | 45.4 | 57.8 | 28.4 | 46.3 |
Llama-2-70B-Chat | Basic | 23.6 | 51.5 | 39.0 | 52.8 | 24.8 | 38.3 |
Llama-2-70B-Chat | En-Basic | 28.6 | 55.3 | 46.5 | 60.4 | 24.7 | 43.1 |
Llama-2-70B-Chat | CoT | 23.5 | 50.5 | 37.9 | 54.9 | 21.9 | 37.7 |
Llama-2-70B-Chat | En-CoT | 30.2 | 61.2 | 45.9 | 64.9 | 31.1 | 46.7 |
Llama-2-70B-Chat | XLT | 32.8 | 58.7 | 52.2 | 55.7 | 26.6 | 45.2 |
Llama-2-70B-Chat | Trans-Google | 53.3 | 80.9 | 54.0 | 68.5 | 39.7 | 59.3 |
Llama-2-70B-Chat | Trans-NLLB | 43.8 | 77.1 | 52.2 | 69.2 | 19.4 | 52.3 |
*Note: This leaderboard follows Liu et al.; we will update it in the next version once the evaluation suite for calling LLMs is built.
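The Avg column in the table is consistent with an unweighted mean over the benchmarks that have a score, skipping entries marked "-". This is inferred from the reported numbers rather than stated by the authors, so treat the following as a minimal sketch under that assumption. The function name `average_score` is hypothetical.

```python
from typing import Optional, Sequence


def average_score(scores: Sequence[Optional[float]]) -> float:
    """Mean of the available scores, rounded to one decimal place.

    A missing benchmark (shown as "-" in the table) is passed as None
    and left out of the mean. This rule is an assumption inferred from
    the reported Avg values.
    """
    available = [s for s in scores if s is not None]
    if not available:
        raise ValueError("no scores available")
    return round(sum(available) / len(available), 1)


# GPT-3.5, Basic: MGSM, XCOPA, XNLI, PAWS-X, MKQA
print(average_score([34.4, 72.3, 52.2, 49.7, 35.4]))  # 48.8

# BLOOMZ-7b1, Basic: PAWS-X is "-" in the table, so it is skipped
print(average_score([1.3, 21.4, 8.3, None, 7.8]))  # 9.7
```

Both results match the Avg values reported for those rows.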
[Question]: "制作一件袍子需要2匹蓝色纤维布料和这个数量一半的白色纤维布料。它一共需要用掉多少匹布料"
Basic: [Query]=[Question]+[Prompt: 您的最终答案的格式应为:"答案: <阿拉伯数字>".]
En-Basic: [Query]=[Question]+[Prompt -> English Prompt: You should format your final answer as "Answer: <Arabic numeral>".]
CoT: [Query]=[Question]+[Prompt -> CoT: 让我们一步步思考。您的最终答案的格式应为:"答案: <阿拉伯数字>".]
En-CoT: [Query]=[Question]+[Prompt -> English CoT: Let’s think step by step in English. You should format your final answer as "Answer: <Arabic numeral>".]
XLT: [Query]=[Prefix: I want you to act as an arithmetic reasoning expert for Chinese.Request: ]+[Question]+[Complex Prompt: You should retell the request in English. You should do step-by-step answer to obtain a number answer. You should step-by-step answer the request. You should tell me the answer in this format "Answer:".]
Trans-X: [Query]=[Question -> English Question by X]+[Prompt -> English CoT: Let’s think step by step in English. You should format your final answer as "Answer: <Arabic numeral>".]
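The templates above can be sketched as a small query builder. This is an illustration, not the evaluation code used for the leaderboard: the function name `build_query`, the `translate` callable (standing in for Google Translate or NLLB), and the newline separator between the parts are all assumptions, since the templates do not say how the pieces are joined. The prompt strings are copied from the examples.

```python
from typing import Callable, Optional

# Prompt fragments copied from the templates above.
ZH_FORMAT = '您的最终答案的格式应为:"答案: <阿拉伯数字>".'
EN_FORMAT = 'You should format your final answer as "Answer: <Arabic numeral>".'
ZH_COT = "让我们一步步思考。"
EN_COT = "Let’s think step by step in English. "
XLT_PREFIX = "I want you to act as an arithmetic reasoning expert for Chinese.Request: "
XLT_SUFFIX = (
    "You should retell the request in English. "
    "You should do step-by-step answer to obtain a number answer. "
    "You should step-by-step answer the request. "
    'You should tell me the answer in this format "Answer:".'
)

SEP = "\n"  # Assumption: the separator is not specified in the templates.


def build_query(
    strategy: str,
    question: str,
    translate: Optional[Callable[[str], str]] = None,
) -> str:
    """Assemble the query string for one inference strategy."""
    if strategy == "Basic":
        return question + SEP + ZH_FORMAT
    if strategy == "En-Basic":
        return question + SEP + EN_FORMAT
    if strategy == "CoT":
        return question + SEP + ZH_COT + ZH_FORMAT
    if strategy == "En-CoT":
        return question + SEP + EN_COT + EN_FORMAT
    if strategy == "XLT":
        return XLT_PREFIX + question + SEP + XLT_SUFFIX
    if strategy == "Trans-X":
        # The question is translated into English first; `translate` is a
        # hypothetical stand-in for the external translation system X.
        if translate is None:
            raise ValueError("Trans-X needs a translate callable")
        return translate(question) + SEP + EN_COT + EN_FORMAT
    raise ValueError(f"unknown strategy: {strategy}")


question = "制作一件袍子需要2匹蓝色纤维布料和这个数量一半的白色纤维布料。它一共需要用掉多少匹布料"
print(build_query("En-CoT", question))
```

In every strategy the model itself is untouched; only the query string changes, which is why the strategies can be compared on the same frozen LLM.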
We provide a reading list (continually updated) for this chapter, corresponding to Section 4 of the survey.
This leaderboard is built with the EasyJailbreak framework on AdvBench.
Method | GPT-3.5 | GPT-4 | Llama-2-7B-Chat | Llama-2-13B-Chat | Vicuna-7B-v1.5 | Vicuna-13B-v1.5 | ChatGLM | Qwen-7B-Chat | InternLM-7B | Mistral-7B |
---|---|---|---|---|---|---|---|---|---|---|
GCG | 12% | 0% | 46% | 46% | 94% | 94% | 34% | 48% | 10% | 82% |
JailBroken | 100% | 58% | 6% | 4% | 100% | 100% | 95% | 100% | 100% | 100% |
GPTFUZZER | 35% | 0% | 31% | 41% | 93% | 94% | 85% | 82% | 92% | 99% |
AutoDAN | 45% | 2% | 51% | 72% | 100% | 97% | 89% | 99% | 98% | 98% |
DeepInception | 66% | 35% | 8% | 0% | 29% | 17% | 33% | 58% | 36% | 40% |
ICA | 0% | 1% | 0% | 0% | 51% | 81% | 54% | 36% | 23% | 75% |
PAIR | 19% | 20% | 27% | 13% | 99% | 95% | 96% | 77% | 86% | 95% |
ReNeLLM | 87% | 38% | 31% | 69% | 77% | 87% | 86% | 70% | 67% | 90% |
Multilingual | 12% | 0% | 46% | 46% | 94% | 94% | 34% | 48% | 10% | 82% |
Cipher | 100% | 58% | 6% | 4% | 100% | 100% | 95% | 100% | 100% | 100% |
CodeChameleon | 35% | 0% | 31% | 41% | 93% | 94% | 85% | 82% | 92% | 99% |
We provide a reading list of jailbreaking and defense methods (continually updated) for this chapter, corresponding to Section 5 of the survey.
🎈 The leaderboard for the legal benchmark is under construction.
🎈 The leaderboard for the medical benchmark is under construction.
Coming soon! This domain is being updated.
The data resources and popular benchmarks are listed in detail in the reading list.
Project Lead:
- Kaiyu Huang, kyhuang@bjtu.edu.cn
- Fengran Mo, fengran.mo@umontreal.ca
Section Contributors:
- Inference: Yulong Mao
- Security: Hongliang Li
- Multidomain: You Li
Special Thanks:
- Chaoqun Liu (Nanyang Technological University, Singapore) provided valuable thoughts and contributed part of the implementation of the multilingual inference strategies.
@misc{huang2024survey,
title={A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers},
author={Kaiyu Huang and Fengran Mo and Hongliang Li and You Li and Yuanchi Zhang and Weijian Yi and Yulong Mao and Jinchen Liu and Yuzhuang Xu and Jinan Xu and Jian-Yun Nie and Yang Liu},
year={2024},
eprint={2405.10936},
archivePrefix={arXiv},
primaryClass={cs.CL}
}