LLMSurvey

A collection of papers and resources related to Large Language Models.

The papers are organized according to our survey "A Survey of Large Language Models".

Please let us know by e-mail if you find a mistake or have any suggestions: batmanfly@gmail.com

(We suggest also cc'ing francis_kun_zhou@163.com in case of any delivery failure.)

To facilitate reading of our (English-version) survey, we also used LLMs plus some human checking to generate a Chinese version of this survey. However, since it is mainly generated by LLMs, please don't forward or post its content on the Web.

If you find our survey useful for your research, please cite the following paper:

@article{LLMSurvey,
    title={A Survey of Large Language Models},
    author={Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-Rong},
    year={2023},
    journal={arXiv preprint arXiv:2303.18223},
    url={http://arxiv.org/abs/2303.18223}
}

Timeline of LLMs

[Figure: timeline of LLMs]

List of LLMs

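| Category | Model | Release Time | Size (B) | Link |
|---|---|---|---|---|
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
| Publicly Accessible | mT5 | 2021/03 | 13 | Paper |
| Publicly Accessible | PanGu-α | 2021/05 | 13 | Paper |
| Publicly Accessible | CPM-2 | 2021/05 | 198 | Paper |
| Publicly Accessible | T0 | 2021/10 | 11 | Paper |
| Publicly Accessible | GPT-NeoX-20B | 2022/02 | 20 | Paper |
| Publicly Accessible | CodeGen | 2022/03 | 16 | Paper |
| Publicly Accessible | Tk-Instruct | 2022/04 | 11 | Paper |
| Publicly Accessible | UL2 | 2022/02 | 20 | Paper |
| Publicly Accessible | OPT | 2022/05 | 175 | Paper |
| Publicly Accessible | NLLB | 2022/07 | 55 | Paper |
| Publicly Accessible | BLOOM | 2022/07 | 176 | Paper |
| Publicly Accessible | GLM | 2022/08 | 130 | Paper |
| Publicly Accessible | Flan-T5 | 2022/10 | 11 | Paper |
| Publicly Accessible | mT0 | 2022/11 | 13 | Paper |
| Publicly Accessible | Galactica | 2022/11 | 120 | Paper |
| Publicly Accessible | BLOOMZ | 2022/11 | 176 | Paper |
| Publicly Accessible | OPT-IML | 2022/12 | 175 | Paper |
| Publicly Accessible | Pythia | 2023/01 | 12 | Paper |
| Publicly Accessible | LLaMA | 2023/02 | 65 | Paper |
| Publicly Accessible | Vicuna | 2023/03 | 13 | Blog |
| Publicly Accessible | Koala | 2023/04 | 13 | Blog |
| Closed Source | GShard | 2020/01 | 600 | Paper |
| Closed Source | GPT-3 | 2020/05 | 175 | Paper |
| Closed Source | LaMDA | 2021/05 | 137 | Paper |
| Closed Source | HyperCLOVA | 2021/06 | 82 | Paper |
| Closed Source | Codex | 2021/07 | 12 | Paper |
| Closed Source | ERNIE 3.0 | 2021/07 | 10 | Paper |
| Closed Source | Jurassic-1 | 2021/08 | 178 | Paper |
| Closed Source | FLAN | 2021/10 | 137 | Paper |
| Closed Source | MT-NLG | 2021/10 | 530 | Paper |
| Closed Source | Yuan 1.0 | 2021/10 | 245 | Paper |
| Closed Source | Anthropic | 2021/12 | 52 | Paper |
| Closed Source | WebGPT | 2021/12 | 175 | Paper |
| Closed Source | Gopher | 2021/12 | 280 | Paper |
| Closed Source | ERNIE 3.0 Titan | 2021/12 | 260 | Paper |
| Closed Source | GLaM | 2021/12 | 1200 | Paper |
| Closed Source | InstructGPT | 2022/01 | 175 | Paper |
| Closed Source | AlphaCode | 2022/02 | 41 | Paper |
| Closed Source | Chinchilla | 2022/03 | 70 | Paper |
| Closed Source | PaLM | 2022/04 | 540 | Paper |
| Closed Source | Cohere | 2022/06 | 54 | Homepage |
| Closed Source | YaLM | 2022/06 | 100 | GitHub |
| Closed Source | AlexaTM | 2022/08 | 20 | Paper |
| Closed Source | Luminous | 2022/09 | 70 | Docs |
| Closed Source | Sparrow | 2022/09 | 70 | Paper |
| Closed Source | WeLM | 2022/09 | 10 | Paper |
| Closed Source | U-PaLM | 2022/10 | 540 | Paper |
| Closed Source | Flan-PaLM | 2022/10 | 540 | Paper |
| Closed Source | Flan-U-PaLM | 2022/10 | 540 | Paper |
| Closed Source | Alpaca | 2023/03 | 7 | Blog |
| Closed Source | GPT-4 | 2023/03 | - | Paper |
| Closed Source | PanGu-Σ | 2023/03 | 1085 | Paper |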

Resources of LLMs

Publicly Available Models

  1. T5: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Checkpoint]
  2. mT5: "mT5: A massively multilingual pre-trained text-to-text transformer". Linting Xue et al. NAACL 2021. [Paper] [Checkpoint]
  3. PanGu-α: "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation". Wei Zeng et al. arXiv 2021. [Paper] [Checkpoint]
  4. CPM-2: "CPM-2: Large-scale Cost-effective Pre-trained Language Models". Zhengyan Zhang et al. arXiv 2021. [Paper] [Checkpoint]
  5. T0: "Multitask Prompted Training Enables Zero-Shot Task Generalization". Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  6. GPT-NeoX-20B: "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". Sid Black et al. arXiv 2022. [Paper] [Checkpoint]
  7. CodeGen: "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". Erik Nijkamp et al. arXiv 2022. [Paper] [Checkpoint]
  8. Tk-Instruct: "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks". Yizhong Wang et al. EMNLP 2022. [Paper] [Checkpoint]
  9. UL2: "UL2: Unifying Language Learning Paradigms". Yi Tay et al. arXiv 2022. [Paper] [Checkpoint]
  10. OPT: "OPT: Open Pre-trained Transformer Language Models". Susan Zhang et al. arXiv 2022. [Paper] [Checkpoint]
  11. NLLB: "No Language Left Behind: Scaling Human-Centered Machine Translation". NLLB Team. arXiv 2022. [Paper] [Checkpoint]
  12. BLOOM: "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". BigScience Workshop. arXiv 2022. [Paper] [Checkpoint]
  13. GLM: "GLM-130B: An Open Bilingual Pre-trained Model". Aohan Zeng et al. arXiv 2022. [Paper] [Checkpoint]
  14. Flan-T5: "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper] [Checkpoint]
  15. mT0 & BLOOMZ: "Crosslingual Generalization through Multitask Finetuning". Niklas Muennighoff et al. arXiv 2022. [Paper] [Checkpoint]
  16. Galactica: "Galactica: A Large Language Model for Science". Ross Taylor et al. arXiv 2022. [Paper] [Checkpoint]
  17. OPT-IML: "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  18. Pythia: "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling". Stella Biderman et al. arXiv 2023. [Paper] [Checkpoint]
  19. LLaMA: "LLaMA: Open and Efficient Foundation Language Models". Hugo Touvron et al. arXiv 2023. [Paper] [Checkpoint]

Closed-source Models

  1. GShard: "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". Dmitry Lepikhin et al. ICLR 2021. [Paper]
  2. GPT-3: "Language Models are Few-Shot Learners". Tom B. Brown et al. NeurIPS 2020. [Paper]
  3. LaMDA: "LaMDA: Language Models for Dialog Applications". Romal Thoppilan et al. arXiv 2022. [Paper]
  4. HyperCLOVA: "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers". Boseop Kim et al. EMNLP 2021. [Paper]
  5. Codex: "Evaluating Large Language Models Trained on Code". Mark Chen et al. arXiv 2021. [Paper]
  6. ERNIE 3.0: "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". Yu Sun et al. arXiv 2021. [Paper]
  7. Jurassic-1: "Jurassic-1: Technical details and evaluation". Opher Lieber et al. 2021. [Paper]
  8. FLAN: "Finetuned Language Models Are Zero-Shot Learners". Jason Wei et al. ICLR 2022. [Paper]
  9. MT-NLG: "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". Shaden Smith et al. arXiv 2021. [Paper]
  10. Yuan 1.0: "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". Shaohua Wu et al. arXiv 2021. [Paper]
  11. Anthropic: "A General Language Assistant as a Laboratory for Alignment". Amanda Askell et al. arXiv 2021. [Paper]
  12. WebGPT: "WebGPT: Browser-assisted question-answering with human feedback". Reiichiro Nakano et al. arXiv 2021. [Paper]
  13. Gopher: "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". Jack W. Rae et al. arXiv 2021. [Paper]
  14. ERNIE 3.0 Titan: "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". Shuohuan Wang et al. arXiv 2021. [Paper]
  15. GLaM: "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". Nan Du et al. ICML 2022. [Paper]
  16. InstructGPT: "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  17. AlphaCode: "Competition-Level Code Generation with AlphaCode". Yujia Li et al. arXiv 2022. [Paper]
  18. Chinchilla: "Training Compute-Optimal Large Language Models". Jordan Hoffmann et al. arXiv 2022. [Paper]
  19. PaLM: "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. [Paper]
  20. AlexaTM: "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". Saleh Soltan et al. arXiv 2022. [Paper]
  21. Sparrow: "Improving alignment of dialogue agents via targeted human judgements". Amelia Glaese et al. arXiv 2022. [Paper]
  22. WeLM: "WeLM: A Well-Read Pre-trained Language Model for Chinese". Hui Su et al. arXiv 2022. [Paper]
  23. U-PaLM: "Transcending Scaling Laws with 0.1% Extra Compute". Yi Tay et al. arXiv 2022. [Paper]
  24. Flan-PaLM & Flan-U-PaLM: "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper]
  25. GPT-4: "GPT-4 Technical Report". OpenAI. arXiv 2023. [Paper]
  26. PanGu-Σ: "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". Xiaozhe Ren et al. arXiv 2023. [Paper]

Commonly Used Corpora

  1. BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". Yukun Zhu et al. ICCV 2015. [Paper] [Source]
  2. Gutenberg: [Source]
  3. CommonCrawl: [Source]
  4. C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Source]
  5. CC-stories-R: "A Simple Method for Commonsense Reasoning". Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
  6. CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Yinhan Liu et al. arXiv 2019. [Paper] [Source]
  7. RealNews: "Defending Against Neural Fake News". Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
  8. OpenWebText: [Source]
  9. Pushshift.io: "The Pushshift Reddit Dataset". Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
  10. Wikipedia: [Source]
  11. BigQuery: [Source]
  12. The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". Leo Gao et al. arXiv 2021. [Paper] [Source]
  13. ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]

Library Resource

  1. Transformers: "Transformers: State-of-the-Art Natural Language Processing". Thomas Wolf et al. EMNLP 2020. [Paper] [Source]
  2. DeepSpeed: "DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters". Jeff Rasley et al. KDD 2020. [Paper] [Source]
  3. Megatron-LM: "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". Mohammad Shoeybi et al. arXiv 2019. [Paper] [Source]
  4. JAX: [Source]
  5. Colossal-AI: "Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training". Zhengda Bian et al. arXiv 2021. [Paper] [Source]
  6. BMTrain: [Source]
  7. FastMoE: "FastMoE: A Fast Mixture-of-Expert Training System". Jiaao He et al. arXiv 2021. [Paper] [Source]

Deep Learning Frameworks

  1. PyTorch: "PyTorch: An Imperative Style, High-Performance Deep Learning Library". Adam Paszke et al. NeurIPS 2019. [Paper] [Source]
  2. TensorFlow: "TensorFlow: A system for large-scale machine learning". Martín Abadi et al. OSDI 2016. [Paper] [Source]
  3. MXNet: "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems". Tianqi Chen et al. arXiv 2015. [Paper] [Source]
  4. PaddlePaddle: "PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice". Yanjun Ma et al. Frontiers of Data and Computing 2019. [Paper] [Source]
  5. MindSpore: "Huawei MindSpore AI Development Framework". Huawei Technologies Co., Ltd. Artificial Intelligence Technology 2022. [Paper] [Source]
  6. OneFlow: "OneFlow: Redesign the Distributed Deep Learning Framework from Scratch". Jinhui Yuan et al. arXiv 2021. [Paper] [Source]

Pre-training

Data Collection

  1. "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]
  2. "Deduplicating Training Data Makes Language Models Better". Katherine Lee et al. ACL 2022. [paper]
  3. "Deduplicating Training Data Mitigates Privacy Risks in Language Models". Nikhil Kandpal et al. ICML 2022. [paper]
  4. "Scaling Laws and Interpretability of Learning from Repeated Data". Danny Hernandez et al. arXiv 2022. [paper]

Architecture

Mainstream Architectures

Causal Decoder

  1. "Language Models are Few-Shot Learners". Tom B. Brown et al. NeurIPS 2020. [paper]
  2. "OPT: Open Pre-trained Transformer Language Models". Susan Zhang et al. arXiv 2022. [paper]
  3. "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". Teven Le Scao et al. arXiv 2022. [paper]
  4. "Training Compute-Optimal Large Language Models". Jordan Hoffmann et al. arXiv 2022. [paper]
  5. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". Jack W. Rae et al. arXiv 2021. [paper]
  6. "Galactica: A Large Language Model for Science". Ross Taylor et al. arXiv 2022. [paper]
  7. "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. [paper]
  8. "Jurassic-1: Technical Details and Evaluation". Opher Lieber et al. AI21 Labs. [paper]
  9. "LaMDA: Language Models for Dialog Applications". Romal Thoppilan et al. arXiv 2022. [paper]

Prefix Decoder

  1. "GLM-130B: An Open Bilingual Pre-trained Model". Aohan Zeng et al. arXiv 2022. [paper]
  2. "GLM: General Language Model Pretraining with Autoregressive Blank Infilling". Zhengxiao Du et al. ACL 2022. [paper]
  3. "Transcending Scaling Laws with 0.1% Extra Compute". Yi Tay et al. arXiv 2022. [paper]

MoE

  1. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". William Fedus et al. JMLR 2022. [paper]
  2. "Unified Scaling Laws for Routed Language Models". Aidan Clark et al. ICML 2022. [paper]

SSM

  1. "Pretraining Without Attention". Junxiong Wang et al. arXiv 2022. [paper]
  2. "Efficiently Modeling Long Sequences with Structured State Spaces". Albert Gu et al. ICLR 2022. [paper]
  3. "Long Range Language Modeling via Gated State Spaces". Harsh Mehta et al. arXiv 2022. [paper]

Detailed Configuration

Layer Normalization

  1. "DeepNet: Scaling Transformers to 1,000 Layers". Hongyu Wang et al. arXiv 2022. [paper]
  2. "Root Mean Square Layer Normalization". Biao Zhang et al. NeurIPS 2019. [paper]

Position Encoding

  1. "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation". Ofir Press et al. ICLR 2022. [paper]
  2. "RoFormer: Enhanced Transformer with Rotary Position Embedding". Jianlin Su et al. arXiv 2021. [paper]

Analysis

  1. "What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?". Thomas Wang et al. ICML 2022. [paper]
  2. "What Language Model to Train if You Have One Million GPU Hours?". Teven Le Scao et al. Findings of EMNLP 2022. [paper]
  3. "Examining Scaling and Transfer of Language Model Architectures for Machine Translation". Biao Zhang et al. ICML 2022. [paper]
  4. "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?". Yi Tay et al. arXiv 2022. [paper]
  5. "Do Transformer Modifications Transfer Across Implementations and Applications?". Sharan Narang et al. EMNLP 2021. [paper]

Training Algorithms

  1. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". Mohammad Shoeybi et al. arXiv 2019. [paper]
  2. "An Efficient 2D Method for Training Super-Large Deep Learning Models". Qifan Xu et al. arXiv 2021. [paper]
  3. "Tesseract: Parallelize the Tensor Parallelism Efficiently". Boxiang Wang et al. ICPP 2022. [paper]
  4. "Maximizing Parallelism in Distributed Training for Huge Neural Networks". Zhengda Bian et al. arXiv 2021. [paper]
  5. "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism". Yanping Huang et al. NeurIPS 2019. [paper]
  6. "PipeDream: Fast and Efficient Pipeline Parallel DNN Training". Aaron Harlap et al. arXiv 2018. [paper]
  7. "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models". Samyam Rajbhandari et al. SC 2020. [paper]
  8. "ZeRO-Offload: Democratizing Billion-Scale Model Training". Jie Ren et al. USENIX ATC 2021. [paper]

Pre-training on Code

LLMs for Program Synthesis

  1. "Evaluating Large Language Models Trained on Code". Mark Chen et al. arXiv 2021. [paper]
  2. "Program Synthesis with Large Language Models". Jacob Austin et al. arXiv 2021. [paper]
  3. "Show Your Work: Scratchpads for Intermediate Computation with Language Models". Maxwell Nye et al. arXiv 2021. [paper]
  4. "A Systematic Evaluation of Large Language Models of Code". Frank F. Xu et al. arXiv 2022. [paper]
  5. "Competition-Level Code Generation with AlphaCode". Yujia Li et al. Science 2022. [paper]
  6. "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". Erik Nijkamp et al. ICLR 2023. [paper]
  7. "InCoder: A Generative Model for Code Infilling and Synthesis". Daniel Fried et al. ICLR 2023. [paper]
  8. "CodeT: Code Generation with Generated Tests". Bei Chen et al. ICLR 2023. [paper]

NLP Tasks Formatted as Code

  1. "Language Models of Code are Few-Shot Commonsense Learners". Aman Madaan et al. EMNLP 2022. [paper]
  2. "Autoformalization with Large Language Models". Yuhuai Wu et al. NeurIPS 2022. [paper]

Adaptation Tuning

Instruction Tuning

  1. "Multi-Task Deep Neural Networks for Natural Language Understanding". Xiaodong Liu et al. ACL 2019. [Paper] [Homepage]
  2. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Checkpoint]
  3. "Muppet: Massive Multi-task Representations with Pre-Finetuning". Armen Aghajanyan et al. EMNLP 2021. [Paper] [Checkpoint]
  4. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions". Swaroop Mishra et al. ACL 2022. [Paper] [Collection]
  5. "CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP". Qinyuan Ye et al. EMNLP 2021. [Paper] [Collection]
  6. "Finetuned Language Models Are Zero-Shot Learners". Jason Wei et al. ICLR 2022. [Paper] [Homepage]
  7. "Multitask Prompted Training Enables Zero-Shot Task Generalization". Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  8. "ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning". Vamsi Aribandi et al. ICLR 2022. [Paper]
  9. "UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models". Tianbao Xie et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  10. "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts". Stephen H. Bach et al. ACL 2022. [Paper] [Collection]
  11. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  12. "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks". Yizhong Wang et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  13. "MVP: Multi-task Supervised Pre-training for Natural Language Generation". Tianyi Tang et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  14. "Crosslingual Generalization through Multitask Finetuning". Niklas Muennighoff et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  15. "Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization". Yuxian Gu et al. EMNLP 2022. [Paper] [Homepage]
  16. "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper] [Homepage]
  17. "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor". Or Honovich et al. arXiv 2022. [Paper] [Homepage]
  18. "Self-Instruct: Aligning Language Model with Self Generated Instructions". Yizhong Wang et al. arXiv 2022. [Paper] [Homepage]
  19. "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  20. "The Flan Collection: Designing Data and Methods for Effective Instruction Tuning". Shayne Longpre et al. arXiv 2023. [Paper] [Homepage]
  21. "Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning". Renze Lou et al. arXiv 2023. [Paper]

Alignment Tuning

  1. "TAMER: Training an Agent Manually via Evaluative Reinforcement". W. Bradley Knox et al. ICDL 2008. [Paper]
  2. "Interactive Learning from Policy-Dependent Human Feedback". James MacGlashan et al. ICML 2017. [Paper]
  3. "Deep Reinforcement Learning from Human Preferences". Paul Christiano et al. NIPS 2017. [Paper]
  4. "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces". Garrett Warnell et al. AAAI 2018. [Paper]
  5. "Fine-Tuning Language Models from Human Preferences". Daniel M. Ziegler et al. arXiv 2019. [Paper]
  6. "Learning to summarize from human feedback". Nisan Stiennon et al. NeurIPS 2020. [Paper]
  7. "Alignment of Language Agents". Zachary Kenton et al. arXiv 2021. [Paper]
  8. "Recursively Summarizing Books with Human Feedback". Jeff Wu et al. arXiv 2021. [Paper]
  9. "A General Language Assistant as a Laboratory for Alignment". Amanda Askell et al. arXiv 2021. [Paper]
  10. "WebGPT: Browser-assisted question-answering with human feedback". Reiichiro Nakano et al. arXiv 2021. [Paper]
  11. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  12. "Teaching language models to support answers with verified quotes". Jacob Menick et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning". Deborah Cohen et al. arXiv 2022. [Paper]
  15. "Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned". Deep Ganguli et al. arXiv 2022. [Paper]
  16. "Improving alignment of dialogue agents via targeted human judgements". Amelia Glaese et al. arXiv 2022. [Paper]
  17. "Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization". Rajkumar Ramamurthy et al. arXiv 2022. [Paper]
  18. "Scaling Laws for Reward Model Overoptimization". Leo Gao et al. arXiv 2022. [Paper]
  19. "The Wisdom of Hindsight Makes Language Models Better Instruction Followers". Tianjun Zhang et al. arXiv 2023. [Paper]

Utilization

  1. "An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels". Taylor Sorensen et al. ACL 2022. [Paper]
  2. "What Makes Good In-Context Examples for GPT-3?". Jiachang Liu et al. ACL 2022. [Paper]
  3. "Learning to retrieve prompts for in-context learning". Ohad Rubin et al. NAACL 2022. [Paper]
  4. "Diverse demonstrations improve in-context compositional generalization". Itay Levy et al. arXiv 2022. [Paper]
  5. "Automatic Chain of Thought Prompting in Large Language Models". Zhuosheng Zhang et al. arXiv 2022. [Paper]
  6. "Demystifying Prompts in Language Models via Perplexity Estimation". Hila Gonen et al. arXiv 2022. [Paper]
  7. "Active Example Selection for In-Context Learning". Yiming Zhang et al. EMNLP 2022. [Paper]
  8. "Self-adaptive In-context Learning". Zhiyong Wu et al. arXiv 2022. [Paper]
  9. "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Yao Lu et al. ACL 2022. [Paper]
  10. "Structured Prompting: Scaling In-Context Learning to 1,000 Examples". Yaru Hao et al. arXiv 2022. [Paper]
  11. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning". Xi Ye et al. arXiv 2022. [Paper]
  12. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions". Swaroop Mishra et al. ACL 2022. [Paper]
  13. "Prompt-Augmented Linear Probing: Scaling Beyond the Limit of Few-shot In-Context Learner". Hyunsoo Cho et al. arXiv 2022. [Paper]
  14. "Self-instruct: Aligning language model with self generated instructions". Yizhong Wang et al. arXiv 2022. [Paper]
  15. "An Explanation of In-context Learning as Implicit Bayesian Inference". Sang Michael Xie et al. ICLR 2022. [Paper]
  16. "Calibrate Before Use: Improving Few-Shot Performance of Language Models". Zihao Zhao et al. ICML 2021. [Paper]
  17. "Data distributional properties drive emergent in-context learning in transformers". Stephanie C. Y. Chan et al. arXiv 2022. [Paper]
  18. "Emergent Abilities of Large Language Models". Jason Wei et al. arXiv 2022. [Paper]
  19. "In-context Learning and Induction Heads". Catherine Olsson et al. arXiv 2022. [Paper]
  20. "Language Models are Few-Shot Learners". Tom B. Brown et al. NeurIPS 2020. [Paper]
  21. "On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model". Seongjin Shin et al. NAACL 2022. [Paper]
  22. "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?". Sewon Min et al. EMNLP 2022. [Paper]
  23. "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale". Hritik Bansal et al. arXiv 2022. [Paper]
  24. "Transformers as algorithms: Generalization and implicit model selection in in-context learning". Yingcong Li et al. arXiv 2023. [Paper]
  25. "Transformers learn in-context by gradient descent". Johannes von Oswald et al. arXiv 2022. [Paper]
  26. "What learning algorithm is in-context learning? Investigations with linear models". Ekin Akyürek et al. arXiv 2022. [Paper]
  27. "Chain of Thought Prompting Elicits Reasoning in Large Language Models". Jason Wei et al. arXiv 2022. [Paper]
  28. "STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning". Eric Zelikman et al. arXiv 2022. [Paper]
  29. "Large language models are zero-shot reasoners". Takeshi Kojima et al. arXiv 2022. [Paper]
  30. "Automatic Chain of Thought Prompting in Large Language Models". Zhuosheng Zhang et al. arXiv 2022. [Paper]
  31. "Complexity-Based Prompting for Multi-Step Reasoning". Yao Fu et al. arXiv 2022. [Paper]
  32. "Language Models are Multilingual Chain-of-Thought Reasoners". Freda Shi et al. arXiv 2022. [Paper]
  33. "Rationale-Augmented Ensembles in Language Models". Xuezhi Wang et al. arXiv 2022. [Paper]
  34. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models". Denny Zhou et al. arXiv 2022. [Paper]
  35. "Multimodal Chain-of-Thought Reasoning in Language Models". Zhuosheng Zhang et al. arXiv 2023. [Paper]
  36. "Self-Consistency Improves Chain of Thought Reasoning in Language Models". Xuezhi Wang et al. arXiv 2022. [Paper]
  37. "Large Language Models Can Self-Improve". Jiaxin Huang et al. arXiv 2022. [Paper]
  38. "Training Verifiers to Solve Math Word Problems". Karl Cobbe et al. arXiv 2021. [Paper]
  39. "On the Advance of Making Language Models Better Reasoners". Yifei Li et al. arXiv 2022. [Paper]
  40. "Large Language Models are reasoners with Self-Verification". Yixuan Weng et al. arXiv 2022. [Paper]
  41. "Teaching small language models to reason". Lucie Charlotte Magister et al. arXiv 2022. [Paper]
  42. "Large language models are reasoning teachers". Namgyu Ho et al. arXiv 2022. [Paper]
  43. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning". Xi Ye et al. arXiv 2022. [Paper]
  44. "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper]
  45. "Solving Quantitative Reasoning Problems with Language Models". Aitor Lewkowycz et al. arXiv 2022. [Paper]
  46. "Text and patterns: For effective chain of thought, it takes two to tango". Aman Madaan et al. arXiv 2022. [Paper]
  47. "Challenging BIG-Bench tasks and whether chain-of-thought can solve them". Mirac Suzgun et al. arXiv 2022. [Paper]
  48. "A Survey for In-context Learning". Qingxiu Dong et al. arXiv 2023. [Paper]
  49. "Reasoning with Language Model Prompting: A Survey". Shuofei Qiao et al. arXiv 2022. [Paper]
  50. "Towards Reasoning in Large Language Models: A Survey". Jie Huang et al. arXiv 2022. [Paper]
  51. "Reward Design with Language Models". Minae Kwon et al. arXiv 2023. [Paper]
  52. "Promptagator: Few-shot Dense Retrieval From 8 Examples". Zhuyun Dai et al. arXiv 2022. [Paper]
  53. "On the Feasibility of Specialized Ability Stealing for Large Language Code Models". Zongjie Li et al. arXiv 2023. [Paper]
  54. "MathPrompter: Mathematical Reasoning using Large Language Models". Shima Imani et al. arXiv 2023. [Paper]
  55. "ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction". Jiabang He et al. arXiv 2023. [Paper]
  56. "Selective Annotation Makes Language Models Better Few-Shot Learners". Hongjin Su et al. arXiv 2022. [Paper]

Capacity Evaluation

  1. "Measuring Massive Multitask Language Understanding". Dan Hendrycks et al. ICLR 2021. [Paper]
  2. "Persistent Anti-Muslim Bias in Large Language Models". Abubakar Abid et al. AIES 2021. [Paper]
  3. "Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models". Alex Tamkin et al. arXiv 2021. [Paper]
  4. "BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments". Sanjana Srivastava et al. CoRL 2021. [Paper]
  5. "Program Synthesis with Large Language Models". Jacob Austin et al. arXiv 2021. [Paper]
  6. "Training Verifiers to Solve Math Word Problems". Karl Cobbe et al. arXiv 2021. [Paper]
  7. "Show Your Work: Scratchpads for Intermediate Computation with Language Models". Maxwell I. Nye et al. arXiv 2021. [Paper]
  8. "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents". Wenlong Huang et al. ICML 2022. [Paper]
  9. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". Jason Wei et al. NeurIPS 2022. [Paper]
  10. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  11. "Competition-Level Code Generation with AlphaCode". Yujia Li et al. Science 2022. [Paper]
  12. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances". Michael Ahn et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Autoformalization with Large Language Models". Yuhuai Wu et al. NeurIPS 2022. [Paper]
  15. "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models". Aarohi Srivastava et al. arXiv 2022. [Paper]
  16. "Exploring Length Generalization in Large Language Models". Cem Anil et al. NeurIPS 2022. [Paper]
  17. "Few-shot Learning with Retrieval Augmented Language Models". Gautier Izacard et al. arXiv 2022. [Paper]
  18. "Limitations of Language Models in Arithmetic and Symbolic Induction". Jing Qian et al. arXiv 2022. [Paper]
  19. "Code as Policies: Language Model Programs for Embodied Control". Jacky Liang et al. arXiv 2022. [Paper]
  20. "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models". Ishika Singh et al. arXiv 2022. [Paper]
  21. "Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans". John J. Nay et al. arXiv 2022. [Paper]
  22. "Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought". Abulhair Saparov et al. ICLR 2023. [Paper]
  23. "Language Models are Multilingual Chain-of-Thought Reasoners". Freda Shi et al. ICLR 2023. [Paper]
  24. "Re3: Generating Longer Stories With Recursive Reprompting and Revision". Kevin Yang et al. EMNLP 2022. [Paper]
  25. "Language Models of Code are Few-Shot Commonsense Learners". Aman Madaan et al. EMNLP 2022. [Paper]
  26. "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them". Mirac Suzgun et al. arXiv 2022. [Paper]
  27. "Large Language Models Can Self-Improve". Jiaxin Huang et al. arXiv 2022. [Paper]
  28. "Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs". Albert Q. Jiang et al. ICLR 2023. [Paper]
  29. "Holistic Evaluation of Language Models". Percy Liang et al. arXiv 2022. [Paper]
  30. "PAL: Program-aided Language Models". Luyu Gao et al. arXiv 2022. [Paper]
  31. "Legal Prompt Engineering for Multilingual Legal Judgement Prediction". Dietrich Trautmann et al. arXiv 2022. [Paper]
  32. "How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment". Aidan Gilson et al. medRxiv 2022. [Paper]
  33. "ChatGPT: The End of Online Exam Integrity?". Teo Susnjak et al. arXiv 2022. [Paper]
  34. "Large Language Models are reasoners with Self-Verification". Yixuan Weng et al. arXiv 2022. [Paper]
  35. "Self-Instruct: Aligning Language Model with Self Generated Instructions". Yizhong Wang et al. arXiv 2022. [Paper]
  36. "ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports". Katharina Jeblick et al. arXiv 2022. [Paper]
  37. "The End of Programming". Matt Welsh et al. ACM 2023. [Paper]
  38. "ChatGPT Goes to Law School". Jonathan H. Choi et al. SSRN 2023. [Paper]
  39. "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection". Biyang Guo et al. arXiv 2023. [Paper]
  40. "Is ChatGPT A Good Translator? A Preliminary Study". Wenxiang Jiao et al. arXiv 2023. [Paper]
  41. "Could an Artificial-Intelligence agent pass an introductory physics course?". Gerd Kortemeyer et al. arXiv 2023. [Paper]
  42. "Mathematical Capabilities of ChatGPT". Simon Frieder et al. arXiv 2023. [Paper]
  43. "Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models". Zhihong Shao et al. arXiv 2023. [Paper]
  44. "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning". Thomas Carta et al. arXiv 2023. [Paper]
  45. "Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making". Arya Rao et al. medRxiv 2023. [Paper]
  46. "Theory of Mind May Have Spontaneously Emerged in Large Language Models". Michal Kosinski et al. arXiv 2023. [Paper]
  47. "A Categorical Archive of ChatGPT Failures". Ali Borji et al. arXiv 2023. [Paper]
  48. "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity". Yejin Bang et al. arXiv 2023. [Paper]
  49. "Toolformer: Language Models Can Teach Themselves to Use Tools". Timo Schick et al. arXiv 2023. [Paper]
  50. "Is ChatGPT a General-Purpose Natural Language Processing Task Solver?". Chengwei Qin et al. arXiv 2023. [Paper]
  51. "How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation". Amr Hendy et al. arXiv 2023. [Paper]
  52. "Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT". Qihuang Zhong et al. arXiv 2023. [Paper]
  53. "Zero-Shot Information Extraction via Chatting with ChatGPT". Xiang Wei et al. arXiv 2023. [Paper]
  54. "ChatGPT: Jack of all trades, master of none". Jan Kocon et al. arXiv 2023. [Paper]
  55. "On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective". Jindong Wang et al. arXiv 2023. [Paper]
  56. "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback". Baolin Peng et al. arXiv 2023. [Paper]
  57. "An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)". Paulo Shakarian et al. arXiv 2023. [Paper]
  58. "How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks". Xuanting Chen et al. arXiv 2023. [Paper]
  59. "The utility of ChatGPT for cancer treatment information". Shen Chen et al. medRxiv 2023. [Paper]
  60. "Can ChatGPT Assess Human Personalities? A General Evaluation Framework". Haocong Rao et al. arXiv 2023. [Paper]
  61. "Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT". Mostafa M. Amin et al. arXiv 2023. [Paper]
  62. "Exploring the Feasibility of ChatGPT for Event Extraction". Jun Gao et al. arXiv 2023. [Paper]
  63. "Does Synthetic Data Generation of LLMs Help Clinical Text Mining?". Ruixiang Tang et al. arXiv 2023. [Paper]
  64. "Consistency Analysis of ChatGPT". Myeongjun Jang et al. arXiv 2023. [Paper]
  65. "Self-planning Code Generation with Large Language Model". Shun Zhang et al. ICLR 2023. [Paper]
  66. "Evaluation of ChatGPT as a Question Answering System for Answering Complex Questions". Yiming Tan et al. arXiv 2023. [Paper]
  67. "GPT-4 Technical Report". OpenAI et al. OpenAI 2023. [Paper]
  68. "A Short Survey of Viewing Large Language Models in Legal Aspect". Zhongxiang Sun et al. arXiv 2023. [Paper]
  69. "ChatGPT Participates in a Computer Science Exam". Sebastian Bordt et al. arXiv 2023. [Paper]
  70. "A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models". Junjie Ye et al. arXiv 2023. [Paper]
  71. "On the Educational Impact of ChatGPT: Is Artificial Intelligence Ready to Obtain a University Degree?". Kamil Malinka et al. arXiv 2023. [Paper]
  72. "Sparks of Artificial General Intelligence: Early experiments with GPT-4". Sébastien Bubeck et al. arXiv 2023. [Paper]
  73. "Is ChatGPT A Good Keyphrase Generator? A Preliminary Study". Mingyang Song et al. arXiv 2023. [Paper]
  74. "Capabilities of GPT-4 on Medical Challenge Problems". Harsha Nori et al. arXiv 2023. [Paper]
  75. "Can we trust the evaluation on ChatGPT?". Rachith Aiyappa et al. arXiv 2023. [Paper]
  76. "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks". Fabrizio Gilardi et al. arXiv 2023. [Paper]
  77. "Evaluation of ChatGPT for NLP-based Mental Health Applications". Bishal Lamichhane et al. arXiv 2023. [Paper]
  78. "ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models". Ning Bian et al. arXiv 2023. [Paper]
  79. "Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams". Desnes Nunes et al. arXiv 2023. [Paper]
  80. "Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure". Philipp Koralus et al. arXiv 2023. [Paper]
  81. "Yes but.. Can ChatGPT Identify Entities in Historical Documents?". Carlos-Emiliano González-Gallardo et al. arXiv 2023. [Paper]

The Team

Here is the list of our student contributors for each section.

Section Student Contributors
The whole paper Kun Zhou, Junyi Li
Overview & Resources of LLMs Yingqian Min (Lead), Chen Yang
Pretraining Yupeng Hou (Lead), Junjie Zhang, Zican Dong, Yushuo Chen
Adaptation Tuning Tianyi Tang (Lead), Jinhao Jiang, Ruiyang Ren, Zikang Liu, Peiyu Liu
Utilization Xiaolei Wang (Lead), Yifan Du, Xinyu Tang
Capacity Evaluation Beichen Zhang (Lead), Zhipeng Chen, Yifan Li

Acknowledgments

The authors would like to thank Yankai Lin and Yutao Zhu for proofreading this paper. Since the first release of this paper, we have received a number of valuable comments from the readers. We sincerely thank the readers who have written to us with constructive suggestions and comments: Tyler Suard, Damai Dai, Liang Ding, Stella Biderman, Kevin Gray, and Jay Alammar.

Update Log

Version Time Update Content
V1 2023/03/31 The initial version.
V2 2023/04/09 Add the affiliation information. Revise Figure 1 and Table 1 and clarify the corresponding selection criterion for LLMs. Improve the writing. Correct some minor errors.
V3 2023/04/11 Correct the errors for library resources.
V4 2023/04/12 Revise Figure 1 and Table 1, and clarify the release date of LLMs.