Milestone Papers (credit: GitHub)
A wonderful visualization from this [survey paper](https://arxiv.org/pdf/2304.13712.pdf) summarizes the evolutionary tree of modern LLMs, tracing the development of language models in recent years and highlighting some of the most well-known models.
- Pretraining Data
  - RedPajama, 2023. Repo
  - The Pile: An 800GB Dataset of Diverse Text for Language Modeling, arXiv 2020. Paper
  - How does the pre-training objective affect what large language models learn about linguistic properties?, ACL 2022. Paper
  - Scaling laws for neural language models, 2020. Paper (a brief worked sketch of these power laws follows this list)
  - Data-centric artificial intelligence: A survey, 2023. Paper
  - How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, 2022. Blog
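For intuition, the scaling-laws paper above fits test loss as a power law in model size and data size. The sketch below plugs in the paper's reported constants (approximate values; N counts non-embedding parameters, D counts tokens) purely for illustration:

```python
# Approximate power-law fits reported in "Scaling Laws for Neural Language
# Models" (Kaplan et al., 2020). Constants are the paper's published values,
# rounded; treat this as an illustrative sketch, not an exact reproduction.
N_C = 8.8e13    # critical parameter count (non-embedding)
D_C = 5.4e13    # critical dataset size in tokens
ALPHA_N = 0.076
ALPHA_D = 0.095

def loss_from_params(n_params: float) -> float:
    """Test loss as a function of model size, with data unbounded."""
    return (N_C / n_params) ** ALPHA_N

def loss_from_data(n_tokens: float) -> float:
    """Test loss as a function of dataset size, with model size unbounded."""
    return (D_C / n_tokens) ** ALPHA_D

# Doubling model size from 1B to 2B parameters shrinks this loss term
# by only ~5% -- the practical meaning of a small power-law exponent.
print(loss_from_params(1e9), loss_from_params(2e9))
```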
- Awesome ChatGPT Prompts: a collection of prompt examples to be used with the ChatGPT model.
- Prompt-Learning (a minimal few-shot prompt sketch follows this list)
  - (2020-12) Making Pre-trained Language Models Better Few-shot Learners. paper
  - (2021-07) Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. paper
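For concreteness, here is a minimal sketch of the prompt-based few-shot setup these papers study, in the spirit of LM-BFF: a cloze template plus a handful of demonstrations, with label words standing in for classes. The template and label words are illustrative choices, not the papers' exact ones:

```python
# Minimal sketch of prompt-based few-shot classification in the LM-BFF
# style (Gao et al., 2020): wrap each input in a cloze template, prepend a
# few labeled demonstrations, and let the LM fill the blank with a label word.
TEMPLATE = "{sentence} It was {label}."           # illustrative cloze template
LABEL_WORDS = {"positive": "great", "negative": "terrible"}

def build_prompt(demos: list[tuple[str, str]], query: str) -> str:
    """Concatenate demonstrations, then the query with the blank left open."""
    parts = [TEMPLATE.format(sentence=s, label=LABEL_WORDS[y]) for s, y in demos]
    parts.append(f"{query} It was")                # the model predicts the label word
    return " ".join(parts)

prompt = build_prompt(
    demos=[("A gripping, beautifully shot film.", "positive"),
           ("Two hours I will never get back.", "negative")],
    query="The plot was thin but the acting saved it.",
)
print(prompt)
```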
- Instruction-Tuning-Papers: a trend started by Natural-Instructions (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022). (A sketch of the instruction-data format follows.)
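Instruction tuning trains on (instruction, input, output) records. A minimal sketch of one such record and how it might be flattened into training text; the field names are illustrative, not any specific dataset's schema:

```python
# Sketch of the (instruction, input, output) record format popularized by
# Natural-Instructions / FLAN / T0. Field names here are illustrative.
example = {
    "instruction": "Classify the sentiment of the sentence as positive or negative.",
    "input": "The movie was a complete waste of time.",
    "output": "negative",
}

def to_training_text(ex: dict) -> str:
    """Flatten a record into the single text string a causal LM is tuned on."""
    return f"{ex['instruction']}\n\nInput: {ex['input']}\nOutput: {ex['output']}"

print(to_training_text(example))
```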
- Chain-of-Thought: prompting with a series of intermediate reasoning steps significantly improves the ability of large language models to perform complex reasoning (see the sketch below).
  - (2022-01) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. paper
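A minimal sketch of what such a prompt looks like, using the tennis-ball exemplar popularized by the paper; the helper function is ours:

```python
# Sketch of a chain-of-thought exemplar in the style of Wei et al. (2022):
# the demonstration shows intermediate reasoning steps before the answer,
# so the model imitates step-by-step reasoning on the new question.
COT_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str) -> str:
    """Prepend the worked demonstration, then ask the new question."""
    return f"{COT_DEMO}\nQ: {question}\nA:"

print(cot_prompt("A library had 120 books and lent out 45. How many remain?"))
```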
| Category | Model | Release Time | Size (B) | Link |
| --- | --- | --- | --- | --- |
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
|  | mT5 | 2021/03 | 13 | Paper |
|  | PanGu-α | 2021/05 | 13 | Paper |
|  | CPM-2 | 2021/05 | 198 | Paper |
|  | T0 | 2021/10 | 11 | Paper |
|  | GPT-NeoX-20B | 2022/02 | 20 | Paper |
|  | CodeGen | 2022/03 | 16 | Paper |
|  | Tk-Instruct | 2022/04 | 11 | Paper |
|  | UL2 | 2022/05 | 20 | Paper |
|  | OPT | 2022/05 | 175 | Paper |
|  | YaLM | 2022/06 | 100 | GitHub |
|  | NLLB | 2022/07 | 55 | Paper |
|  | BLOOM | 2022/07 | 176 | Paper |
|  | GLM | 2022/08 | 130 | Paper |
|  | Flan-T5 | 2022/10 | 11 | Paper |
|  | mT0 | 2022/11 | 13 | Paper |
|  | Galactica | 2022/11 | 120 | Paper |
|  | BLOOMZ | 2022/11 | 176 | Paper |
|  | OPT-IML | 2022/12 | 175 | Paper |
|  | Pythia | 2023/01 | 12 | Paper |
|  | LLaMA | 2023/02 | 7/13/65 | Paper |
|  | Vicuna | 2023/03 | 13 | Blog |
|  | ChatGLM | 2023/03 | 6 | GitHub |
|  | CodeGeeX | 2023/03 | 13 | Paper |
|  | Koala | 2023/04 | 13 | Blog |
|  | Falcon | 2023/06 | 7/40 | Blog |
|  | Llama-2 | 2023/07 | 7/13/70 | Paper |
| Closed Source | GShard | 2020/01 | 600 | Paper |
|  | GPT-3 | 2020/05 | 175 | Paper |
|  | LaMDA | 2021/05 | 137 | Paper |
|  | HyperCLOVA | 2021/06 | 82 | Paper |
|  | Codex | 2021/07 | 12 | Paper |
|  | ERNIE 3.0 | 2021/07 | 10 | Paper |
|  | Jurassic-1 | 2021/08 | 178 | Paper |
|  | FLAN | 2021/10 | 137 | Paper |
|  | MT-NLG | 2021/10 | 530 | Paper |
|  | Yuan 1.0 | 2021/10 | 245 | Paper |
|  | Anthropic | 2021/12 | 52 | Paper |
|  | WebGPT | 2021/12 | 175 | Paper |
|  | Gopher | 2021/12 | 280 | Paper |
|  | ERNIE 3.0 Titan | 2021/12 | 260 | Paper |
|  | GLaM | 2021/12 | 1200 | Paper |
|  | InstructGPT | 2022/01 | 175 | Paper |
|  | AlphaCode | 2022/02 | 41 | Paper |
|  | Chinchilla | 2022/03 | 70 | Paper |
|  | PaLM | 2022/04 | 540 | Paper |
|  | Cohere | 2022/06 | 54 | Homepage |
|  | AlexaTM | 2022/08 | 20 | Paper |
|  | Luminous | 2022/09 | 70 | Docs |
|  | Sparrow | 2022/09 | 70 | Paper |
|  | WeLM | 2022/09 | 10 | Paper |
|  | U-PaLM | 2022/10 | 540 | Paper |
|  | Flan-PaLM | 2022/10 | 540 | Paper |
|  | Flan-U-PaLM | 2022/10 | 540 | Paper |
|  | Alpaca | 2023/03 | 7 | Blog |
|  | GPT-4 | 2023/03 | - | Paper |
|  | PanGu-Σ | 2023/03 | 1085 | Paper |
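Many entries in the "Publicly Accessible" half of the table can be loaded with the Hugging Face `transformers` library. A sketch using the released GPT-NeoX-20B checkpoint; this assumes the `transformers` and `accelerate` packages are installed and that enough memory is available (roughly 40 GB in fp16). Smaller checkpoints from the table swap in the same way:

```python
# Sketch: loading one of the publicly accessible checkpoints above with
# Hugging Face transformers. "EleutherAI/gpt-neox-20b" is the released
# GPT-NeoX-20B id on the Hub; device_map="auto" (via accelerate) spreads
# the weights across available devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The evolutionary tree of modern LLMs", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```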
- BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". Yukun Zhu et al. ICCV 2015. [Paper] [Source]
- Gutenberg: [Source]
- CommonCrawl: [Source]
- C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Source]
- CC-Stories-R: "A Simple Method for Commonsense Reasoning". Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
- CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Yinhan Liu et al. arXiv 2019. [Paper] [Source]
- RealNews: "Defending Against Neural Fake News". Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
- OpenWebText: [Source]
- Pushshift.io: "The Pushshift Reddit Dataset". Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
- Wikipedia: [Source]
- BigQuery: [Source]
- The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". Leo Gao et al. arXiv 2021. [Paper] [Source]
- ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [Paper]
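Several of these corpora are mirrored on the Hugging Face Hub and are large enough that streaming is the practical way to inspect them. A sketch using the `datasets` library with the commonly used `allenai/c4` mirror of C4:

```python
# Sketch: streaming a pretraining corpus with the Hugging Face `datasets`
# library instead of downloading it. streaming=True iterates over records
# without materializing the ~750 GB corpus on disk.
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
for i, record in enumerate(c4):
    print(record["text"][:80])  # each record carries "text", "url", "timestamp"
    if i == 2:
        break
```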