Milestone Papers (credit: GitHub)
A wonderful visualization from this [survey paper](https://arxiv.org/pdf/2304.13712.pdf) summarizes the evolutionary tree of modern LLMs, tracing the development of language models in recent years and highlighting some of the most well-known models.
- Pretraining Data
  - RedPajama, 2023. Repo
  - The Pile: An 800GB Dataset of Diverse Text for Language Modeling, arXiv 2020. Paper
  - How does the pre-training objective affect what large language models learn about linguistic properties?, ACL 2022. Paper
  - Scaling laws for neural language models, 2020. Paper (a brief worked sketch of these power laws follows this list)
  - Data-centric artificial intelligence: A survey, 2023. Paper
  - How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, 2022. Blog
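For intuition, the scaling-laws paper above fits test loss as a power law in model size and data size. The sketch below plugs in the paper's reported constants (approximate values; N counts non-embedding parameters, D counts tokens) purely for illustration:

```python
# Approximate power-law fits reported in "Scaling Laws for Neural Language
# Models" (Kaplan et al., 2020). Constants are the paper's published values,
# rounded; treat this as an illustrative sketch, not an exact reproduction.
N_C = 8.8e13    # critical parameter count (non-embedding)
D_C = 5.4e13    # critical dataset size in tokens
ALPHA_N = 0.076
ALPHA_D = 0.095

def loss_from_params(n_params: float) -> float:
    """Test loss as a function of model size, with data unbounded."""
    return (N_C / n_params) ** ALPHA_N

def loss_from_data(n_tokens: float) -> float:
    """Test loss as a function of dataset size, with model size unbounded."""
    return (D_C / n_tokens) ** ALPHA_D

# Doubling model size from 1B to 2B parameters shrinks this loss term
# by only ~5% -- the practical meaning of a small power-law exponent.
print(loss_from_params(1e9), loss_from_params(2e9))
```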
- Awesome ChatGPT Prompts: a collection of prompt examples to be used with the ChatGPT model.
- Prompt-Learning (a minimal few-shot prompt sketch follows this list)
  - (2020-12) Making Pre-trained Language Models Better Few-shot Learners. paper
  - (2021-07) Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. paper
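For concreteness, here is a minimal sketch of the prompt-based few-shot setup these papers study, in the spirit of LM-BFF: a cloze template plus a handful of demonstrations, with label words standing in for classes. The template and label words are illustrative choices, not the papers' exact ones:

```python
# Minimal sketch of prompt-based few-shot classification in the LM-BFF
# style (Gao et al., 2020): wrap each input in a cloze template, prepend a
# few labeled demonstrations, and let the LM fill the blank with a label word.
TEMPLATE = "{sentence} It was {label}."           # illustrative cloze template
LABEL_WORDS = {"positive": "great", "negative": "terrible"}

def build_prompt(demos: list[tuple[str, str]], query: str) -> str:
    """Concatenate demonstrations, then the query with the blank left open."""
    parts = [TEMPLATE.format(sentence=s, label=LABEL_WORDS[y]) for s, y in demos]
    parts.append(f"{query} It was")                # the model predicts the label word
    return " ".join(parts)

prompt = build_prompt(
    demos=[("A gripping, beautifully shot film.", "positive"),
           ("Two hours I will never get back.", "negative")],
    query="The plot was thin but the acting saved it.",
)
print(prompt)
```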
- Instruction-Tuning-Papers: a trend started by Natural-Instructions (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022). (A sketch of the instruction-data format follows.)
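Instruction tuning trains on (instruction, input, output) records. A minimal sketch of one such record and how it might be flattened into training text; the field names are illustrative, not any specific dataset's schema:

```python
# Sketch of the (instruction, input, output) record format popularized by
# Natural-Instructions / FLAN / T0. Field names here are illustrative.
example = {
    "instruction": "Classify the sentiment of the sentence as positive or negative.",
    "input": "The movie was a complete waste of time.",
    "output": "negative",
}

def to_training_text(ex: dict) -> str:
    """Flatten a record into the single text string a causal LM is tuned on."""
    return f"{ex['instruction']}\n\nInput: {ex['input']}\nOutput: {ex['output']}"

print(to_training_text(example))
```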
- Chain-of-Thought: prompting with a series of intermediate reasoning steps significantly improves the ability of large language models to perform complex reasoning (see the sketch below).
  - (2022-01) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. paper
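A minimal sketch of what such a prompt looks like, using the tennis-ball exemplar popularized by the paper; the helper function is ours:

```python
# Sketch of a chain-of-thought exemplar in the style of Wei et al. (2022):
# the demonstration shows intermediate reasoning steps before the answer,
# so the model imitates step-by-step reasoning on the new question.
COT_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str) -> str:
    """Prepend the worked demonstration, then ask the new question."""
    return f"{COT_DEMO}\nQ: {question}\nA:"

print(cot_prompt("A library had 120 books and lent out 45. How many remain?"))
```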
| Category | Model | Release Time | Size (B) | Link |
| --- | --- | --- | --- | --- |
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
|  | mT5 | 2021/03 | 13 | Paper |
|  | PanGu-α | 2021/05 | 13 | Paper |
|  | CPM-2 | 2021/05 | 198 | Paper |
|  | T0 | 2021/10 | 11 | Paper |
|  | GPT-NeoX-20B | 2022/02 | 20 | Paper |
|  | CodeGen | 2022/03 | 16 | Paper |
|  | Tk-Instruct | 2022/04 | 11 | Paper |
|  | UL2 | 2022/05 | 20 | Paper |
|  | OPT | 2022/05 | 175 | Paper |
|  | YaLM | 2022/06 | 100 | GitHub |
|  | NLLB | 2022/07 | 55 | Paper |
|  | BLOOM | 2022/07 | 176 | Paper |
|  | GLM | 2022/08 | 130 | Paper |
|  | Flan-T5 | 2022/10 | 11 | Paper |
|  | mT0 | 2022/11 | 13 | Paper |
|  | Galactica | 2022/11 | 120 | Paper |
|  | BLOOMZ | 2022/11 | 176 | Paper |
|  | OPT-IML | 2022/12 | 175 | Paper |
|  | Pythia | 2023/01 | 12 | Paper |
|  | LLaMA | 2023/02 | 7/13/65 | Paper |
|  | Vicuna | 2023/03 | 13 | Blog |
|  | ChatGLM | 2023/03 | 6 | GitHub |
|  | CodeGeeX | 2023/03 | 13 | Paper |
|  | Koala | 2023/04 | 13 | Blog |
|  | Falcon | 2023/06 | 7/40 | Blog |
|  | Llama-2 | 2023/07 | 7/13/70 | Paper |
| Closed Source | GShard | 2020/01 | 600 | Paper |
|  | GPT-3 | 2020/05 | 175 | Paper |
|  | LaMDA | 2021/05 | 137 | Paper |
|  | HyperCLOVA | 2021/06 | 82 | Paper |
|  | Codex | 2021/07 | 12 | Paper |
|  | ERNIE 3.0 | 2021/07 | 10 | Paper |
|  | Jurassic-1 | 2021/08 | 178 | Paper |
|  | FLAN | 2021/10 | 137 | Paper |
|  | MT-NLG | 2021/10 | 530 | Paper |
|  | Yuan 1.0 | 2021/10 | 245 | Paper |
|  | Anthropic | 2021/12 | 52 | Paper |
|  | WebGPT | 2021/12 | 175 | Paper |
|  | Gopher | 2021/12 | 280 | Paper |
|  | ERNIE 3.0 Titan | 2021/12 | 260 | Paper |
|  | GLaM | 2021/12 | 1200 | Paper |
|  | InstructGPT | 2022/01 | 175 | Paper |
|  | AlphaCode | 2022/02 | 41 | Paper |
|  | Chinchilla | 2022/03 | 70 | Paper |
|  | PaLM | 2022/04 | 540 | Paper |
|  | Cohere | 2022/06 | 54 | Homepage |
|  | AlexaTM | 2022/08 | 20 | Paper |
|  | Luminous | 2022/09 | 70 | Docs |
|  | Sparrow | 2022/09 | 70 | Paper |
|  | WeLM | 2022/09 | 10 | Paper |
|  | U-PaLM | 2022/10 | 540 | Paper |
|  | Flan-PaLM | 2022/10 | 540 | Paper |
|  | Flan-U-PaLM | 2022/10 | 540 | Paper |
|  | Alpaca | 2023/03 | 7 | Blog |
|  | GPT-4 | 2023/03 | - | Paper |
|  | PanGu-Σ | 2023/03 | 1085 | Paper |
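Many entries in the "Publicly Accessible" half of the table can be loaded with the Hugging Face `transformers` library. A sketch using the released GPT-NeoX-20B checkpoint; this assumes the `transformers` and `accelerate` packages are installed and that enough memory is available (roughly 40 GB in fp16). Smaller checkpoints from the table swap in the same way:

```python
# Sketch: loading one of the publicly accessible checkpoints above with
# Hugging Face transformers. "EleutherAI/gpt-neox-20b" is the released
# GPT-NeoX-20B id on the Hub; device_map="auto" (via accelerate) spreads
# the weights across available devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The evolutionary tree of modern LLMs", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```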
- BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". Yukun Zhu et al. ICCV 2015. [Paper] [Source]
- Gutenberg: [Source]
- CommonCrawl: [Source]
- C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Source]
- CC-Stories-R: "A Simple Method for Commonsense Reasoning". Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
- CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Yinhan Liu et al. arXiv 2019. [Paper] [Source]
- RealNews: "Defending Against Neural Fake News". Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
- OpenWebText: [Source]
- Pushshift.io: "The Pushshift Reddit Dataset". Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
- Wikipedia: [Source]
- BigQuery: [Source]
- The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". Leo Gao et al. arXiv 2021. [Paper] [Source]
- ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [Paper]
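Several of these corpora are mirrored on the Hugging Face Hub and are large enough that streaming is the practical way to inspect them. A sketch using the `datasets` library with the commonly used `allenai/c4` mirror of C4:

```python
# Sketch: streaming a pretraining corpus with the Hugging Face `datasets`
# library instead of downloading it. streaming=True iterates over records
# without materializing the ~750 GB corpus on disk.
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
for i, record in enumerate(c4):
    print(record["text"][:80])  # each record carries "text", "url", "timestamp"
    if i == 2:
        break
```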