# Large Language Models


## Milestone Papers (credit: GitHub)

| Date | Keywords | Institute | Paper | Publication |
|---|---|---|---|---|
| 2017-06 | Transformers | Google | Attention Is All You Need | NeurIPS 2017 |
| 2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training | OpenAI |
| 2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL 2019 |
| 2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners | - |
| 2019-10 | T5 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR 2020 |
| 2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models | - |
| 2020-05 | GPT 3.0 | OpenAI | Language models are few-shot learners | NeurIPS 2020 |
| 2021-01 | Switch Transformers | Google | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | JMLR 2022 |
| 2021-09 | FLAN | Google | Finetuned Language Models are Zero-Shot Learners | ICLR 2022 |
| 2021-12 | Retro | DeepMind | Improving language models by retrieving from trillions of tokens | ICML 2022 |
| 2022-01 | LaMDA | Google | LaMDA: Language Models for Dialog Applications | - |
| 2022-01 | Megatron-Turing NLG | Microsoft & NVIDIA | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | - |
| 2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback | - |
| 2022-05 | OPT | Meta | OPT: Open Pre-trained Transformer Language Models | - |
| 2022-06 | Emergent Abilities | Google | Emergent Abilities of Large Language Models | TMLR 2022 |
| 2022-06 | METALM | Microsoft | Language Models are General-Purpose Interfaces | - |
| 2022-10 | GLM-130B | Tsinghua | GLM-130B: An Open Bilingual Pre-trained Model | - |
| 2022-11 | HELM | Stanford | Holistic Evaluation of Language Models | - |
| 2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models | - |
| 2023-03 | GPT 4 | OpenAI | GPT-4 Technical Report | - |
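The scaling-law entry in the table above has a simple closed form. Below is a minimal sketch of the parameter-count law from Kaplan et al. (2020), L(N) = (N_c / N)^alpha_N; the constants are the values that paper reports for non-embedding parameters, and both they and the function name are illustrative, not a fixed API:

```python
# Sketch of the parameter-count scaling law, L(N) = (N_c / N) ** alpha_N.
# Constants below are the fitted values reported by Kaplan et al. (2020)
# for non-embedding parameters; treat them as illustrative.
N_C = 8.8e13      # fitted constant (non-embedding parameter count)
ALPHA_N = 0.076   # fitted exponent

def loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss (nats/token) for a model with
    n_params non-embedding parameters, assuming data and compute
    are not the bottleneck."""
    return (N_C / n_params) ** ALPHA_N

# Growing the model by 10x shrinks the predicted loss by a constant factor:
for n in (1e8, 1e9, 1e10):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.3f}")
```

The power-law form is what makes the "bigger is predictably better" claim quantitative: each order of magnitude of parameters buys the same multiplicative loss reduction.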

## Timeline of LLMs

*(figure: LLMs_timeline)*

The visualization above, from this [survey paper](https://arxiv.org/pdf/2304.13712.pdf), summarizes the evolutionary tree of modern LLMs, tracing the development of language models in recent years and highlighting some of the most well-known models.

## Other Awesome Lists

- Pretraining data
  - RedPajama, 2023. Repo
  - The Pile: An 800GB Dataset of Diverse Text for Language Modeling, arXiv 2020. Paper
  - How does the pre-training objective affect what large language models learn about linguistic properties?, ACL 2022. Paper
  - Scaling Laws for Neural Language Models, 2020. Paper
  - Data-centric Artificial Intelligence: A Survey, 2023. Paper
  - How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, 2022. Blog

- Awesome ChatGPT Prompts: a collection of prompt examples to be used with the ChatGPT model.

- Prompt-Learning
  - (2020-12) Making Pre-trained Language Models Better Few-shot Learners. Paper
  - (2021-07) Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Paper

- Instruction-Tuning-Papers: a trend starting from Natural-Instructions (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022).

- Chain-of-Thought: prompting with a series of intermediate reasoning steps significantly improves the ability of large language models to perform complex reasoning.
  - (2022-01) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Paper
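The chain-of-thought idea above can be sketched as a few-shot prompt builder. The exemplar text and helper name below are illustrative choices, not the API of any particular library:

```python
# Sketch of few-shot chain-of-thought prompting (Wei et al., 2022):
# the exemplar demonstrates intermediate reasoning steps before its
# final answer, and the model is expected to continue in the same style.
EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the reasoning-annotated exemplar to a new question,
    leaving the trailing 'A:' for the model to complete."""
    return f"{EXEMPLAR}\nQ: {question}\nA:"

print(build_cot_prompt(
    "The cafeteria had 23 apples. They used 20 to make lunch and "
    "bought 6 more. How many apples do they have?"
))
```

A model completing this prompt tends to emit its own step-by-step rationale before a final "The answer is …" line, which also makes the answer easy to extract with a simple string match.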

## List of LLMs

| Category | Model | Release Time | Size (B) | Link |
|---|---|---|---|---|
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
| | mT5 | 2021/03 | 13 | Paper |
| | PanGu-α | 2021/05 | 13 | Paper |
| | CPM-2 | 2021/05 | 198 | Paper |
| | T0 | 2021/10 | 11 | Paper |
| | GPT-NeoX-20B | 2022/02 | 20 | Paper |
| | CodeGen | 2022/03 | 16 | Paper |
| | Tk-Instruct | 2022/04 | 11 | Paper |
| | UL2 | 2022/02 | 20 | Paper |
| | OPT | 2022/05 | 175 | Paper |
| | YaLM | 2022/06 | 100 | GitHub |
| | NLLB | 2022/07 | 55 | Paper |
| | BLOOM | 2022/07 | 176 | Paper |
| | GLM | 2022/08 | 130 | Paper |
| | Flan-T5 | 2022/10 | 11 | Paper |
| | mT0 | 2022/11 | 13 | Paper |
| | Galactica | 2022/11 | 120 | Paper |
| | BLOOMZ | 2022/11 | 176 | Paper |
| | OPT-IML | 2022/12 | 175 | Paper |
| | Pythia | 2023/01 | 12 | Paper |
| | LLaMA | 2023/02 | 7/13/65 | Paper |
| | Vicuna | 2023/03 | 13 | Blog |
| | ChatGLM | 2023/03 | 6 | GitHub |
| | CodeGeeX | 2023/03 | 13 | Paper |
| | Koala | 2023/04 | 13 | Blog |
| | Falcon | 2023/06 | 7/40 | Blog |
| | Llama-2 | 2023/07 | 7/13/70 | Paper |
| Closed Source | GShard | 2020/01 | 600 | Paper |
| | GPT-3 | 2020/05 | 175 | Paper |
| | LaMDA | 2021/05 | 137 | Paper |
| | HyperCLOVA | 2021/06 | 82 | Paper |
| | Codex | 2021/07 | 12 | Paper |
| | ERNIE 3.0 | 2021/07 | 10 | Paper |
| | Jurassic-1 | 2021/08 | 178 | Paper |
| | FLAN | 2021/10 | 137 | Paper |
| | MT-NLG | 2021/10 | 530 | Paper |
| | Yuan 1.0 | 2021/10 | 245 | Paper |
| | Anthropic | 2021/12 | 52 | Paper |
| | WebGPT | 2021/12 | 175 | Paper |
| | Gopher | 2021/12 | 280 | Paper |
| | ERNIE 3.0 Titan | 2021/12 | 260 | Paper |
| | GLaM | 2021/12 | 1200 | Paper |
| | InstructGPT | 2022/01 | 175 | Paper |
| | AlphaCode | 2022/02 | 41 | Paper |
| | Chinchilla | 2022/03 | 70 | Paper |
| | PaLM | 2022/04 | 540 | Paper |
| | Cohere | 2022/06 | 54 | Homepage |
| | AlexaTM | 2022/08 | 20 | Paper |
| | Luminous | 2022/09 | 70 | Docs |
| | Sparrow | 2022/09 | 70 | Paper |
| | WeLM | 2022/09 | 10 | Paper |
| | U-PaLM | 2022/10 | 540 | Paper |
| | Flan-PaLM | 2022/10 | 540 | Paper |
| | Flan-U-PaLM | 2022/10 | 540 | Paper |
| | Alpaca | 2023/03 | 7 | Blog |
| | GPT-4 | 2023/03 | - | Paper |
| | PanGu-Σ | 2023/03 | 1085 | Paper |

## Commonly Used Corpora

1. BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". Yukun Zhu et al. ICCV 2015. [Paper] [Source]
2. Gutenberg: [Source]
3. CommonCrawl: [Source]
4. C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Source]
5. CC-Stories-R: "A Simple Method for Commonsense Reasoning". Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
6. CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Yinhan Liu et al. arXiv 2019. [Paper] [Source]
7. RealNews: "Defending Against Neural Fake News". Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
8. OpenWebText: [Source]
9. Pushshift.io: "The Pushshift Reddit Dataset". Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
10. Wikipedia: [Source]
11. BigQuery: [Source]
12. The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". Leo Gao et al. arXiv 2021. [Paper] [Source]
13. ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [Paper]