The Practical Guides for Large Language Models

A curated (and still actively updated) list of practical guide resources for LLMs. It is based on our survey paper, Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond, and on efforts from @xinyadu. The survey is partially based on the second half of this Blog. We also built an evolutionary tree of modern Large Language Models (LLMs) to trace the development of language models in recent years and highlight some of the most well-known models.

These sources aim to help practitioners navigate the vast landscape of large language models (LLMs) and their applications in natural language processing (NLP). We also include their usage restrictions based on model and data licensing information. If you find any resources in our repository helpful, please feel free to use them (don't forget to cite our paper! 😃). We welcome pull requests to refine this figure!

    @article{yang2023harnessing,
        title={Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond}, 
        author={Jingfeng Yang and Hongye Jin and Ruixiang Tang and Xiaotian Han and Qizhang Feng and Haoming Jiang and Bing Yin and Xia Hu},
        year={2023},
        eprint={2304.13712},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
    }

Latest News💥

We added a usage and restrictions section.
We used PowerPoint to plot the figure and released the source .pptx file for our GIF figure. [4/27/2023]
We released the source .pptx file for the still version of the figure and replaced the figure in this repo with the still version. [4/29/2023]
Added AlexaTM, UniLM, and UniLMv2 to the figure, and corrected the logo for Tk. [4/29/2023]
Added a usage and restrictions section (for commercial and research purposes). Credits to Dr. Du. [5/8/2023]

Other Practical Guides for LLMs

Why did all of the public reproduction of GPT-3 fail? In which tasks should we use GPT-3.5/ChatGPT? 2023, Blog
Building LLM applications for production, 2023, Blog
Data-centric Artificial Intelligence, 2023, Repo/Blog/Paper

Practical Guide for Models

BERT-style Language Models: Encoder-Decoder or Encoder-only

BERT BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018, Paper
RoBERTa RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, Paper
DistilBERT DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, Paper
ALBERT ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, Paper
UniLM Unified Language Model Pre-training for Natural Language Understanding and Generation, 2019, Paper
ELECTRA ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, Paper
T5 "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. Paper
GLM "GLM-130B: An Open Bilingual Pre-trained Model". 2022. Paper
AlexaTM "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". Saleh Soltan et al. arXiv 2022. Paper
ST-MoE ST-MoE: Designing Stable and Transferable Sparse Expert Models. 2022. Paper

GPT-style Language Models: Decoder-only

GPT Improving Language Understanding by Generative Pre-Training. 2018. Paper
GPT-2 Language Models are Unsupervised Multitask Learners. 2019. Paper
GPT-3 "Language Models are Few-Shot Learners". NeurIPS 2020. Paper
OPT "OPT: Open Pre-trained Transformer Language Models". 2022. Paper
PaLM "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. Paper
BLOOM "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". 2022. Paper
MT-NLG "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". 2022. Paper
GLaM "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". ICML 2022. Paper
Gopher "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". 2021. Paper
Chinchilla "Training Compute-Optimal Large Language Models". 2022. Paper
LaMDA "LaMDA: Language Models for Dialog Applications". 2022. Paper
LLaMA "LLaMA: Open and Efficient Foundation Language Models". 2023. Paper
GPT-4 "GPT-4 Technical Report". 2023. Paper
BloombergGPT BloombergGPT: A Large Language Model for Finance, 2023, Paper
GPT-NeoX-20B "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". 2022. Paper
PaLM 2 "PaLM 2 Technical Report". 2023. Tech Report
LLaMA 2 "Llama 2: Open Foundation and Fine-Tuned Chat Models". 2023. Paper
Claude 2 "Model Card and Evaluations for Claude Models". 2023. Model Card

Practical Guide for Data

Pretraining data

RedPajama, 2023. Repo
The Pile: An 800GB Dataset of Diverse Text for Language Modeling, Arxiv 2020. Paper
How does the pre-training objective affect what large language models learn about linguistic properties?, ACL 2022. Paper
Scaling laws for neural language models, 2020. Paper
Data-centric artificial intelligence: A survey, 2023. Paper
How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, 2022. Blog

Finetuning data

Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach, EMNLP 2019. Paper
Language Models are Few-Shot Learners, NeurIPS 2020. Paper
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? Arxiv 2023. Paper

Test data/user data

Shortcut learning of large language models in natural language understanding: A survey, Arxiv 2023. Paper
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective, Arxiv 2023. Paper
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, Arxiv 2019. Paper

Practical Guide for NLP Tasks

We build a decision flow for choosing between LLMs and fine-tuned models for users' NLP applications. The decision flow helps users assess whether their downstream NLP applications meet specific conditions and, based on that evaluation, determine whether LLMs or fine-tuned models are the most suitable choice.
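The decision flow itself is not reproduced in this text. As a rough illustration only, the following minimal sketch encodes a few hypothetical criteria loosely inspired by the task categories listed in this section (traditional NLU, generation, knowledge-intensive tasks, and abilities that emerge with scale). The `TaskProfile` fields, the `choose_model` function, and the rule ordering are all assumptions, not a transcription of the survey's decision flow.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a downstream NLP application."""
    task_type: str             # assumed labels: "nlu", "generation", "knowledge"
    rich_annotated_data: bool  # enough in-distribution labels to fine-tune
    out_of_distribution: bool  # test/user data differs from available training data
    needs_reasoning: bool      # e.g. arithmetic or commonsense reasoning


def choose_model(task: TaskProfile) -> str:
    """Return "LLM" or "fine-tuned model" using simplified, assumed rules."""
    # Assumption: reasoning-heavy tasks favor large models, since such
    # abilities tend to improve with scale.
    if task.needs_reasoning:
        return "LLM"
    # Assumption: generation and knowledge-intensive tasks benefit from
    # the broad knowledge and generation ability of LLMs.
    if task.task_type in ("generation", "knowledge"):
        return "LLM"
    # Traditional NLU: a fine-tuned model is assumed to be the default when
    # labeled data is plentiful and test data matches the training data.
    if task.rich_annotated_data and not task.out_of_distribution:
        return "fine-tuned model"
    # Scarce labels or distribution shift: fall back to an LLM's
    # zero/few-shot generalization.
    return "LLM"


if __name__ == "__main__":
    sentiment = TaskProfile("nlu", rich_annotated_data=True,
                            out_of_distribution=False, needs_reasoning=False)
    print(choose_model(sentiment))  # fine-tuned model
```

Ordering matters in this sketch: the reasoning and task-type checks run first, so a fine-tuned model is only suggested for in-distribution NLU tasks with plentiful labels.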

Traditional NLU tasks

A benchmark for toxic comment classification on civil comments dataset, Arxiv 2023 Paper
Is chatgpt a general-purpose natural language processing task solver?, Arxiv 2023 Paper
Benchmarking large language models for news summarization, Arxiv 2022 Paper

Generation tasks

News summarization and evaluation in the era of gpt-3, Arxiv 2022 Paper
Is chatgpt a good translator? yes with gpt-4 as the engine, Arxiv 2023 Paper
Multilingual machine translation systems from Microsoft for WMT21 shared task, WMT 2021 Paper
Can ChatGPT understand too? a comparative study on chatgpt and fine-tuned bert, Arxiv 2023 Paper

Knowledge-intensive tasks

Measuring massive multitask language understanding, ICLR 2021 Paper
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, Arxiv 2022 Paper
Inverse scaling prize, 2022 Link
Atlas: Few-shot Learning with Retrieval Augmented Language Models, Arxiv 2022 Paper
Large Language Models Encode Clinical Knowledge, Arxiv 2022 Paper

Abilities with Scaling

Training Compute-Optimal Large Language Models, NeurIPS 2022 Paper
Scaling Laws for Neural Language Models, Arxiv 2020 Paper
Solving math word problems with process- and outcome-based feedback, Arxiv 2022 Paper
Chain of thought prompting elicits reasoning in large language models, NeurIPS 2022 Paper
Emergent abilities of large language models, TMLR 2022 Paper
Inverse scaling can become U-shaped, Arxiv 2022 Paper
Towards Reasoning in Large Language Models: A Survey, Arxiv 2022 Paper

Specific tasks

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Arxiv 2022 Paper
PaLI: A Jointly-Scaled Multilingual Language-Image Model, Arxiv 2022 Paper
AugGPT: Leveraging ChatGPT for Text Data Augmentation, Arxiv 2023 Paper
Is gpt-3 a good data annotator?, Arxiv 2022 Paper
Want To Reduce Labeling Cost? GPT-3 Can Help, Findings of EMNLP 2021 Paper
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation, Findings of EMNLP 2021 Paper
LLM for Patient-Trial Matching: Privacy-Aware Data Augmentation Towards Better Performance and Generalizability, Arxiv 2023 Paper
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks, Arxiv 2023 Paper
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Arxiv 2023 Paper
GPTScore: Evaluate as You Desire, Arxiv 2023 Paper
Large Language Models Are State-of-the-Art Evaluators of Translation Quality, Arxiv 2023 Paper
Is ChatGPT a Good NLG Evaluator? A Preliminary Study, Arxiv 2023 Paper

Real-World "Tasks"

Sparks of Artificial General Intelligence: Early experiments with GPT-4, Arxiv 2023 Paper

Efficiency

Cost

OpenAI's GPT-3 Language Model: A Technical Overview, 2020. Blog Post
Measuring the carbon intensity of ai in cloud instances, FAccT 2022. Paper
In AI, is bigger always better?, Nature 2023. Article
Language Models are Few-Shot Learners, NeurIPS 2020. Paper
Pricing, OpenAI. Blog Post

Latency

HELM: Holistic evaluation of language models, Arxiv 2022. Paper

Parameter-Efficient Fine-Tuning

LoRA: Low-Rank Adaptation of Large Language Models, Arxiv 2021. Paper
Prefix-Tuning: Optimizing Continuous Prompts for Generation, ACL 2021. Paper
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks, ACL 2022. Paper
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, Arxiv 2022. Paper

Pretraining System

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Arxiv 2019. Paper
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Arxiv 2019. Paper
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM, Arxiv 2021. Paper
Reducing Activation Recomputation in Large Transformer Models, Arxiv 2022. Paper

Trustworthiness

Robustness and Calibration

Calibrate before use: Improving few-shot performance of language models, ICML 2021. Paper
SPeC: A Soft Prompt-Based Calibration on Mitigating Performance Variability in Clinical Notes Summarization, Arxiv 2023. Paper

Spurious biases

Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning, Findings of ACL 2023 Paper
Shortcut learning of large language models in natural language understanding: A survey, 2023 Paper
Mitigating gender bias in captioning system, WWW 2020 Paper
Calibrate Before Use: Improving Few-Shot Performance of Language Models, ICML 2021 Paper
Shortcut Learning in Deep Neural Networks, Nature Machine Intelligence 2020 Paper
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?, NAACL 2022 Paper

Safety issues

GPT-4 System Card, 2023 Paper
The science of detecting llm-generated texts, Arxiv 2023 Paper
How stereotypes are shared through language: a review and introduction of the social categories and stereotypes communication (SCSC) framework, Review of Communication Research, 2019 Paper
Gender shades: Intersectional accuracy disparities in commercial gender classification, FAccT 2018 Paper

Benchmark Instruction Tuning

FLAN: Finetuned Language Models Are Zero-Shot Learners, Arxiv 2021 Paper
T0: Multitask Prompted Training Enables Zero-Shot Task Generalization, Arxiv 2021 Paper
Cross-task generalization via natural language crowdsourcing instructions, ACL 2022 Paper
Tk-INSTRUCT: Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks, EMNLP 2022 Paper
FLAN-T5/PaLM: Scaling Instruction-Finetuned Language Models, Arxiv 2022 Paper
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning, Arxiv 2023 Paper
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization, Arxiv 2023 Paper

Alignment

Deep Reinforcement Learning from Human Preferences, NIPS 2017 Paper
Learning to summarize from human feedback, Arxiv 2020 Paper
A General Language Assistant as a Laboratory for Alignment, Arxiv 2021 Paper
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Arxiv 2022 Paper
Teaching language models to support answers with verified quotes, Arxiv 2022 Paper
InstructGPT: Training language models to follow instructions with human feedback, Arxiv 2022 Paper
Improving alignment of dialogue agents via targeted human judgements, Arxiv 2022 Paper
Scaling Laws for Reward Model Overoptimization, Arxiv 2022 Paper
Scalable Oversight: Measuring Progress on Scalable Oversight for Large Language Models, Arxiv 2022 Paper

Safety Alignment (Harmless)

Red Teaming Language Models with Language Models, Arxiv 2022 Paper
Constitutional AI: Harmlessness from AI Feedback, Arxiv 2022 Paper
The Capacity for Moral Self-Correction in Large Language Models, Arxiv 2023 Paper
OpenAI: Our approach to AI safety, 2023 Blog

Truthfulness Alignment (Honest)

Reinforcement Learning for Language Models, 2023 Blog

Practical Guides for Prompting (Helpful)

OpenAI Cookbook. Blog
Prompt Engineering. Blog
ChatGPT Prompt Engineering for Developers. Course

Alignment Efforts of Open-source Community

Self-Instruct: Aligning Language Model with Self Generated Instructions, Arxiv 2022 Paper
Alpaca. Repo
Vicuna. Repo
Dolly. Blog
DeepSpeed-Chat. Blog
GPT4All. Repo
OpenAssistant. Repo
ChatGLM. Repo
MOSS. Repo
Lamini. Repo/Blog

Usage and Restrictions

We build a table summarizing LLM usage restrictions (e.g., for commercial and research purposes). In particular, we provide this information from the perspective of both the models and their pretraining data. We urge users in the community to refer to the licensing information for public models and data and to use them responsibly. We urge developers to pay special attention to licensing and to make it transparent and comprehensive, to prevent any unwanted and unforeseen usage.

LLM	Model License	Commercial Use	Other Notable Restrictions	Data License	Corpus
Encoder-only
BERT series of models (general domain)	Apache 2.0	✅		Public	BooksCorpus, English Wikipedia
RoBERTa	MIT license	✅		Public	BookCorpus, CC-News, OpenWebText, STORIES
ERNIE	Apache 2.0	✅		Public	English Wikipedia
SciBERT	Apache 2.0	✅		Public	BERT corpus, 1.14M papers from Semantic Scholar
LegalBERT	CC BY-SA 4.0	❌		Public (except data from the Case Law Access Project)	EU legislation, US court cases, etc.
BioBERT	Apache 2.0	✅		PubMed	PubMed, PMC
Encoder-Decoder
T5	Apache 2.0	✅		Public	C4
Flan-T5	Apache 2.0	✅		Public	C4, Mixture of tasks (Fig 2 in paper)
BART	Apache 2.0	✅		Public	RoBERTa corpus
GLM	Apache 2.0	✅		Public	BooksCorpus and English Wikipedia
ChatGLM	ChatGLM License	❌	No use for illegal purposes or military research; must not harm the public interest of society	N/A	1T tokens of Chinese and English corpus
Decoder-only
GPT2	Modified MIT License	✅	Use GPT-2 responsibly and clearly indicate your content was created using GPT-2.	Public	WebText
GPT-Neo	MIT license	✅		Public	Pile
GPT-J	Apache 2.0	✅		Public	Pile
---> Dolly	CC BY-NC 4.0	❌		CC BY-NC 4.0, subject to the Terms of Use of the data generated by OpenAI	Pile, Self-Instruct
---> GPT4ALL-J	Apache 2.0	✅		Public	GPT4All-J dataset
Pythia	Apache 2.0	✅		Public	Pile
---> Dolly v2	MIT license	✅		Public	Pile, databricks-dolly-15k
OPT	OPT-175B LICENSE AGREEMENT	❌	No development relating to surveillance research or the military; must not harm the public interest of society	Public	RoBERTa corpus, the Pile, PushShift.io Reddit
---> OPT-IML	OPT-175B LICENSE AGREEMENT	❌	Same as OPT	Public	OPT corpus, extended version of Super-NaturalInstructions
YaLM	Apache 2.0	✅		Unspecified	Pile, texts in Russian collected by the team
BLOOM	The BigScience RAIL License	✅	No use for generating verifiably false information with the purpose of harming others, or content that does not expressly disclaim that the text is machine-generated	Public	ROOTS corpus (Laurençon et al., 2022)
---> BLOOMZ	The BigScience RAIL License	✅	Same as BLOOM	Public	ROOTS corpus, xP3
Galactica	CC BY-NC 4.0	❌		N/A	The Galactica Corpus
LLaMA	Non-commercial bespoke license	❌	No development relating to surveillance research or the military; must not harm the public interest of society	Public	CommonCrawl, C4, GitHub, Wikipedia, etc.
---> Alpaca	CC BY-NC 4.0	❌		CC BY-NC 4.0, subject to the Terms of Use of the data generated by OpenAI	LLaMA corpus, Self-Instruct
---> Vicuna	CC BY-NC 4.0	❌		Subject to the Terms of Use of the data generated by OpenAI; Privacy Practices of ShareGPT	LLaMA corpus, 70K conversations from ShareGPT.com
---> GPT4ALL	GPL-licensed LLaMA	❌		Public	GPT4All dataset
OpenLLaMA	Apache 2.0	✅		Public	RedPajama
CodeGeeX	The CodeGeeX License	❌	No use for illegal purposes or military research	Public	Pile, CodeParrot, etc.
StarCoder	BigCode OpenRAIL-M v1 license	✅	No use for generating verifiably false information with the purpose of harming others, or content that does not expressly disclaim that the text is machine-generated	Public	The Stack
MPT-7B	Apache 2.0	✅		Public	mC4 (English), The Stack, RedPajama, S2ORC
Falcon	TII Falcon LLM License	✅/❌	Available under a license allowing commercial use	Public	RefinedWeb
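For readers who want to query the table programmatically, a minimal sketch follows, assuming a few rows are transcribed by hand into Python (model names lightly shortened). The `LicenseRow` type and the `commercially_usable` and `needs_review` helpers are hypothetical. A mixed ✅/❌ entry such as Falcon's is stored as `None` so it is flagged for manual review rather than treated as a yes or no. This is no substitute for reading the actual license text.

```python
from typing import List, NamedTuple, Optional


class LicenseRow(NamedTuple):
    """One row of the usage-restrictions table (hypothetical structure)."""
    model: str
    license: str
    commercial_use: Optional[bool]  # None where the table says ✅/❌


# A subset of rows transcribed from the table above.
ROWS: List[LicenseRow] = [
    LicenseRow("BERT", "Apache 2.0", True),
    LicenseRow("T5", "Apache 2.0", True),
    LicenseRow("ChatGLM", "ChatGLM License", False),
    LicenseRow("OPT", "OPT-175B LICENSE AGREEMENT", False),
    LicenseRow("LLaMA", "Non-commercial bespoke license", False),
    LicenseRow("Dolly v2", "MIT license", True),
    LicenseRow("OpenLLaMA", "Apache 2.0", True),
    LicenseRow("Falcon", "TII Falcon LLM License", None),
]


def commercially_usable(rows: List[LicenseRow]) -> List[str]:
    """Models the table marks with ✅ for commercial use."""
    return [r.model for r in rows if r.commercial_use is True]


def needs_review(rows: List[LicenseRow]) -> List[str]:
    """Models whose commercial status is mixed and must be checked by hand."""
    return [r.model for r in rows if r.commercial_use is None]


if __name__ == "__main__":
    print(commercially_usable(ROWS))  # ['BERT', 'T5', 'Dolly v2', 'OpenLLaMA']
    print(needs_review(ROWS))         # ['Falcon']
```

Using `None` as a third state keeps ambiguous licenses out of both the "usable" and "not usable" buckets, which is the conservative choice for licensing questions.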
