The Practical Guides for Large Language Models


A curated (and still actively updated) list of practical guide resources for LLMs. It is based on our survey paper, Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond.

These resources aim to help practitioners navigate the vast landscape of large language models (LLMs) and their applications in natural language processing (NLP). If you find any resources in our repository helpful, please feel free to use them (and don't forget to cite our paper!).

Latest News💥

  • We used PowerPoint to plot the figure and have released the source pptx file for our GIF figure. We welcome pull requests to refine this figure, and if you find the source helpful, please cite our paper.

    @article{yang2023harnessing,
        title={Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond}, 
        author={Jingfeng Yang and Hongye Jin and Ruixiang Tang and Xiaotian Han and Qizhang Feng and Haoming Jiang and Bing Yin and Xia Hu},
        year={2023},
        eprint={2304.13712},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
    }

Practical Guide for Models

We build an evolutionary tree of modern Large Language Models (LLMs) to trace the development of language models in recent years and highlight some of the most well-known models, in the following figure:

BERT-style Language Models: Encoder-Decoder or Encoder-only

  • BERT BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018, Paper
  • RoBERTa RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, Paper
  • DistilBERT DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, Paper
  • ALBERT ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, Paper
  • ELECTRA ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, Paper
  • T5 "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2019. [Paper]
  • GLM "GLM-130B: An Open Bilingual Pre-trained Model". 2022. [Paper]
  • AlexaTM "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". Saleh Soltan et al. arXiv 2022. [Paper]

GPT-style Language Models: Decoder-only

  • GPT-3 "Language Models are Few-Shot Learners". NeurIPS 2020. [Paper]
  • OPT "OPT: Open Pre-trained Transformer Language Models". 2022. [Paper]
  • PaLM "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. [Paper]
  • BLOOM "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". 2022. [Paper]
  • MT-NLG "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". 2021. [Paper]
  • GLaM "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". ICML 2022. [Paper]
  • Gopher "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". 2021. [Paper]
  • Chinchilla "Training Compute-Optimal Large Language Models". 2022. [Paper]
  • LaMDA "LaMDA: Language Models for Dialog Applications". 2021. [Paper]
  • LLaMA "LLaMA: Open and Efficient Foundation Language Models". 2023. [Paper]
  • GPT-4 "GPT-4 Technical Report". 2023. [Paper]
  • BloombergGPT BloombergGPT: A Large Language Model for Finance, 2023, Paper
  • GPT-NeoX-20B "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". 2022. [Paper]

Practical Guide for Data

Pretraining data

  • How does the pre-training objective affect what large language models learn about linguistic properties?, ACL 2022. Paper
  • Scaling laws for neural language models, 2020. Paper
  • Data-centric artificial intelligence: A survey, 2023. Paper

Finetuning data

  • Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach, EMNLP 2019. Paper
  • Language Models are Few-Shot Learners, NeurIPS 2020. Paper
  • Does Synthetic Data Generation of LLMs Help Clinical Text Mining? Arxiv 2023 Paper

Test data/user data

  • Shortcut learning of large language models in natural language understanding: A survey, Arxiv 2023. Paper
  • On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective, Arxiv 2023. Paper

Practical Guide for NLP Tasks

We build a decision flow for choosing between LLMs and fine-tuned models for users' NLP applications. The decision flow helps users assess whether their downstream NLP applications meet specific conditions and, based on that evaluation, determine whether LLMs or fine-tuned models are the more suitable choice.

Traditional NLU tasks

  • A benchmark for toxic comment classification on civil comments dataset Arxiv 2023 Paper
  • Is ChatGPT a general-purpose natural language processing task solver? Arxiv 2023 Paper
  • Benchmarking large language models for news summarization Arxiv 2022 Paper

Generation tasks

  • News summarization and evaluation in the era of GPT-3 Arxiv 2022 Paper
  • Is ChatGPT a good translator? Yes with GPT-4 as the engine Arxiv 2023 Paper
  • Multilingual machine translation systems from Microsoft for WMT21 shared task, WMT2021 Paper
  • Can ChatGPT understand too? A comparative study on ChatGPT and fine-tuned BERT, Arxiv 2023, Paper

Knowledge-intensive tasks

  • Measuring massive multitask language understanding, ICLR 2021 Paper
  • Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, Arxiv 2022 Paper
  • Inverse scaling prize, 2022 Link

Abilities with Scaling

  • Training Compute-Optimal Large Language Models, NeurIPS 2022 Paper
  • Scaling Laws for Neural Language Models, Arxiv 2020 Paper
  • Solving math word problems with process- and outcome-based feedback, Arxiv 2022 Paper
  • Chain of thought prompting elicits reasoning in large language models, NeurIPS 2022 Paper
  • Emergent abilities of large language models, TMLR 2022 Paper
  • Inverse scaling can become U-shaped, Arxiv 2022 Paper

Specific tasks

  • Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Arxiv 2022 Paper
  • PaLI: A Jointly-Scaled Multilingual Language-Image Model, Arxiv 2022 Paper
  • AugGPT: Leveraging ChatGPT for Text Data Augmentation, Arxiv 2023 Paper
  • Is GPT-3 a good data annotator?, Arxiv 2022 Paper
  • Want To Reduce Labeling Cost? GPT-3 Can Help, EMNLP findings 2021 Paper
  • GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation, EMNLP findings 2021 Paper
  • LLM for Patient-Trial Matching: Privacy-Aware Data Augmentation Towards Better Performance and Generalizability, Arxiv 2023 Paper
  • ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks, Arxiv 2023 Paper
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Arxiv 2023 Paper
  • GPTScore: Evaluate as You Desire, Arxiv 2023 Paper
  • Large Language Models Are State-of-the-Art Evaluators of Translation Quality, Arxiv 2023 Paper
  • Is ChatGPT a Good NLG Evaluator? A Preliminary Study, Arxiv 2023 Paper

Real-World "Tasks"

Efficiency

  1. Cost
  • OpenAI’s GPT-3 language model: A technical overview, 2020. Blog Post
  • Measuring the carbon intensity of AI in cloud instances, FAccT 2022. Paper
  • In AI, is bigger always better?, Nature 2023. Article
  • Language Models are Few-Shot Learners, NeurIPS 2020. Paper
  • Pricing, OpenAI. Blog Post
  2. Latency
  • Holistic evaluation of language models, Arxiv 2022. Paper
  3. Parameter-Efficient Fine-Tuning
  • LoRA: Low-Rank Adaptation of Large Language Models, Arxiv 2021. Paper
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation, ACL 2021. Paper
  • P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks, ACL 2022. Paper
  • P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, Arxiv 2022. Paper
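As a rough illustration of why the parameter-efficient methods listed above reduce fine-tuning cost, the sketch below follows the low-rank update described in the LoRA paper: the pretrained weight W stays frozen and only two small matrices A and B are trained, giving h = W x + (alpha / r) · B (A x). The function names, the toy matrices, and the 4096 × 4096 layer size are illustrative assumptions, not taken from this repository or from any particular library.

```python
# Minimal pure-Python sketch of a LoRA-style low-rank update (illustrative only).

def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in              # updating W directly
    lora = rank * (d_in + d_out)     # updating A (rank x d_in) and B (d_out x rank)
    return full, lora

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, rank):
    """Compute h = W x + (alpha / rank) * B (A x) with W kept frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

if __name__ == "__main__":
    # Hypothetical layer size chosen only to show the order of magnitude.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full: {full}, lora: {lora}, ratio: {lora / full:.6f}")
```

When B is initialized to zeros, as in the paper, the adapted layer starts out identical to the frozen one, so training begins from the pretrained behavior.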

Trustworthiness

  1. Robustness and Calibration
  • Calibrate before use: Improving few-shot performance of language models, ICML 2021. Paper
  • SPeC: A Soft Prompt-Based Calibration on Mitigating Performance Variability in Clinical Notes Summarization, Arxiv 2023. Paper
  2. Spurious biases
  • Shortcut learning of large language models in natural language understanding: A survey, 2023 Paper
  • Mitigating gender bias in captioning system, WWW 2020 Paper
  • Calibrate Before Use: Improving Few-Shot Performance of Language Models, ICML 2021 Paper
  • Shortcut Learning in Deep Neural Networks, Nature Machine Intelligence 2020 Paper
  • Do Prompt-Based Models Really Understand the Meaning of Their Prompts?, NAACL 2022 Paper
  3. Safety issues
  • GPT-4 System Card, 2023 Paper
  • The science of detecting LLM-generated texts, Arxiv 2023 Paper
  • Constitutional AI: Harmlessness from AI feedback, Arxiv 2022 Paper
  • How stereotypes are shared through language: a review and introduction of the social categories and stereotypes communication (SCSC) framework, Review of Communication Research, 2019 Paper
  • Gender shades: Intersectional accuracy disparities in commercial gender classification, FAccT 2018 Paper