llm-primer

A primer on large language models (LLMs) and ChatGPT, as of April 2023

Update 05/2023

V2 Covers

Intro: Building blocks & capabilities
  • LM and LLM
  • Transformer
  • How are LLMs trained?
  • LLM decoding (minimal sketch after this outline)
  • LLM training in parallel
  • LLM capabilities, advanced capabilities and insane capabilities
Core: Models, players, concepts, toolings & applications
  • Selected LLMs
    • BERT
    • GPT family
    • T5
    • GLM
  • LLM Players
    • Big companies
    • Institutes and startups
  • LLM concepts
    • Pretraining, finetuning, prompt tuning
    • Scaling laws
    • Prompt engineering
    • Prompt tuning (soft prompt)
    • "Emergent abilities"
    • Chain of thought (CoT), with a prompt example after this outline
    • Least-to-most prompting
    • Hallucination
    • Retrieval LLM
    • RLHF for LLM
    • Mixture of Experts (MoE) LLM
  • LLM Tooling
    • Hugging Face
    • TF Hub, Torch NLP, PaddleNLP
    • Transformers lib, Colossal-AI, Ray and nanoGPT (pipeline sketch after this outline)
    • Other tooling
  • LLM Applications
Bonus: Deep dive into ChatGPT
  • ChatGPT model evolution
  • Research (InstructGPT) overview
  • Possible next steps for ChatGPT?
  • Engineering discussion
  • Rough cost estimate to train/serve ChatGPT (back-of-the-envelope sketch after this outline)
  • My thoughts on the technical challenges of reproducing ChatGPT
  • What suboptimal choices did Google make around ChatGPT that delayed its release of a similar product?
  • Fun facts
  • ChatGPT challenges
  • Final question: Will ChatGPT become the next iPhone, the next Alexa, or the next Clubhouse?
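
For the LLM decoding item above, a minimal, self-contained sketch of a single decoding step: choosing the next token from a vector of next-token logits, either greedily or by temperature sampling. The 5-token vocabulary and the logit values are made-up assumptions for illustration, not taken from any real model.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize to a probability distribution.
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def decode_step(logits, strategy="greedy", temperature=1.0, rng=None):
    """Pick the next token id from a vector of next-token logits."""
    if strategy == "greedy":
        return int(np.argmax(logits))            # deterministic: most likely token
    probs = softmax(logits, temperature)         # any other strategy: sample the softmax
    rng = rng if rng is not None else np.random.default_rng(0)
    return int(rng.choice(len(probs), p=probs))

# Toy 5-token vocabulary with made-up logits (purely illustrative).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(decode_step(logits))                                      # greedy -> token 0
print(decode_step(logits, strategy="sample", temperature=0.7))  # sampled token id
```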
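
For the chain-of-thought (CoT) item, here is what a few-shot CoT prompt looks like. The worked tennis-balls demonstration is the well-known example from Wei et al. (2022); the snippet only builds the prompt string and does not call any model.

```python
# Few-shot chain-of-thought prompt: the demonstration answer spells out its
# reasoning step by step, nudging the model to do the same for the new question.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and
bought 6 more, how many apples do they have?
A:"""
print(cot_prompt)  # send this to any completion-style LLM
```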
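
For the Hugging Face tooling item, a minimal text-generation sketch with the Transformers pipeline API. Here gpt2 is a small stand-in model picked so the example runs locally; it is not a model the slides necessarily discuss.

```python
from transformers import pipeline

# Build a text-generation pipeline; swap "gpt2" for any causal LM on the Hub.
generator = pipeline("text-generation", model="gpt2")
out = generator("Large language models are", max_new_tokens=20, do_sample=True)
print(out[0]["generated_text"])
```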
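
For the rough train/serve estimate, a back-of-the-envelope sketch using the common ~6 x parameters x tokens FLOPs rule of thumb. The GPT-3-scale parameter and token counts and the 40% utilization are assumptions for illustration only; OpenAI has not published ChatGPT's actual training setup.

```python
# Back-of-the-envelope training compute: FLOPs ~= 6 * params * tokens.
params = 175e9        # parameters (GPT-3 scale, used as a stand-in)
tokens = 300e9        # training tokens (GPT-3 paper's reported scale)
flops = 6 * params * tokens          # ~3.15e23 FLOPs

a100_peak = 312e12    # A100 peak BF16 throughput, FLOP/s
mfu = 0.40            # assumed model FLOPs utilization (fraction of peak)
seconds = flops / (a100_peak * mfu)
print(f"{flops:.2e} FLOPs ~ {seconds / 86400:,.0f} A100-days at {mfu:.0%} MFU")
# -> roughly 29,000 A100-days, i.e. ~1,000 A100s for about a month
```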

V2.1 Covers

  • LLM basics
  • RL basics
  • ChatGPT
  • Societal impact

Slides

Additional notes

Chinese-only notes (my personal opinion)

Recommended quick readings

~120 References as of 01/30/2023


TODO: V2.1 Reference list