A short catalogue of landmark Transformer models: for each one, the paper, a one-line summary, and an implementation where available.
Year | Name | Paper | Info | Implementation |
---|---|---|---|---|
2017 | Transformer | Attention Is All You Need | The focus of the original research was on translation tasks. | TensorFlow + article |
2018 | GPT | Improving Language Understanding by Generative Pre-Training | The first pretrained Transformer model, fine-tuned on various NLP tasks to obtain state-of-the-art results | |
2018 | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Another large pretrained model, this one designed to produce better representations of sentences | PyTorch |
2019 | GPT-2 | Language Models are Unsupervised Multitask Learners | An improved (and bigger) version of GPT that was not immediately publicly released due to ethical concerns | |
2019 | DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | A distilled version of BERT that is 60% faster, 40% lighter in memory, and still retains 97% of BERT’s performance | |
2019 | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | A large pretrained encoder-decoder model using the same architecture as the original Transformer. | |
2019 | T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | A large pretrained encoder-decoder model that casts every NLP task as text-to-text. | |
2019 | ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | | |
2019 | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | | |
2019 | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | | |
2019 | Transformer XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Adds segment-level recurrence over past hidden states, coupled with relative positional encoding, enabling longer-term dependencies | |
2020 | GPT-3 | Language Models are Few-Shot Learners | An even bigger version of GPT-2 that performs well on a variety of tasks without fine-tuning, via zero-shot or few-shot prompting | |
2020 | ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | |
2020 | mBART | Multilingual Denoising Pre-training for Neural Machine Translation | | |
2021 | Gopher | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | | |
2022 | ChatGPT/InstructGPT | Training language models to follow instructions with human feedback | A language model much better at following user intentions than GPT-3. It is fine-tuned with Reinforcement Learning from Human Feedback (RLHF) to support conversational dialogue, using demonstration data written by people so that responses sound human-like. | |
2022 | Chinchilla | Training Compute-Optimal Large Language Models | Uses the same compute budget as Gopher but with 70B parameters and 4x more data. | |
2022 | LaMDA | LaMDA: Language Models for Dialog Applications | A family of Transformer-based neural language models specialized for dialog | |
2023 | GPT-4 | GPT-4 Technical Report | The model now accepts multimodal inputs: images and text | |
Name | Size (# Parameters) | Training Tokens | Training data |
---|---|---|---|
LaMDA | 137B | 168B | 1.56T words of public dialog data and web text |
Gopher | 280B | 300B | MassiveText |
Chinchilla | 70B | 1.4T | MassiveText |
- M = million | B = billion | T = trillion
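As a rough sanity check on Chinchilla's "same compute budget as Gopher" claim, here is a back-of-the-envelope estimate using the common C ≈ 6·N·D approximation from the scaling-law literature. The figures come from the table above; the 6ND rule itself is only approximate, so treat the outputs as order-of-magnitude estimates:

```python
# Rough training-compute estimate with the common approximation
# C ≈ 6 * N * D FLOPs (N = parameters, D = training tokens).
# Model figures come from the table above; the 6ND rule is only approximate.
models = {
    "LaMDA":      (137e9, 168e9),
    "Gopher":     (280e9, 300e9),
    "Chinchilla": (70e9,  1.4e12),
}

for name, (params, tokens) in models.items():
    print(f"{name:>10}: ~{6 * params * tokens:.2e} FLOPs")

# Gopher     -> ~5.04e+23 FLOPs
# Chinchilla -> ~5.88e+23 FLOPs: a comparable budget, spent on a 4x smaller
# model trained on roughly 4x more tokens.
```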
- ALBERT
- BART | BERT | Big Bird | BLOOM
- Chinchilla | CLIP | CTRL | ChatGPT
- DALL-E | DALL-E-2 | Decision Transformers | DialoGPT | DistilBERT | DQ-BART
- ELECTRA | ERNIE
- Flamingo
- Gato | Gopher | GLaM | GLIDE | GC-ViT | GPT | GPT-2 | GPT-3 | GPT-4 | GPT-Neo | GPTInstruct
- Imagen | InstructGPT
- Jurassic-1
- LaMDA
- mBART | Megatron | Minerva | MT-NLG
- OPT
- PaLM | Pegasus
- RoBERTa
- SeeKer | Swin Transformer | Switch
- Transformer | T5 | Trajectory Transformers | Transformer XL | Turing-NLG
- ViT
- Wu Dao 2.0
- XLM-RoBERTa | XLNet
Architecture | Models | Tasks |
---|---|---|
Encoder-only, aka auto-encoding Transformer models | ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa | Sentence classification, named entity recognition, extractive question answering |
Decoder-only, aka auto-regressive (or causal) Transformer models | CTRL, GPT, GPT-2, Transformer XL | Text generation given a prompt |
Encoder-Decoder, aka sequence-to-sequence Transformer models | BART, T5, Marian, mBART | Summarisation, translation, generative question answering |
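To make the mapping concrete, here is a minimal sketch of one typical task per family using the HuggingFace `pipeline` API. The checkpoint names are illustrative choices, not the only options:

```python
from transformers import pipeline

# Encoder-only (auto-encoding): sentence classification.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("Transformers are remarkably versatile."))

# Decoder-only (auto-regressive): text generation given a prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("The Transformer architecture", max_new_tokens=20))

# Encoder-decoder (sequence-to-sequence): summarisation.
summariser = pipeline("summarization", model="t5-small")
print(summariser("The Transformer, introduced in 2017, replaced recurrence "
                 "with self-attention and now underpins most large "
                 "language models."))
```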
- HuggingFace is a popular NLP library that also offers an easy way to deploy models via its Inference API (a minimal sketch follows below). When you build a model with the HuggingFace library, you can train it and upload it to their Model Hub. Read more about this here.
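A minimal sketch of calling the hosted Inference API over plain HTTP; the model ID is an illustrative choice and `YOUR_HF_TOKEN` is a placeholder for your own access token:

```python
import requests

# Minimal sketch of the hosted Inference API. The endpoint pattern is
# https://api-inference.huggingface.co/models/<model_id>.
API_URL = ("https://api-inference.huggingface.co/models/"
           "distilbert-base-uncased-finetuned-sst-2-english")
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # placeholder token

response = requests.post(
    API_URL, headers=headers,
    json={"inputs": "Deploying via the Inference API is painless."},
)
print(response.json())
```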
- List of notebooks
- 2014 | Neural Machine Translation by Jointly Learning to Align and Translate
- 2022 | A Survey on GPT-3
- 2022 | Efficiently Scaling Transformer Inference
- https://github.com/thunlp/PLMpapers
- Building a synth with ChatGPT
- PubMed GPT: a Domain-Specific Large Language Model for Biomedical Text
- ChatGPT - Where it lacks
- Awesome ChatGPT Prompts
- ChatGPT vs. GPT3: The Ultimate Comparison
- Prompt Engineering 101: Introduction and resources
- Transformer models: an introduction and catalog — 2022 Edition
- Can GPT-3 or BERT Ever Understand Language?—The Limits of Deep Learning Language Models
- 10 Things You Need to Know About BERT and the Transformer Architecture That Are Reshaping the AI Landscape
- Comprehensive Guide to Transformers
- Unmasking BERT: The Key to Transformer Model Performance
- Transformer NLP Models (Meena and LaMDA): Are They “Sentient” and What Does It Mean for Open-Domain Chatbots?
- Hugging Face Pre-trained Models: Find the Best One for Your Task
- Large Transformer Model Inference Optimization
- 4-part tutorial on how transformers work: Part 1 | Part 2 | Part 3 | Part 4
- What Makes a Dialog Agent Useful?
- Understanding Large Language Models -- A Transformative Reading List
- Building a search engine with a pre-trained BERT model
- Fine-tuning a pre-trained BERT model on a text classification task
- Fine-tuning a pre-trained BERT model on the Amazon product review dataset
- Sentiment analysis with a Hugging Face transformer
- Fine-tuning a pre-trained BERT model on a YELP review classification task
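A condensed sketch of the fine-tuning recipe the notebooks above follow, using the HuggingFace `Trainer` API. The dataset subset sizes and hyperparameters here are illustrative placeholders; the notebooks themselves are the reference:

```python
# Fine-tune BERT for sequence classification on a small YELP subset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)  # YELP reviews: 1-5 stars

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-yelp", num_train_epochs=1,
                           per_device_train_batch_size=16),
    # Small illustrative subsets so the sketch runs quickly.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```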
- HuggingFace API
- HuggingFace mask filling
- HuggingFace NER (named entity recognition)
- HuggingFace question answering within context
- HuggingFace text generation
- HuggingFace text summarisation
- HuggingFace zero-shot learning
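Several of the tasks in the notebooks above reduce to one-liners with the `pipeline` API. A minimal sketch (default checkpoints are downloaded unless you pin one explicitly):

```python
from transformers import pipeline

# Mask filling: predict the hidden token.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("Paris is the [MASK] of France."))

# Named entity recognition, aggregating sub-word pieces into entities.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face was founded in New York City."))

# Question answering within a given context.
qa = pipeline("question-answering")
print(qa(question="Where was Hugging Face founded?",
         context="Hugging Face was founded in New York City."))

# Zero-shot classification: no task-specific fine-tuning required.
zero_shot = pipeline("zero-shot-classification")
print(zero_shot("This notebook explains transformer architectures.",
                candidate_labels=["education", "politics", "sport"]))
```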
- Two notebooks are available:
    - One with coloured boxes, outside the folder `GitHub_MD_rendering`
    - One in black-and-white, under the folder `GitHub_MD_rendering`
- The easiest option is to clone this repository.
- Navigate to Google Colab and open the notebook directly from Colab.
- You can then also write it back to GitHub, provided Colab is granted permission; the whole procedure is automated.
- How to Code BERT Using PyTorch
- miniGPT in PyTorch
- nanoGPT in PyTorch
- TensorFlow implementation of Attention Is All You Need + article
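Underneath all of these implementations sits the same core operation, scaled dot-product attention. A minimal PyTorch sketch of just that building block, a toy illustration rather than a full model:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy self-attention: batch of 2 sequences, length 5, model dim 8.
x = torch.randn(2, 5, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 5, 8])
```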