A curated list of awesome transformer models.
If you want to contribute to this list, send a pull request or reach out to me on Twitter: @abacaj. Let's make this list useful.
A number of the available models are not fully open source (non-commercial licenses, etc.); this repository should also make you aware of that. Tracking each model's original source/company helps.
I would also eventually like to add model use cases, so it is easier for others to find the right model to fine-tune.
Format:
- Model name: short description, usually from paper
- Model link (usually huggingface or github)
- Paper link
- Source as company or group
- Model license
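Most model links below point to the Hugging Face Hub, so any of them can be tried with the `transformers` library. A minimal sketch, using `bert-base-uncased` as an illustrative model id (substitute any listed model with a Hugging Face link, and check its license first):

```python
# Minimal sketch: loading a listed model from the Hugging Face Hub.
# "bert-base-uncased" is an illustrative choice; any hub model id works.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, transformers!", return_tensors="pt")
outputs = model(**inputs)

# Encoder output has shape (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```

Decoder (autoregressive) models are loaded the same way with `AutoModelForCausalLM`, and seq2seq models with `AutoModelForSeq2SeqLM`.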
- Encoder (autoencoder) models
- Decoder (autoregressive) models
- Encoder+decoder (seq2seq) models
- Multimodal models
- Vision models
- ALBERT: "A Lite" version of BERT
- BERT: Bidirectional Encoder Representations from Transformers
- DistilBERT: A distilled version of BERT: smaller, faster, cheaper and lighter
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- RoBERTa: Robustly Optimized BERT Pretraining Approach
- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
- LLaMA: Open and Efficient Foundation Language Models
- Model
- Paper
- Requires approval, non-commercial
- GPT: Improving Language Understanding by Generative Pre-Training
- GPT-2: Language Models are Unsupervised Multitask Learners
- GPT-J: A 6 Billion Parameter Autoregressive Language Model
- GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-TensorFlow
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- NeMo Megatron-GPT: Megatron-GPT 20B is a transformer-based language model.
- OPT: Open Pre-trained Transformer Language Models
- Model
- Paper
- Requires approval, non-commercial
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- Model
- Paper
- BigScience
- OpenRAIL, use-based restrictions
- GLM: An Open Bilingual Pre-Trained Model
- Model
- Paper
- Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University
- Custom license, see restrictions
- YaLM: Pretrained language model with 100B parameters
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- FLAN-T5: Scaling Instruction-Finetuned Language Models
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Pegasus: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
- UL2: Unifying Language Learning Paradigms