- Transformers: https://huggingface.co/docs/transformers/index
- Llama: https://github.com/facebookresearch/llama
- Mistral: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- GPTBigCode: https://huggingface.co/docs/transformers/model_doc/gpt_bigcode#gptbigcode
- Phi: https://huggingface.co/microsoft/phi-2
- GPT2: https://huggingface.co/docs/transformers/model_doc/gpt2
- Multi-Query Attention (MQA): https://blog.fireworks.ai/multi-query-attention-is-all-you-need-db072e758055
- Grouped-Query Attention (GQA)
- Sliding-Window Attention (SWA)
- tokenizers: https://huggingface.co/docs/transformers/tokenizer_summary
- Byte-fallback BPE tokenizer
- GPT-NeoX: https://github.com/EleutherAI/gpt-neox
- Causal language modeling (CLM)