JimmyCai91/LLM

LLM

Transformers: https://huggingface.co/docs/transformers/index

Architectures

Llama: https://github.com/facebookresearch/llama
Mistral: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
GPTBigCode: https://huggingface.co/docs/transformers/model_doc/gpt_bigcode#gptbigcode
Phi: https://huggingface.co/microsoft/phi-2
GPT2: https://huggingface.co/docs/transformers/model_doc/gpt2

Smaller Parts

Multi-Query Attention (MQA): https://blog.fireworks.ai/multi-query-attention-is-all-you-need-db072e758055
Grouped-Query Attention (GQA)
Sliding-Window Attention (SWA)
Tokenizers: https://huggingface.co/docs/transformers/tokenizer_summary
Byte-fallback BPE tokenizer
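
As a rough illustration of the attention variants listed above, here is a minimal pure-Python sketch. It is not taken from any of the linked implementations. The function names (`kv_head_for_query_head`, `sliding_window_causal_mask`) are hypothetical, and the window convention (the window counts the current token) is an assumption that real models may define differently.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head that a query head reads from.

    n_kv_heads == n_q_heads gives ordinary multi-head attention,
    n_kv_heads == 1 gives MQA, and anything in between gives GQA.
    """
    if n_kv_heads <= 0 or n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a positive multiple of n_kv_heads")
    if not 0 <= q_head < n_q_heads:
        raise ValueError("q_head out of range")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumption: position i sees itself and the (window - 1) positions before it.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # 8 query heads sharing 2 KV heads (GQA), then sharing 1 KV head (MQA).
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
    print([kv_head_for_query_head(h, 8, 1) for h in range(8)])
    for row in sliding_window_causal_mask(4, 2):
        print(row)
```

Sharing KV heads shrinks the key/value cache by a factor of n_q_heads / n_kv_heads, and the sliding window bounds how many past positions each token attends to.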

Train

GPT-NeoX: https://github.com/EleutherAI/gpt-neox

Objectives

Causal language modeling (CLM)
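
For reference, the CLM objective is usually implemented as next-token cross-entropy. The sketch below is a minimal pure-Python version under that assumption. The function name `causal_lm_loss` and the list-of-lists logits layout are hypothetical choices for illustration, not any library's API.

```python
import math


def causal_lm_loss(logits: list[list[float]], token_ids: list[int]) -> float:
    """Mean next-token cross-entropy.

    logits[t] holds unnormalized scores over the vocabulary at position t
    and is scored against token_ids[t + 1]; the last position has no target.
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one row of logits per input token")
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a target")
    total = 0.0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[token_ids[t + 1]]  # -log p(next token | prefix)
    return total / (len(token_ids) - 1)


if __name__ == "__main__":
    # Uniform logits over a 4-token vocabulary: loss equals log(4).
    uniform = [[0.0] * 4 for _ in range(3)]
    print(causal_lm_loss(uniform, [0, 1, 2]))
```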