Somoku's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
meta-llama/llama3
The official Meta Llama 3 GitHub site
richards199999/Thinking-Claude
Let your Claude think
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
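A hedged usage sketch following NeMo's documented `from_pretrained`/`transcribe` pattern; the model name and audio path are illustrative placeholders:

```python
# Sketch: transcribe audio with a pretrained NeMo ASR model.
# Model name and file path are placeholders to verify against the docs.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_conformer_ctc_small"
)
print(asr_model.transcribe(["sample.wav"])[0])
```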
federico-busato/Modern-CPP-Programming
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
axolotl-ai-cloud/axolotl
Go ahead and axolotl questions
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
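The typical integration is a one-call monkey patch applied before model construction; this sketch follows the project's README pattern, and the helper name and model ID should be treated as assumptions:

```python
# Sketch: swap a Hugging Face Llama model's ops (RMSNorm, RoPE, SwiGLU,
# fused cross-entropy) for Liger's Triton kernels.
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_llama()  # patch before the model is instantiated
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
```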
sksq96/pytorch-summary
Model summary in PyTorch similar to `model.summary()` in Keras
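Typical usage, assuming the package is importable as `torchsummary`:

```python
# Sketch: Keras-style per-layer shape and parameter summary.
import torch.nn as nn
from torchsummary import summary

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 28 * 28, 10),
)
summary(model, input_size=(1, 28, 28), device="cpu")
```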
gpu-mode/lectures
Material for gpu-mode lectures
dvlab-research/LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
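LongLoRA's shifted sparse attention (S2-Attn) attends within fixed-size groups while shifting half the heads by half a group, so information still crosses group boundaries. A minimal illustrative PyTorch sketch, not the repo's implementation:

```python
import torch

def s2_attention(q, k, v, group_size):
    # Illustrative S2-Attn: q, k, v are (batch, heads, seqlen, dim),
    # with seqlen divisible by group_size.
    B, H, N, D = q.shape
    half, shift = H // 2, group_size // 2
    q, k, v = q.clone(), k.clone(), v.clone()
    for t in (q, k, v):  # offset group boundaries for half the heads
        t[:, half:] = t[:, half:].roll(-shift, dims=2)

    def grp(x):
        return x.reshape(B, H, N // group_size, group_size, D)

    attn = torch.softmax(grp(q) @ grp(k).transpose(-1, -2) / D**0.5, dim=-1)
    out = (attn @ grp(v)).reshape(B, H, N, D)
    out[:, half:] = out[:, half:].roll(shift, dims=2)  # undo the shift
    return out
```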
adapter-hub/adapters
A Unified Library for Parameter-Efficient and Modular Transfer Learning
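A hedged sketch of the bottleneck-adapter workflow; `AutoAdapterModel`, `add_adapter`, `train_adapter`, and the `seq_bn` config string follow the library's documented API, but treat the details as assumptions:

```python
# Sketch: train only a small bottleneck adapter on a frozen encoder.
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("my_task", config="seq_bn")  # sequential bottleneck
model.train_adapter("my_task")   # freeze base weights, unfreeze adapter
model.set_active_adapters("my_task")
```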
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, delivering better performance with lower memory utilization in both training and inference.
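The usual pattern is to build layers from `transformer_engine.pytorch` and run forward passes inside an FP8 autocast region; the recipe parameters below are illustrative:

```python
# Sketch: run a Transformer Engine linear layer with FP8 GEMMs.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(768, 3072).cuda()
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

x = torch.randn(16, 768, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # GEMM executes in FP8 on Hopper/Ada-class GPUs
```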
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
stanfordnlp/pyreft
ReFT: Representation Finetuning for Language Models
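The idea behind ReFT (in its LoReFT variant) is to leave the model frozen and learn a low-rank edit of hidden representations, h <- h + R^T(Wh + b - Rh). A standalone PyTorch sketch of such an intervention; all names are hypothetical, not the pyreft API:

```python
import torch
import torch.nn as nn

class LowRankIntervention(nn.Module):
    # Illustrative LoReFT-style edit: h <- h + R^T (W h + b - R h).
    def __init__(self, hidden_size, rank):
        super().__init__()
        self.R = nn.Parameter(torch.empty(rank, hidden_size))
        nn.init.orthogonal_(self.R)
        self.W = nn.Linear(hidden_size, rank)

    def forward(self, h):  # h: (batch, seq, hidden)
        delta = self.W(h) - h @ self.R.T  # edit within a rank-r subspace
        return h + delta @ self.R
```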
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
linux-rdma/perftest
InfiniBand Verbs Performance Tests
AmadeusChan/Awesome-LLM-System-Papers
hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
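The roofline model it refers to caps attainable throughput at min(peak compute, arithmetic intensity x memory bandwidth). A tiny worked sketch with illustrative, roughly A100-class numbers:

```python
# Sketch: roofline bound given a kernel's FLOPs and bytes moved.
def roofline_tflops(flops, bytes_moved, peak_tflops=312.0, bw_tb_s=2.0):
    intensity = flops / bytes_moved        # FLOPs per byte
    return min(peak_tflops, intensity * bw_tb_s)

# Batch-1 decode GEMV reads each weight byte for ~2 FLOPs: memory-bound.
print(roofline_tflops(flops=2e9, bytes_moved=1e9))   # 4.0   (bandwidth-bound)
print(roofline_tflops(flops=2e12, bytes_moved=1e9))  # 312.0 (compute-bound)
```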
S-Lab-System-Group/Awesome-DL-Scheduling-Papers
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
THUDM/LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
imoneoi/multipack_sampler
Multipack distributed sampler for fast padding-free training of LLMs
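The underlying idea is bin packing: sort sequences by length and pack them into fixed-capacity bins so batches carry almost no padding. A self-contained first-fit-decreasing sketch, not the repo's sampler:

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing of sequence lengths into
    bins of capacity max_len; returns lists of sequence indices."""
    bins = []  # each bin: [remaining_capacity, [indices]]
    for i in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        for b in bins:
            if lengths[i] <= b[0]:
                b[0] -= lengths[i]
                b[1].append(i)
                break
        else:
            bins.append([max_len - lengths[i], [i]])
    return [b[1] for b in bins]

print(pack_sequences([900, 600, 400, 300, 100], max_len=1024))
# -> [[0, 4], [1, 2], [3]]  (each pack fits within 1024 tokens)
```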
Outsider565/LoRA-GA
LoRA-GA: Low-Rank Adaptation with Gradient Approximation (NeurIPS 2024)
git-cloner/llama-lora-fine-tuning
LLaMA fine-tuning with LoRA
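For reference, the common way to set this up with Hugging Face's peft library (not this repo's scripts; the model ID and hyperparameters are illustrative):

```python
# Sketch: inject trainable low-rank adapters into a frozen causal LM.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of base params
```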
Mellanox/gpu_direct_rdma_access
Example code that uses DC QPs to provide RDMA READ and WRITE operations to remote GPU memory
OpenNLPLab/LASP
Linear Attention Sequence Parallelism (LASP)
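The base operation LASP parallelizes is linear attention, which replaces softmax(QK^T)V with phi(Q)(phi(K)^T V), so cost is linear in sequence length and the (K^T V) state can be chunked along the sequence. A minimal non-causal, non-distributed sketch:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seqlen, dim); phi(x) = elu(x) + 1.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bhnd,bhne->bhde", k, v)  # d x d state, O(N d^2)
    z = 1 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)
```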
PKU-DAIR/Hetu-Galvatron
Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs).
LLMServe/dLoRA-artifact
Artifact of dLoRA (OSDI 2024), a serving system that dynamically orchestrates requests and LoRA adapters