wxd000000's Stars
nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
yangjianxin1/Firefly
Firefly: 大模型训练工具,支持训练Qwen2.5、Qwen2、Yi1.5、Phi-3、Llama3、Gemma、MiniCPM、Yi、Deepseek、Orion、Xverse、Mixtral-8x7B、Zephyr、Mistral、Baichuan2、Llma2、Llama、Qwen、Baichuan、ChatGLM2、InternLM、Ziya2、Vicuna、Bloom等大模型
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Jeevan-kumar-Raj/Grokking-System-Design
Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems theory to product development.
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
DLLXW/baby-llama2-chinese
用于从头预训练+SFT一个小参数量的中文LLaMa2的仓库;24G单卡即可运行得到一个具备简单中文问答能力的chat-llama2.
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
ymcui/Chinese-LLaMA-Alpaca-3
中文羊驼大模型三期项目 (Chinese Llama-3 LLMs) developed from Meta Llama 3
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
mlfoundations/dclm
DataComp for Language Models
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
Jack47/hack-SysML
The road to hack SysML and become an system expert
hemingkx/SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️