maodoudou168's Stars
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
mli/paper-reading
Paragraph-by-paragraph close reading of classic and new deep learning papers
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
liguodongiot/llm-action
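This project shares the technical principles and hands-on experience behind large models (large-model engineering and deploying large-model applications)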
TheLastBen/fast-stable-diffusion
fast-stable-diffusion + DreamBooth
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
microsoft/DeepSpeedExamples
Example models using DeepSpeed
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
qwopqwop200/GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
ai-forever/Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
dafish-ai/NTU-Machine-learning
Machine learning course by Professor Hung-yi Lee (李宏毅) of ** University
lcdevelop/MachineLearningCourse
A concise introductory machine learning tutorial
horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
pytorch/PiPPy
Pipeline Parallelism for PyTorch
SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
moon-hotel/BertWithPretrained
An implementation of the BERT model and its related downstream tasks based on the PyTorch framework
3DAgentWorld/Toolkit-for-Prompt-Compression
Toolkit for Prompt Compression
nbasyl/LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
Lisennlp/distributed_train_pytorch
Distributed training with PyTorch, supporting multi-node multi-GPU and single-node multi-GPU setups.
zbwxp/Dynamic-Token-Pruning
Official PyTorch implementation of Dynamic-Token-Pruning (ICCV 2023)