maodoudou168's Stars
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
TheLastBen/fast-stable-diffusion
fast-stable-diffusion + DreamBooth
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
mli/paper-reading
Paragraph-by-paragraph close readings of classic and new deep learning papers
ai-forever/Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
nbasyl/LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
liguodongiot/llm-action
This project aims to share the technical principles behind large models, along with hands-on experience.
siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
pytorch/PiPPy
Pipeline Parallelism for PyTorch
Lisennlp/distributed_train_pytorch
PyTorch distributed training, supporting multi-node multi-GPU and single-node multi-GPU setups.
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
microsoft/DeepSpeedExamples
Example models using DeepSpeed
moon-hotel/BertWithPretrained
An implementation of the BERT model and its related downstream tasks based on the PyTorch framework
qwopqwop200/GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
dafish-ai/NTU-Machine-learning
Machine learning course by Professor Hung-yi Lee (李宏毅) of ** University
lcdevelop/MachineLearningCourse
A concise introductory tutorial on machine learning
dafish-ai/MachineLearning-GaoYang
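Mainly a collection of content, assignments, and recommended materials related to Gao Yang's book 《白话机器学习》 (Machine Learning in Plain Language)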
triton-inference-server/backend
Common source, scripts and utilities for creating Triton backends.
cog-isa/deep-path