quantization
There are 734 repositories under the quantization topic.
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
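As a brief illustration of how this library exposes CTranslate2's quantized inference, here is a minimal sketch; the model size, device settings, and the audio file path are assumptions for illustration, not taken from the repository description.

```python
# Minimal faster-whisper sketch: transcribe an audio file with an
# INT8-quantized Whisper model running on CPU.
from faster_whisper import WhisperModel

# "small" model with INT8 compute type; "audio.mp3" is a placeholder path.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```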
UFund-Me/Qbot
[🔥 updating ...] AI-powered automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ 📰 qbot-mini: https://github.com/Charmve/iQuant
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
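In practice bitsandbytes is often used indirectly through 🤗 Transformers' `BitsAndBytesConfig`. The sketch below loads a causal LM with 4-bit NF4 weights; the model ID and dtype choices are assumptions for illustration.

```python
# Sketch: load a causal LM with 4-bit NF4 quantization via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any HF causal LM works

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Quantization reduces memory because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```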
kornelski/pngquant
Lossy PNG compressor — pngquant command based on libimagequant library
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
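A rough sketch of AutoGPTQ's quantize-and-save flow, assuming a small placeholder model, a single calibration sentence, and an arbitrary output directory.

```python
# Sketch: 4-bit GPTQ quantization of a causal LM with AutoGPTQ.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "facebook/opt-125m"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# Calibration samples: tokenized text with input_ids / attention_mask.
examples = [tokenizer("AutoGPTQ calibrates quantization on sample text.", return_tensors="pt")]

model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")  # placeholder output directory
```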
IntelLabs/distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
OpenNMT/CTranslate2
Fast inference engine for Transformer models
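A minimal sketch of running an INT8 CTranslate2 model; the model directory and the token sequence are placeholders, and the model is assumed to have already been converted to CTranslate2 format with one of the project's converter tools.

```python
# Sketch: batched translation with an INT8-quantized CTranslate2 model.
# "ct2_model_dir" must point at an already-converted model; the tokens
# below assume a SentencePiece-style vocabulary.
import ctranslate2

translator = ctranslate2.Translator("ct2_model_dir", device="cpu", compute_type="int8")

source_tokens = [["▁Hello", "▁world", "!"]]
results = translator.translate_batch(source_tokens, beam_size=4)
print(results[0].hypotheses[0])  # best hypothesis as a list of target tokens
```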
neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
huawei-noah/Pretrained-Language-Model
Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.
IntelLabs/nlp-architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
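As an illustration of that workflow, the sketch below exports a Transformers model to ONNX Runtime through Optimum and applies dynamic INT8 quantization; the model ID, quantization preset, and save directory are assumptions for illustration.

```python
# Sketch: export a model to ONNX via Optimum, then quantize it dynamically.
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder

# Export the PyTorch checkpoint to ONNX on the fly.
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

# Dynamic (weight-only) INT8 quantization targeting AVX2 CPUs.
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained(ort_model)
quantizer.quantize(save_dir="distilbert-sst2-int8", quantization_config=qconfig)
```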
aaron-xichen/pytorch-playground
Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
stochasticai/xTuring
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
666DZY666/micronet
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), dynamic shape.
Efficient-ML/Awesome-Model-Quantization
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously being improved. Pull requests adding works (papers, repositories) missing from the repo are welcome.
pytorch/ao
PyTorch native quantization and sparsity for training and inference
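A minimal torchao sketch applying weight-only INT8 quantization to an existing `nn.Module`; the toy model is a placeholder, and the exact entry points have shifted somewhat across torchao releases.

```python
# Sketch: weight-only INT8 quantization of a small model with torchao.
# API details vary by torchao version; this follows the quantize_ pattern.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Swap Linear weights for INT8 weight-only quantized versions in place.
quantize_(model, int8_weight_only())

with torch.no_grad():
    out = model(torch.randn(1, 512))
print(out.shape)
```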
intel/intel-extension-for-pytorch
A Python package that extends official PyTorch to easily obtain better performance on Intel platforms.
OpenPPL/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
PaddlePaddle/PaddleSlim
PaddleSlim is an open-source library for deep model compression and architecture search.
open-mmlab/mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
tensorflow/model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
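A small sketch of the toolkit's Keras quantization-aware training path; the toy model, its shapes, and the training setup are placeholders.

```python
# Sketch: quantization-aware training of a tiny Keras model with TF-MOT.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

# Wrap the model with fake-quantization ops for QAT.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
qat_model.summary()
# qat_model.fit(x_train, y_train, epochs=1)  # then train as usual on your data
```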
RWKV/rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Xilinx/brevitas
Brevitas: neural network quantization in PyTorch
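A brief sketch of defining a low-bit layer stack with Brevitas quantized modules; the bit widths and layer sizes are arbitrary choices for illustration.

```python
# Sketch: a tiny 4-bit quantized convolutional block built with Brevitas.
import torch
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantReLU

block = nn.Sequential(
    QuantConv2d(3, 16, kernel_size=3, padding=1, weight_bit_width=4),
    QuantReLU(bit_width=4),
    QuantConv2d(16, 16, kernel_size=3, padding=1, weight_bit_width=4),
)

out = block(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```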
RahulSChand/gpu_poor
Calculate tokens/s & GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
huawei-noah/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
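The repository positions this as a drop-in attention kernel; below is a minimal sketch of calling it on random tensors. The shapes, dtype, and the CUDA requirement are assumptions based on typical FlashAttention-style layouts, not specifics from the description above.

```python
# Sketch: calling the quantized attention kernel on random Q/K/V tensors.
# Assumes a CUDA GPU and half-precision inputs in (batch, heads, seq, dim) layout.
import torch
from sageattention import sageattn

q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

out = sageattn(q, k, v, is_causal=False)  # same shape as q
print(out.shape)
```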
openvinotoolkit/training_extensions
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
mit-han-lab/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
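A compact sketch of NNCF's post-training quantization entry point; the model, the random calibration data, and the identity transform are placeholders for illustration.

```python
# Sketch: post-training INT8 quantization of a torchvision model with NNCF.
# The calibration data here is random; in practice use a real validation set.
import torch
import nncf
from torchvision.models import resnet18

model = resnet18(weights=None).eval()

# NNCF wraps any iterable; the second argument maps one item to model inputs.
calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(10)]
calibration_dataset = nncf.Dataset(calibration_data, lambda x: x)

quantized_model = nncf.quantize(model, calibration_dataset)
```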
adithya-s-k/AI-Engineering.academy
Mastering Applied AI, One Concept at a Time