quantization

There are 734 repositories under the quantization topic.

  • LLaMA-Factory

    hiyouga/LLaMA-Factory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    Language: Python · ★ 45.5k
  • Chinese-LLaMA-Alpaca

    ymcui/Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

    Language: Python · ★ 18.8k
  • SYSTRAN/faster-whisper

    Faster Whisper transcription with CTranslate2

    Language: Python · ★ 15.1k
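
    A minimal usage sketch under stated assumptions: the model size ("small") and the audio path ("audio.mp3") are placeholders; compute_type="int8" selects quantized inference via the CTranslate2 backend.

    ```python
    from faster_whisper import WhisperModel

    # Load a Whisper model with INT8 weights for lightweight CPU inference.
    model = WhisperModel("small", device="cpu", compute_type="int8")

    # Transcribe a local audio file; segments are yielded lazily.
    segments, info = model.transcribe("audio.mp3")
    print(f"Detected language: {info.language}")
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
    ```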
  • Qbot

    UFund-Me/Qbot

    [🔥 updating ...] AI-driven automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ 📰 qbot-mini: https://github.com/Charmve/iQuant

    Language: Jupyter Notebook · ★ 10.8k
  • bitsandbytes

    bitsandbytes-foundation/bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch.

    Language: Python · ★ 6.9k
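
    A minimal sketch of loading a causal LM in 4-bit NF4 through 🤗 Transformers' BitsAndBytesConfig; the model ID is a placeholder and a CUDA GPU is assumed.

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # 4-bit NF4 weight quantization with bfloat16 compute, provided by bitsandbytes.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )

    inputs = tokenizer("Quantization reduces memory by", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
    ```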
  • kornelski/pngquant

    Lossy PNG compressor — pngquant command based on libimagequant library

    Language: C · ★ 5.3k
  • AutoGPTQ/AutoGPTQ

    An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

    Language: Python · ★ 4.8k
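
    A rough sketch of post-training GPTQ quantization with AutoGPTQ, closely following its README pattern; the model ID, calibration text, and output directory are placeholders.

    ```python
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    model_id = "facebook/opt-125m"  # placeholder model ID
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

    # A handful of tokenized samples serve as calibration data for GPTQ.
    examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]

    # 4-bit weights with a group size of 128.
    quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
    model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

    model.quantize(examples)
    model.save_quantized("opt-125m-4bit-gptq")
    ```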
  • IntelLabs/distiller

    Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

    Language: Jupyter Notebook · ★ 4.4k
  • OpenNMT/CTranslate2

    Fast inference engine for Transformer models

    Language: C++ · ★ 3.7k
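
    A rough sketch of running an INT8-converted translation model with CTranslate2; the OPUS-MT model name and the converted-model directory are placeholders, and the conversion step is shown only as a comment.

    ```python
    import ctranslate2
    import transformers

    # Assumes the model was converted beforehand, e.g.:
    #   ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de \
    #       --quantization int8 --output_dir opus-mt-en-de-ct2
    translator = ctranslate2.Translator("opus-mt-en-de-ct2", device="cpu")
    tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

    # CTranslate2 operates on subword tokens rather than raw text.
    source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Quantization makes inference faster."))
    results = translator.translate_batch([source])
    target = results[0].hypotheses[0]
    print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
    ```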
  • deepsparse

    neuralmagic/deepsparse

    Sparsity-aware deep learning inference runtime for CPUs

    Language: Python · ★ 3.1k
  • huawei-noah/Pretrained-Language-Model

    Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

    Language: Python · ★ 3.1k
  • nlp-architect

    IntelLabs/nlp-architect

    A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

    Language: Python · ★ 2.9k
  • huggingface/optimum

    🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

    Language: Python · ★ 2.8k
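
    A minimal sketch of exporting a Transformers checkpoint to ONNX and running it through Optimum's ONNX Runtime integration; the model ID is a placeholder.

    ```python
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder model ID

    # export=True converts the PyTorch checkpoint to ONNX on the fly.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(classifier("Quantized models can be surprisingly accurate."))
    ```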
  • aaron-xichen/pytorch-playground

    Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

    Language: Python · ★ 2.7k
  • stochasticai/xTuring

    Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6

    Language: Python · ★ 2.6k
  • intel/neural-compressor

    SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

    Language: Python · ★ 2.4k
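
    A rough sketch of post-training static quantization with Intel Neural Compressor's 2.x-style API; the model, the random calibration loader, and the exact entry points are assumptions that may differ between releases.

    ```python
    import torch
    import torchvision
    from neural_compressor.config import PostTrainingQuantConfig
    from neural_compressor.quantization import fit

    # Placeholder FP32 model and a tiny random calibration loader.
    model = torchvision.models.resnet18(weights=None).eval()
    calib_loader = torch.utils.data.DataLoader(
        [(torch.randn(3, 224, 224), 0) for _ in range(8)], batch_size=4
    )

    # Post-training static INT8 quantization driven by the calibration data.
    q_model = fit(model=model,
                  conf=PostTrainingQuantConfig(approach="static"),
                  calib_dataloader=calib_loader)
    ```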
  • dvmazur/mixtral-offloading

    Run Mixtral-8x7B models in Colab or on consumer desktops

    Language: Python · ★ 2.3k
  • quic/aimet

    AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

    Language: Python · ★ 2.3k
  • 666DZY666/micronet

    micronet, a model compression and deployment library. Compression: 1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2) pruning: normal, regular, and group convolutional channel pruning; 3) group convolution structure; 4) batch-normalization fusing for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic_shape.

    Language: Python · ★ 2.2k
  • Efficient-ML/Awesome-Model-Quantization

    A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously being improved. PRs adding works (papers, repositories) that the repo has missed are welcome.

  • pytorch/ao

    PyTorch native quantization and sparsity for training and inference

    Language: Python · ★ 1.9k
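
    A minimal sketch of weight-only INT8 quantization with torchao's quantize_ API; the toy model is a placeholder, and the helper names reflect recent torchao releases.

    ```python
    import torch
    from torchao.quantization import quantize_, int8_weight_only

    # Placeholder FP32 model; quantize_ swaps the Linear weights to int8 in place.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 1024),
    )
    quantize_(model, int8_weight_only())

    with torch.no_grad():
        print(model(torch.randn(1, 1024)).shape)
    ```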
  • intel/intel-extension-for-pytorch

    A Python package that extends official PyTorch to easily obtain extra performance on Intel platforms

    Language: Python · ★ 1.8k
  • OpenPPL/ppq

    PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

    Language: Python · ★ 1.7k
  • PaddlePaddle/PaddleSlim

    PaddleSlim is an open-source library for deep model compression and architecture search.

    Language: Python · ★ 1.6k
  • open-mmlab/mmrazor

    OpenMMLab Model Compression Toolbox and Benchmark.

    Language: Python · ★ 1.6k
  • tensorflow/model-optimization

    A toolkit for optimizing Keras and TensorFlow ML models for deployment, including quantization and pruning.

    Language: Python · ★ 1.5k
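
    A minimal sketch of quantization-aware training on a Keras model with the TensorFlow Model Optimization Toolkit; the toy model and random data are placeholders.

    ```python
    import numpy as np
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    # Placeholder FP32 Keras model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(10),
    ])

    # Wrap the model with fake-quantization ops for quantization-aware training.
    q_aware_model = tfmot.quantization.keras.quantize_model(model)
    q_aware_model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    q_aware_model.fit(np.random.rand(128, 20), np.random.randint(0, 10, 128), epochs=1)
    ```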
  • RWKV/rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

    Language: C++ · ★ 1.5k
  • Xilinx/brevitas

    Brevitas: neural network quantization in PyTorch

    Language: Python · ★ 1.3k
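
    A minimal sketch of a low-bit network built from Brevitas quantized modules; the layer sizes and 4-bit widths are illustrative only.

    ```python
    import torch
    from brevitas.nn import QuantLinear, QuantReLU

    # A tiny network with 4-bit weights and activations, trainable like any nn.Module (QAT).
    model = torch.nn.Sequential(
        QuantLinear(128, 64, bias=True, weight_bit_width=4),
        QuantReLU(bit_width=4),
        QuantLinear(64, 10, bias=True, weight_bit_width=4),
    )
    print(model(torch.randn(2, 128)).shape)  # torch.Size([2, 10])
    ```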
  • RahulSChand/gpu_poor

    Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization

    Language: JavaScript · ★ 1.3k
  • huawei-noah/Efficient-Computing

    Efficient computing methods developed by Huawei Noah's Ark Lab

    Language: Jupyter Notebook · ★ 1.3k
  • thu-ml/SageAttention

    Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

    Language: Cuda · ★ 1.2k
  • openvinotoolkit/training_extensions

    Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™

    Language: Python · ★ 1.2k
  • vllm-project/llm-compressor

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Language: Python · ★ 1.1k
  • mit-han-lab/nunchaku

    [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

    Language: Cuda · ★ 1k
  • openvinotoolkit/nncf

    Neural Network Compression Framework for enhanced OpenVINO™ inference

    Language: Python · ★ 989
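
    A rough sketch of post-training quantization with NNCF's framework-agnostic quantize() entry point; the model, the random calibration loader, and the transform function are placeholders.

    ```python
    import torch
    import torchvision
    import nncf

    # Placeholder FP32 model and a tiny random calibration loader.
    model = torchvision.models.mobilenet_v2(weights=None).eval()
    calib_loader = torch.utils.data.DataLoader(
        [(torch.randn(3, 224, 224), 0) for _ in range(8)], batch_size=4
    )

    # NNCF wraps the loader and a function that extracts model inputs from each batch.
    calibration_dataset = nncf.Dataset(calib_loader, lambda batch: batch[0])
    quantized_model = nncf.quantize(model, calibration_dataset)
    ```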
  • adithya-s-k/AI-Engineering.academy

    Mastering Applied AI, One Concept at a Time

    Language: Jupyter Notebook · ★ 942