post-training-quantization

There are 40 repositories under the post-training-quantization topic.

  • intel/neural-compressor

    SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques for TensorFlow, PyTorch, and ONNX Runtime (a minimal PTQ sketch follows this entry)

    Language: Python
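
A minimal sketch of what static INT8 post-training quantization can look like with neural-compressor's 2.x Python API. The toy model and random calibration data are placeholders of my own, not taken from the repository:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Toy FP32 model and a tiny synthetic calibration set (placeholders).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(), torch.nn.Flatten(),
    torch.nn.Linear(8 * 30 * 30, 10),
).eval()
calib_set = TensorDataset(torch.randn(32, 3, 32, 32),
                          torch.zeros(32, dtype=torch.long))
calib_loader = DataLoader(calib_set, batch_size=8)

# "static" PTQ observes activations on the calibration data to pick scales.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=model, conf=conf,
                           calib_dataloader=calib_loader)
q_model.save("./int8_model")
```
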
  • 666DZY666/micronet

    micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), covering high-bit (>2b) methods (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) ternary/binary methods (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) at 8 bits (TensorRT); (2) pruning: normal, regular, and group-convolution channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), and dynamic shapes. A sketch of the symmetric INT8 idea behind 8-bit PTQ follows this entry.

    Language: Python
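
A generic sketch of the symmetric per-tensor INT8 quantization that 8-bit PTQ builds on (not micronet's actual code; max calibration is only one way to pick the scale):

```python
import torch

def int8_quantize(x: torch.Tensor):
    """Symmetric per-tensor INT8: q = clamp(round(x / scale), -127, 127)."""
    scale = x.abs().max() / 127.0  # max calibration over the tensor
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def int8_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(64, 64)
q, s = int8_quantize(w)
print("max abs error:", (int8_dequantize(q, s) - w).abs().max().item())
```
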
  • alibaba/TinyNeuralNetwork

    TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

    Language: Python
  • SqueezeAILab/SqueezeLLM

    [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

    Language: Python
  • ModelTC/llmc

    [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

    Language: Python
  • Xiuyu-Li/q-diffusion

    [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

    Language: Python
  • megvii-research/Sparsebit

    A model compression and acceleration toolbox based on PyTorch.

    Language: Python
  • megvii-research/FQ-ViT

    [IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

    Language: Python
  • sayakpaul/Adventures-in-TensorFlow-Lite

    This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks (a minimal sketch follows this entry).

    Language: Jupyter Notebook
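
A minimal sketch of full-integer PTQ with the TFLite converter, in the spirit of those notebooks. The toy Keras model and random representative dataset are stand-ins of my own:

```python
import numpy as np
import tensorflow as tf

# Toy Keras model standing in for a trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

def representative_dataset():
    # A few batches of representative inputs for calibration.
    for _ in range(100):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()  # weights and activations become INT8
open("model_int8.tflite", "wb").write(tflite_model)
```
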
  • Hsu1023/DuQuant

    [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

    Language: Python
  • hkproj/quantization-notes

    Notes on quantization in neural networks (a worked example of affine quantization follows this entry)

    Language: Jupyter Notebook
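
For reference, the core operation notes like these typically build on is affine (asymmetric) quantization. A small self-contained example, generic rather than drawn from the notes themselves:

```python
import numpy as np

def affine_quantize(x: np.ndarray, n_bits: int = 8):
    """Asymmetric quantization: q = round(x / scale) + zero_point."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

x = np.random.randn(1000).astype(np.float32)
q, scale, zp = affine_quantize(x)
x_hat = (q.astype(np.float32) - zp) * scale  # dequantize
print("mean abs error:", np.abs(x - x_hat).mean())
```
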
  • ModelTC/TFMQ-DM

    [CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

    Language: Jupyter Notebook
  • Sanjana7395/static_quantization

    Post-training static quantization using the ResNet18 architecture (a generic sketch of the workflow follows this entry)

    Language: Jupyter Notebook
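
A minimal sketch of PyTorch eager-mode post-training static quantization for ResNet18, assuming torchvision's quantizable model; the random calibration batches stand in for a real data loader:

```python
import torch
from torchvision.models.quantization import resnet18

# Quantizable ResNet18 ships with QuantStub/DeQuantStub already inserted.
model = resnet18(weights=None, quantize=False).eval()
model.fuse_model()  # fuse Conv+BN+ReLU so they quantize as one op

# Attach observers, then run calibration data through the model.
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
torch.ao.quantization.prepare(model, inplace=True)
with torch.no_grad():
    for _ in range(10):  # placeholder for a real calibration loader
        model(torch.randn(8, 3, 224, 224))

# Swap float modules for quantized ones using the observed ranges.
torch.ao.quantization.convert(model, inplace=True)
```
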
  • ModelTC/QLLM

    [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

    Language: Python
  • zysxmu/FDDA

    PyTorch implementation of our paper accepted at ECCV 2022, "Fine-grained Data Distribution Alignment for Post-Training Quantization"

    Language: Python
  • iszry/DI2N-PTQ4DM

    Improved the performance of 8-bit PTQ4DM, especially on FID.

    Language: Python
  • shieldforever/NeuronQuant

    [ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation

    Language: Python
  • GongCheng1919/bias-compensation

    [CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation

    Language: Python
  • motokimura/pytorch_quantization_fx

    An example of quantizing MobileNetV2 trained on the CIFAR-10 dataset with PyTorch FX graph mode quantization (a generic sketch follows this entry)

    Language: Python
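
A minimal sketch of FX graph mode PTQ in the same spirit; the random calibration tensors are placeholders, and the repo's exact flow may differ:

```python
import torch
from torchvision.models import mobilenet_v2
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = mobilenet_v2(weights=None).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)  # CIFAR-10-sized input

# Trace the model and insert observers per the qconfig mapping.
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibrate on representative data (random tensors stand in here).
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(8, 3, 32, 32))

quantized = convert_fx(prepared)  # lower to INT8 quantized modules
```
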
  • Rumeysakeskin/ASR-Quantization

    Post-training quantization of an NVIDIA NeMo ASR model

    Language: Jupyter Notebook
  • likholat/openvino_quantization

    This sample shows how to convert a TensorFlow model to an OpenVINO IR model and how to quantize the OpenVINO model (a minimal sketch follows this entry).

    Language: Python
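
A minimal sketch of that flow with OpenVINO's Python API plus NNCF for the PTQ step. The model path and random calibration items are placeholders, and the repo's exact tooling may differ:

```python
import numpy as np
import openvino as ov
import nncf

# Convert a TensorFlow model (e.g., a SavedModel directory) to OpenVINO IR.
ov_model = ov.convert_model("saved_model_dir")  # placeholder path

# NNCF post-training quantization: calibrate on representative inputs.
calib_items = [np.random.rand(1, 224, 224, 3).astype(np.float32)
               for _ in range(100)]
quantized_model = nncf.quantize(ov_model, nncf.Dataset(calib_items))

ov.save_model(quantized_model, "model_int8.xml")
```
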
  • yester31/Quantization_EX

    Quantization examples for PTQ & QAT

    Language: Python
  • satya15july/quantization

    Model quantization with PyTorch, TensorFlow & Larq

    Language: C++
  • smpanaro/norm-tweaking

    A method applied after post-training quantization (PTQ) to improve quantized LLMs. Unofficial implementation of https://arxiv.org/abs/2309.02784

    Language: Python
  • ssi-research/eptq

    Implementation of EPTQ, an enhanced post-training quantization algorithm for DNN compression

    Language: Python
  • yester31/TensorRT_ONNX

    Generating a TensorRT engine from an ONNX model (a minimal Python sketch follows this entry)

    Language: C++
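
A minimal sketch of building an engine from ONNX with the TensorRT Python API (TensorRT 8.x-style calls; file paths are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Parse the ONNX graph into the TensorRT network definition.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # INT8 would also need a calibrator

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```
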
  • AndreiZoltan/ptq_resnet20

    Low-bit (2/4/8/16) post-training quantization for ResNet20

    Language: Python
  • Gaurav-Van/Fine-Tuning-LLMs

    An introductory guide to different techniques for fine-tuning LLMs

    Language: Jupyter Notebook
  • generalMG/Medical-Dataset-Deep-Learning-Quantization-Data-Analysis

    This repository accompanies a research work published in MDPI Sensors and provides details about the project.

    Language: Python
  • TanyaChutani/Quantization_Tensorflow

    Quantization for object detection in TensorFlow 2.x

    Language: Python
  • yashmaniya0/Quantization-of-Image-Classification-Models

    A comprehensive study on the quantization of various CNN models, employing techniques such as post-training quantization (PTQ) and quantization-aware training (QAT).

    Language: Jupyter Notebook
  • OmidGhadami95/EfficientNetV2_Quantization_CK

    EfficientNetV2 (EfficientNetV2-B2) with INT8 and FP32 quantization (QAT and PTQ) on the CK+ dataset: fine-tuning, augmentation, handling an imbalanced dataset, etc.

    Language: Jupyter Notebook
  • yester31/TensorRT_Examples

    Useful sample code for building TensorRT models from ONNX

    Language: Python
  • Inpyo-Hong/Model-Compression-Paper-List

    Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)

  • rioter1/embeddedAI

    Setting up a Snapdragon Neural Processing Engine (SNPE) environment to convert protobuf (.pb) files to DLC (deep learning container) files and to quantize neural networks with SNPE. Mobile app development: https://github.com/anshumax/mobilenn

    Language: Python