model-quantization

There are 29 repositories under the model-quantization topic.

  • Efficient-ML/Awesome-Model-Quantization

    A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously being improved. PRs adding works (papers, repositories) missing from the repo are welcome.

  • horseee/Awesome-Efficient-LLM

    A curated list for Efficient Large Language Models

    Language: Python
  • datawhalechina/awesome-compression

    A beginner-friendly tutorial on model compression. PDF download: https://github.com/datawhalechina/awesome-compression/releases

  • inferflow/inferflow

    Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

    Language: C++
  • Efficient-ML/Awesome-Efficient-AIGC

    A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and is continuously being improved. PRs adding works (papers, repositories) missing from the repo are welcome.

  • sayakpaul/Adventures-in-TensorFlow-Lite

    This repository contains notebooks that show how to use TensorFlow Lite to quantize deep neural networks (a minimal conversion sketch follows below).

    Language: Jupyter Notebook
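
    For orientation, a minimal sketch of TensorFlow Lite post-training (dynamic-range) quantization of the kind such notebooks demonstrate; the toy Keras model and output path are assumptions, not taken from the repository.

    ```python
    # Minimal TensorFlow Lite post-training quantization sketch.
    # The toy model and the output filename are placeholders.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
    tflite_model = converter.convert()

    with open("model_quant.tflite", "wb") as f:
        f.write(tflite_model)
    ```
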
  • RodolfoFerro/psychopathology-fer-assistant

    [WINNER! 🏆] Psychopathology FER Assistant. Because mental health matters. My project submission for #TFWorld TF 2.0 Challenge at Devpost.

    Language: Jupyter Notebook
  • htqin/BiBench

    [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.

    Language: Python
  • htqin/QuantSR

    [NeurIPS 2023 Spotlight] This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.

    Language: Python
  • nbasyl/OFQ

    The official implementation of the ICML 2023 paper OFQ-ViT

    Language: Python
  • seonglae/llama2gptq

    Chat with LLaMA 2 that also provides responses with reference documents over a vector database. The model runs locally using GPTQ 4-bit quantization (a loading sketch follows below).

    Language: Python
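
    The entry does not specify how the quantized model is loaded; below is a rough sketch of loading a GPTQ 4-bit checkpoint with Hugging Face transformers, under the assumption that a pre-quantized LLaMA 2 repository is available (the model ID is a placeholder, not the project's own).

    ```python
    # Hypothetical example: loading a GPTQ 4-bit quantized LLaMA 2 checkpoint
    # with Hugging Face transformers. Requires the optimum and auto-gptq
    # (or gptqmodel) packages; the model ID below is a placeholder.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # placeholder GPTQ checkpoint

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer("What is model quantization?", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```
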
  • HaoranREN/TensorFlow_Model_Quantization

    A tutorial on model quantization using TensorFlow

    Language: Python
  • wlfeng0509/Awesome-Diffusion-Quantization

    A list of papers, docs, and code about diffusion quantization. This repo collects various quantization methods for diffusion models. PRs adding works (papers, repositories) missing from the repo are welcome.

  • dcarpintero/ai-engineering

    AI Engineering: annotated notebooks to dive into self-attention, in-context learning, RAG, knowledge graphs, fine-tuning, model optimization, and more.

    Language: Jupyter Notebook
  • frickyinn/BiDense

    PyTorch implementation of "BiDense: Binarization for Dense Prediction," a binary neural network for dense prediction tasks.

    Language: Python
  • medoidai/model-quantization-blog-notebooks

    Notebook from the blog post "A Hands-On Walkthrough on Model Quantization".

    Language: Jupyter Notebook
  • SRDdev/Model-Quantization

    Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (int8) instead of the usual 32-bit floating point (float32). A toy quantize/dequantize sketch follows below.

    Language: Jupyter Notebook
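
    A minimal illustration of the idea described above (not code from the repository): symmetric per-tensor int8 quantization and dequantization of a float32 weight tensor with NumPy.

    ```python
    # Toy illustration of int8 weight quantization: map float32 values to int8
    # with a single per-tensor scale, then dequantize to approximate the originals.
    import numpy as np

    weights = np.random.randn(4, 4).astype(np.float32)

    # Symmetric quantization: choose the scale so the max magnitude maps to 127.
    scale = np.abs(weights).max() / 127.0
    q_weights = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

    # Dequantize back to float32; the difference is the quantization error.
    deq_weights = q_weights.astype(np.float32) * scale
    print("max abs error:", np.abs(weights - deq_weights).max())
    ```
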
  • BjornMelin/local-llm-workbench

    🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

    Language: Shell
  • dwain-barnes/LLM-GGUF-Auto-Converter

    Automated Jupyter notebook solution for batch converting large language models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration (a conversion sketch follows below).

    Language: Jupyter Notebook
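
    For context, a rough sketch of the underlying llama.cpp workflow this kind of notebook automates, assuming a local llama.cpp checkout and a downloaded Hugging Face model directory; the paths and quantization type below are placeholders, not the project's own configuration.

    ```python
    # Rough sketch of the llama.cpp conversion + quantization steps such a
    # notebook automates; paths, model directory, and quant type are placeholders.
    import subprocess

    hf_model_dir = "models/My-HF-Model"          # local Hugging Face model directory
    f16_gguf = "models/my-model-f16.gguf"        # intermediate full-precision GGUF
    quant_gguf = "models/my-model-q4_k_m.gguf"   # quantized output

    # 1. Convert the Hugging Face checkpoint to GGUF (script ships with llama.cpp).
    subprocess.run(
        ["python", "llama.cpp/convert_hf_to_gguf.py", hf_model_dir, "--outfile", f16_gguf],
        check=True,
    )

    # 2. Quantize the GGUF file (binary built from llama.cpp).
    subprocess.run(
        ["llama.cpp/build/bin/llama-quantize", f16_gguf, quant_gguf, "Q4_K_M"],
        check=True,
    )
    ```
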
  • first-coding/VIT

    This project distills a ViT model into a compact CNN, reducing its size to 1.24 MB with minimal accuracy loss. ONNX Runtime with CUDA boosts inference speed, while FastAPI and Docker simplify deployment (an inference sketch follows below).

    Language: Python
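
    A minimal sketch of ONNX Runtime inference with the CUDA execution provider, as mentioned above; the model path, input shape, and dummy input are assumptions rather than the project's actual artifacts.

    ```python
    # Sketch of ONNX Runtime inference with the CUDA execution provider;
    # the model path and input shape are placeholders.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "distilled_cnn.onnx",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # falls back to CPU
    )

    input_name = session.get_inputs()[0].name
    dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

    logits = session.run(None, {input_name: dummy})[0]
    print("predicted class:", int(logits.argmax()))
    ```
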
  • SIYAKS-ARES/survival-with-llms

    The Ark Project: Selecting the perfect AI model to reboot civilization from a 64GB USB drive. Comprehensive analysis of open-source LLMs under extreme constraints, with final recommendation: Meta Llama 3.1 70B Instruct (Q6_K GGUF). Includes interactive tools, detailed comparisons, and complete implementation guide for offline deployment.

    Language: HTML
  • dslisleedh/NCNet-flax

    Unofficial implementation of NCNet using Flax and JAX

    Language: Python
  • satyampurwar/large-language-models

    Unlocking the Power of Generative AI: In-Context Learning, Instruction Fine-Tuning, Reinforcement Learning Fine-Tuning, Retrieval Augmented Generation and LangGraph Workflows for AI Agents.

    Language: Jupyter Notebook
  • xhay-p/ttPG

    Torch and Transformers Playground: Learn and Code Deep Learning using PyTorch and HuggingFace Transformers.

    Language: Jupyter Notebook
  • aashu-0/FineTuning_GPT2

    PyTorch implementation of GPT-2 that loads pretrained weights and enables instruction fine-tuning on the Stanford Alpaca dataset.

    Language: Jupyter Notebook
  • Chenguiti6444/Vehicle_Detection_and_Classification_using_Deep_Learning

    Fine-tuning pretrained deep learning models to classify low-quality images of land vehicles.

    Language: Jupyter Notebook
  • harshmorya/Assignment__HB1--1

    This project explores generating high-quality images using depth maps and conditioning techniques like Canny edges, leveraging Stable Diffusion and ControlNet models. It focuses on tuning image generation across different aspect ratios and inference step counts to balance speed and quality (a ControlNet sketch follows below).

    Language: Python
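
    A sketch of Canny-conditioned generation with diffusers' ControlNet pipeline, illustrating the technique named above; the checkpoint IDs, prompt, and edge-map file are assumptions, not taken from this project.

    ```python
    # Sketch of Canny-conditioned generation with diffusers' ControlNet pipeline;
    # checkpoint IDs, prompt, and the precomputed edge map are placeholders.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    canny_image = load_image("canny_edges.png")  # precomputed Canny edge map

    image = pipe(
        "a photo of a modern living room",
        image=canny_image,
        num_inference_steps=30,  # fewer steps trade quality for speed
    ).images[0]
    image.save("output.png")
    ```
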
  • santidrj/model-quantization-aggregation

    Replication package for the paper "Aggregating empirical evidence from data strategies studies: a case on model quantization," published at the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

    Language: Python