gptq
There are 25 repositories under the gptq topic.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
ModelCloud/GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
shm007g/LLaMA-Cult-and-More
Large Language Models for all: 🦙 Cult and More. Stay in touch!
bobazooba/xllm
🦖 X—LLM: Cutting Edge & Easy LLM Finetuning
1b5d/llm-api
Run any Large Language Model behind a unified API
chenhunghan/ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
abhinand5/gptq_for_langchain
A guide on how to use GPTQ models with LangChain
ziwang-com/zero-lora
Zero-training LLM parameter tuning
taishan1994/LLM-Quantization
Notes and summaries on LLM quantization.
hcd233/Aris-AI-Model-Server
An OpenAI-compatible API that integrates LLM, embedding, and reranker models.
seyf1elislam/LocalLLM_OneClick_Colab
Run GGUF LLM models in the latest versions of TextGen-webui and koboldcpp
tripathiarpan20/self-improvement-4all
Private self-improvement coaching with open-source LLMs
chinoll/chatsakura
ChatSakura: an open-source multilingual conversational model.
Aqirito/A.L.I.C.E
A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence). A REST API for an AI companion, for building more complex systems.
matlok-ai/bampe-weights
Tools for profiling, extracting, visualizing, and reusing generative AI weights, to build more accurate models and to audit/scan weights at rest, identifying knowledge domains for risk assessment.
bobazooba/shurale
Conversational AI model for open-domain dialogue
SujanNeupane42/NEPSE-Chatbot-Using-Retrieval-augmented-generation-and-reranking
A NEPSE chatbot built with an open-source LLM, incorporating sentence transformers, a vector database, and reranking.
upunaprosk/quantized-lm-confidence
Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"
amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF
LLM quantization techniques: absmax, zero-point, GPTQ and GGUF
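The absmax and zero-point schemes named above are the two basic round-to-nearest quantizers. A minimal numpy sketch for illustration (function names and bit-width defaults are mine, not this repository's API):

```python
import numpy as np

def absmax_quantize(x, bits=8):
    """Symmetric (absmax) quantization: scale by the maximum magnitude,
    so zero maps exactly to zero. Dequantize with q * scale."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for int8
    scale = np.abs(x).max() / qmax
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def zeropoint_quantize(x, bits=8):
    """Asymmetric (zero-point) quantization: map [min, max] onto the full
    unsigned integer range. Dequantize with (q - zero_point) * scale."""
    qmin, qmax = 0, 2 ** bits - 1            # e.g. [0, 255] for uint8
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point
```

Absmax wastes range when the data is skewed (e.g. post-ReLU activations), while the zero-point variant spends the full integer range on the observed interval at the cost of an extra offset per tensor.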
lpalbou/model-quantizer
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
STiFLeR7/Edge-LLM
Optimized Qwen2.5-3B using GPTQ, reducing model size from 5.75 GB to 1.93 GB and improving inference speed. Ideal for efficient edge AI deployments.
ElDokmak/LLMs-variety
Hands-on work with a variety of LLMs
SJD1882/LLMCheatSheet
Personal GitHub repository stashing resources on Large Language Models (LLMs), including Jupyter notebooks on open-source LLMs, use cases with LangChain, and R&D paper reviews.
SujanNeupane42/LLM_Quantization
Quantizing LLMs using GPTQ
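For context on what GPTQ itself does: it quantizes a weight matrix one column at a time, using the inverse Hessian built from calibration inputs to spread each column's rounding error onto the not-yet-quantized columns. A heavily simplified numpy sketch of that core loop, assuming a symmetric per-row grid (the real algorithm adds Cholesky-based blocking, grouping, and clipping; this function is illustrative, not any toolkit's API):

```python
import numpy as np

def gptq_quantize(W, X, bits=4, damp=0.01):
    """Simplified GPTQ: quantize W (rows x cols) column by column,
    compensating later columns for the rounding error via the inverse
    Hessian H^-1, where H = 2 X X^T comes from calibration inputs X."""
    rows, cols = W.shape
    H = 2 * X @ X.T + damp * np.eye(cols)     # damped Hessian, X: (cols, n_samples)
    Hinv = np.linalg.inv(H)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax   # per-row absmax grid
    W = W.copy()
    Q = np.zeros_like(W)
    for j in range(cols):
        q = np.round(W[:, j:j+1] / scale) * scale         # quantize column j
        Q[:, j:j+1] = q
        err = (W[:, j:j+1] - q) / Hinv[j, j]              # scaled rounding error
        W[:, j+1:] -= err @ Hinv[j:j+1, j+1:]             # push error onto later columns
    return Q
```

The key difference from plain round-to-nearest is the last line: later columns are nudged so that the output error on the calibration data, not the per-weight error, is what gets minimized.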
rightpunchChen/edgeAI_final_report
Llama-3.2-3B-Instruct LoRA + GPTQ Compression & Inference with vLLM