int4

There are 7 repositories under the int4 topic.

intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python 2.5k 31 220281
intel/auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
Language: Python 704 17 25859
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
Language: C++ 684 16 1394
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ 349 7 4739
Danaozhong/rust-bitwriter
Rust library to write integer types of any bit length into a buffer, from `i1` to `i64`.
Language: Rust 3 1 00
ambv231/tinyllama-coreml-ios18-quantization
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
Language: Python 1
GreenBull31/tinyllama-coreml-ios18-quantization
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
Language: Python