int4
There are 5 repositories under int4 topic.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
intel/auto-round
Advanced Quantization Algorithm for LLMs. This is official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
Danaozhong/rust-bitwriter
rust library to write integer types of any bit length into a buffer - from `i1` to `i64`.