int4
There are 5 repositories under the int4 topic.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
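A minimal sketch of what INT4 weight quantization means, assuming the simplest possible scheme: symmetric round-to-nearest with a single per-tensor scale. This is not neural-compressor's API or algorithm. The function names are hypothetical, and real tools typically use finer-grained (per-channel or per-group) scales and smarter rounding.

```python
def quantize_int4_symmetric(values):
    """Round-to-nearest symmetric INT4 quantization with one per-tensor scale (illustrative)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude onto 7, the largest positive signed 4-bit code.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round, then clamp into the signed 4-bit range [-8, 7].
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate real values from INT4 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.2, 0.03, 2.1, -2.1]
codes, scale = quantize_int4_symmetric(weights)
restored = dequantize_int4(codes, scale)
```

With these example weights the scale is about 0.3 and the codes are [2, -4, 0, 7, -7]; each restored value is within half a scale step of the original.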
tpoisonooo/how-to-optimize-gemm
Row-major matmul optimization
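A minimal sketch of one basic idea behind row-major matmul optimization, assumed here for illustration rather than taken from the repository: reorder the loops from i-j-k to i-k-j so the innermost loop walks a row of B and a row of C contiguously, instead of striding down a column of B. Python is used only to show the index arithmetic on flat row-major arrays; the cache benefit appears in compiled code such as C.

```python
def matmul_ijk(a, b, m, k, n):
    """Naive C = A @ B on flat row-major lists; inner loop strides down a column of B."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]  # b index jumps by n each step
            c[i * n + j] = acc
    return c


def matmul_ikj(a, b, m, k, n):
    """Loop-reordered C = A @ B; A is m x k, B is k x n, all row-major."""
    c = [0.0] * (m * n)
    for i in range(m):
        row_c = i * n
        for p in range(k):
            a_ip = a[i * k + p]  # hoisted scalar, reused across the whole inner loop
            row_b = p * n
            # Inner loop touches consecutive elements of row p of B and row i of C.
            for j in range(n):
                c[row_c + j] += a_ip * b[row_b + j]
    return c
```

Both functions compute the same product; only the memory access order differs.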
intel/auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
Danaozhong/rust-bitwriter
Rust library to write integer types of any bit length into a buffer, from `i1` to `i64`.
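A minimal sketch of an arbitrary-width bit writer, written in Python for illustration rather than taken from rust-bitwriter. The `BitWriter` class, the MSB-first bit order, and zero padding of the final partial byte are assumptions and may differ from the crate's actual behavior. Writing two 4-bit values fills exactly one byte, which is a common way to pack INT4 data two values per byte.

```python
class BitWriter:
    """Hypothetical writer that packs 1..64-bit integers MSB-first into bytes."""

    def __init__(self):
        self.buffer = bytearray()
        self._acc = 0    # pending bits not yet forming a full byte
        self._nbits = 0  # number of pending bits held in _acc

    def write(self, value, width):
        if not 1 <= width <= 64:
            raise ValueError("width must be in 1..64")
        # Accept both signed (two's-complement) and unsigned values of this width.
        if not -(1 << (width - 1)) <= value < (1 << width):
            raise ValueError("value does not fit in width bits")
        value &= (1 << width) - 1  # negatives become their two's-complement bit pattern
        self._acc = (self._acc << width) | value
        self._nbits += width
        # Emit every complete byte, most significant bits first.
        while self._nbits >= 8:
            self._nbits -= 8
            self.buffer.append((self._acc >> self._nbits) & 0xFF)
        self._acc &= (1 << self._nbits) - 1  # keep only the leftover bits

    def flush(self):
        """Pad the final partial byte with zero bits and return the written bytes."""
        if self._nbits:
            self.buffer.append((self._acc << (8 - self._nbits)) & 0xFF)
            self._acc = 0
            self._nbits = 0
        return bytes(self.buffer)
```

For example, writing 2 and then -4 as 4-bit values produces the single byte 0x2C: 0b0010 followed by 0b1100, the 4-bit two's-complement pattern of -4.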