int4
There are 7 repositories under the int4 topic.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
intel/auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU.
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
Danaozhong/rust-bitwriter
Rust library to write integer types of any bit length into a buffer, from `i1` to `i64`.
ambv231/tinyllama-coreml-ios18-quantization
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
GreenBull31/tinyllama-coreml-ios18-quantization
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.