fgml

An acceleration library for machine learning, especially for large language models (LLMs).

Uniform quantization of Llama 2 models, without block grouping.
Uniform quantization of Llama 2 models, with 64 × 64 block grouping.
Non-uniform dense-and-sparse quantization of Llama 2 (3-bit, 4-bit), based on Hessian information.
Inference of dense-and-sparse 3-bit and 4-bit Llama-2-7B.
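
A minimal sketch of uniform quantization without block grouping, assuming an asymmetric min/max scheme with one scale and one offset for the whole weight matrix. The function names and the min/max choice are illustrative assumptions, not fgml's API:

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_per_tensor(w: Matrix, bits: int = 4) -> Tuple[List[List[int]], float, float]:
    """Asymmetric uniform quantization with one scale and one offset for the whole tensor."""
    levels = (1 << bits) - 1  # e.g. 15 for 4-bit codes 0..15
    lo = min(min(row) for row in w)
    hi = max(max(row) for row in w)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero on constant tensors
    # Round each weight to the nearest of the 2**bits evenly spaced levels, clamped to the code range.
    q = [[min(levels, max(0, round((x - lo) / scale))) for x in row] for row in w]
    return q, scale, lo


def dequantize_per_tensor(q: List[List[int]], scale: float, lo: float) -> Matrix:
    """Map integer codes back to approximate float weights."""
    return [[lo + v * scale for v in row] for row in q]
```

With a single scale per tensor, one large outlier stretches the step size for every weight in the matrix, which is the usual motivation for block grouping.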
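
One way to picture 64 × 64 block grouping is to give every 64 × 64 tile of a weight matrix its own scale and offset. The sketch below assumes a min/max uniform quantizer inside each tile and keeps the per-tile parameters in a dict keyed by tile index; the names and layout are hypothetical, not fgml's actual data format:

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
BlockParams = Dict[Tuple[int, int], Tuple[float, float]]  # (block_row, block_col) -> (scale, offset)


def quantize_blocked(w: Matrix, bits: int = 4, block: int = 64) -> Tuple[List[List[int]], BlockParams]:
    """Uniform min/max quantization with a separate (scale, offset) per block x block tile."""
    rows, cols = len(w), len(w[0])
    levels = (1 << bits) - 1
    q = [[0] * cols for _ in range(rows)]
    params: BlockParams = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            r1, c1 = min(r0 + block, rows), min(c0 + block, cols)  # edge tiles may be smaller
            tile = [w[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            lo, hi = min(tile), max(tile)
            scale = (hi - lo) / levels if hi > lo else 1.0
            params[(r0 // block, c0 // block)] = (scale, lo)
            for i in range(r0, r1):
                for j in range(c0, c1):
                    q[i][j] = min(levels, max(0, round((w[i][j] - lo) / scale)))
    return q, params


def dequantize_blocked(q: List[List[int]], params: BlockParams, block: int = 64) -> Matrix:
    """Reconstruct each weight using the scale and offset of the tile it belongs to."""
    out = []
    for i, row in enumerate(q):
        new_row = []
        for j, v in enumerate(row):
            scale, lo = params[(i // block, j // block)]
            new_row.append(lo + v * scale)
        out.append(new_row)
    return out
```

An outlier now only coarsens the tile it lives in; the cost is storing one (scale, offset) pair per 4096 weights.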
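
The Hessian-based dense-and-sparse scheme can be sketched in the style of SqueezeLLM: the largest-magnitude weights of a row are set aside as sparse outliers, and the remaining weights are clustered into 2^bits non-uniform levels by k-means weighted by a diagonal Hessian (sensitivity) estimate. The Hessian diagonal is assumed to be precomputed, and storing outliers as residuals on top of the dense codes is one possible layout chosen for this sketch; none of these names come from fgml:

```python
from typing import Dict, List, Tuple


def nearest(centroids: List[float], x: float) -> int:
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def weighted_kmeans_1d(x: List[float], h: List[float], n_clusters: int, iters: int = 20) -> List[float]:
    """Lloyd's algorithm minimizing sum_i h[i] * (x[i] - centroid)^2 (h = Hessian-diagonal weights)."""
    xs = sorted(x)
    # Initialize centroids at evenly spaced quantiles of the data.
    c = [xs[int(k * (len(xs) - 1) / (n_clusters - 1))] for k in range(n_clusters)]
    for _ in range(iters):
        num = [0.0] * n_clusters
        den = [0.0] * n_clusters
        for xi, hi in zip(x, h):
            k = nearest(c, xi)  # a positive weight does not change which centroid is nearest
            num[k] += hi * xi
            den[k] += hi
        # Each centroid moves to the Hessian-weighted mean of its cluster; empty clusters stay put.
        c = [num[k] / den[k] if den[k] > 0 else c[k] for k in range(n_clusters)]
    return c


def quantize_row_nonuniform(
    w: List[float], hess_diag: List[float], bits: int = 3, outlier_frac: float = 0.01
) -> Tuple[List[int], List[float], Dict[int, float]]:
    """Dense non-uniform codes for one row plus a sparse {column: residual} dict for outliers."""
    n = len(w)
    n_out = int(n * outlier_frac)
    outliers = set(sorted(range(n), key=lambda i: abs(w[i]), reverse=True)[:n_out])
    inliers = [i for i in range(n) if i not in outliers]
    # Fit the 2**bits levels on inliers only, so outliers cannot stretch the codebook.
    centroids = weighted_kmeans_1d([w[i] for i in inliers], [hess_diag[i] for i in inliers], 1 << bits)
    codes = [nearest(centroids, x) for x in w]
    # Outliers keep a dense code too; the sparse part stores what that code misses.
    sparse = {i: w[i] - centroids[codes[i]] for i in outliers}
    return codes, centroids, sparse


def dequantize_row(codes: List[int], centroids: List[float], sparse: Dict[int, float]) -> List[float]:
    """Dense lookup plus sparse residual: W_hat = D + S."""
    out = [centroids[c] for c in codes]
    for i, r in sparse.items():
        out[i] += r
    return out
```

Weighting by the Hessian diagonal pulls centroids toward the weights whose errors are assumed to matter most for the loss, while keeping a handful of outliers exact stops them from dragging the codebook apart.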
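
At inference time, a dense-and-sparse 3-bit or 4-bit layer can be evaluated as a lookup-table matrix-vector product plus a sparse correction. The sketch below assumes codes are packed back to back into bytes (so 3-bit codes straddle byte boundaries), one centroid table per row, and outliers stored as a {column: value} dict per row. This layout is hypothetical and chosen for readability; a production kernel would typically do the unpacking and accumulation in C or CUDA:

```python
from typing import Dict, List


def pack_codes(codes: List[int], bits: int) -> bytes:
    """Pack b-bit codes back to back (little-endian), so 3-bit codes may straddle byte boundaries."""
    mask = (1 << bits) - 1
    acc = 0
    for i, c in enumerate(codes):
        acc |= (c & mask) << (i * bits)
    return acc.to_bytes((len(codes) * bits + 7) // 8, "little")


def unpack_codes(buf: bytes, bits: int, n: int) -> List[int]:
    """Inverse of pack_codes: recover n b-bit codes."""
    acc = int.from_bytes(buf, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(n)]


def dense_sparse_matvec(
    packed_rows: List[bytes],
    luts: List[List[float]],
    sparse_rows: List[Dict[int, float]],
    x: List[float],
    bits: int,
) -> List[float]:
    """y = (D + S) x, with D stored as packed codes plus a per-row lookup table and S as sparse dicts."""
    y = []
    for buf, lut, sparse in zip(packed_rows, luts, sparse_rows):
        codes = unpack_codes(buf, bits, len(x))
        # Dense part: each code indexes this row's table of 2**bits centroid values.
        acc = sum(lut[c] * xj for c, xj in zip(codes, x))
        # Sparse part: add the few outlier contributions stored as {column: value}.
        acc += sum(v * x[j] for j, v in sparse.items())
        y.append(acc)
    return y
```

Packing 3-bit codes this way stores 8 codes in 3 bytes, and the sparse correction only touches the few outlier columns of each row.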

elphinkuo/fgml