Apache License 2.0

fmgl

Acceleration library for Machine Learning, especially for large language models.

  • Uniform quantization of Llama 2 models without block grouping.
  • Uniform quantization of Llama 2 models with 64 × 64 block grouping.
  • Non-uniform dense-and-sparse quantization of Llama 2 (3-bit and 4-bit), based on Hessian information.
  • Dense-and-sparse 3-bit and 4-bit inference for Llama 2 7B.
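
To illustrate the difference between the first two items, here is a minimal NumPy sketch of uniform quantization with one scale for the whole matrix versus one scale per 64 × 64 block. It is not this library's API: the function names (`quantize_uniform`, `dequantize_uniform`, `quantize_blockwise`) and the asymmetric min/max scheme are assumptions chosen for illustration, and the library may compute scales differently.

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Asymmetric uniform quantization with a single scale and offset (illustrative)."""
    qmax = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    # Guard against a constant array, where the range would be zero.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    q = np.clip(np.round((w - w_min) / scale), 0, qmax).astype(np.uint8)
    return q, scale, w_min

def dequantize_uniform(q, scale, w_min):
    """Map integer codes back to approximate float values."""
    return q.astype(np.float32) * scale + w_min

def quantize_blockwise(w, bits=4, block=64):
    """Quantize each block x block tile on its own and return the reconstruction."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "shape must be a multiple of block"
    out = np.empty_like(w, dtype=np.float32)
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = w[r:r + block, c:c + block]
            q, scale, w_min = quantize_uniform(tile, bits)
            out[r:r + block, c:c + block] = dequantize_uniform(q, scale, w_min)
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
w[0, 0] = 20.0  # a single outlier stretches the per-tensor range

q, scale, w_min = quantize_uniform(w, bits=4)
err_tensor = float(np.mean((w - dequantize_uniform(q, scale, w_min)) ** 2))
err_block = float(np.mean((w - quantize_blockwise(w, bits=4, block=64)) ** 2))
print(f"per-tensor MSE: {err_tensor:.5f}, 64x64 block MSE: {err_block:.5f}")
```

In this toy setup the outlier coarsens the grid only for the block that contains it, so the block-grouped reconstruction has the lower error, at the cost of storing one scale and offset per block.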