Pinned Repositories
QLLM
A general x-bit quantization toolbox for LLMs, with 2-8 bit support and easy GPTQ/AWQ quantization.
flash-attention
Fast and memory-efficient exact attention
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
onnxruntime
ONNX Runtime: cross-platform, high-performance ML inference and training accelerator
onnxruntime-genai
Generative AI extensions for onnxruntime
aciddelgado's Repositories
aciddelgado/QLLM
A general x-bit quantization toolbox for LLMs, with 2-8 bit support and easy GPTQ/AWQ quantization.