Pinned Repositories
QLLM
A general x-bit quantization toolbox for LLMs, with 2-8 bit support and easy GPTQ/AWQ quantization.
flash-attention
Fast and memory-efficient exact attention
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
onnxruntime
ONNX Runtime: cross-platform, high-performance ML inference and training accelerator
onnxruntime-genai
Generative AI extensions for onnxruntime
aciddelgado's Repositories
aciddelgado/QLLM
A general x-bit quantization toolbox for LLMs, with 2-8 bit support and easy GPTQ/AWQ quantization.