Simple and fast server for GPTQ-quantized LLaMA inference
Primary Language: Python