An endpoint server for efficiently serving quantized open-source LLMs for code.
Primary language: Python
License: Apache License 2.0 (Apache-2.0)