tikikun/nitro-tensorrt-llm-personal-mirror
Nitro is a C++ inference server built on top of TensorRT-LLM. It exposes an OpenAI-compatible API and runs blazing-fast inference on NVIDIA GPUs. Used in Jan.
C++ · Apache-2.0
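To illustrate what "OpenAI-compatible API" means in practice, here is a minimal sketch of a client sending a chat-completions request with libcurl. The host/port (localhost:3928), the /v1/chat/completions path, and the model name are placeholders assumed for the example, not confirmed defaults of this server; check the repository's documentation for the actual values.

```cpp
// Hypothetical client for an OpenAI-compatible chat-completions endpoint.
// Requires libcurl (link with -lcurl).
#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl write callback: append the response body into a std::string.
static size_t write_cb(char* data, size_t size, size_t nmemb, void* userdata) {
    auto* out = static_cast<std::string*>(userdata);
    out->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // Request body in the standard OpenAI chat-completions format.
    // The model name below is a placeholder.
    const std::string body = R"({
        "model": "my-trtllm-model",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
    })";

    std::string response;
    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");

    // Assumed local endpoint; adjust host, port, and path to your deployment.
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:3928/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK) {
        std::cerr << "request failed: " << curl_easy_strerror(rc) << "\n";
    } else {
        // On success the body is JSON; the reply text sits in
        // choices[0].message.content, as in the OpenAI API.
        std::cout << response << "\n";
    }

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```

Because the request/response shapes follow the OpenAI format, existing OpenAI client libraries can typically be pointed at the server simply by overriding the base URL.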