tikikun/nitro-tensorrt-llm-personal-mirror
Nitro is a C++ inference server built on top of TensorRT-LLM. It exposes an OpenAI-compatible API and runs blazing-fast inference on NVIDIA GPUs. Used in Jan.
C++ · Apache-2.0
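To illustrate what "OpenAI-compatible API" means in practice, here is a minimal sketch of a client sending a chat-completions request with libcurl. The host/port (localhost:3928), the /v1/chat/completions path, and the model name are placeholders assumed for the example, not confirmed defaults of this server; check the repository's documentation for the actual values.

```cpp
// Hypothetical client for an OpenAI-compatible chat-completions endpoint.
// Requires libcurl (link with -lcurl).
#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl write callback: append the response body into a std::string.
static size_t write_cb(char* data, size_t size, size_t nmemb, void* userdata) {
    auto* out = static_cast<std::string*>(userdata);
    out->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // Request body in the standard OpenAI chat-completions format.
    // The model name below is a placeholder.
    const std::string body = R"({
        "model": "my-trtllm-model",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
    })";

    std::string response;
    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");

    // Assumed local endpoint; adjust host, port, and path to your deployment.
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:3928/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK) {
        std::cerr << "request failed: " << curl_easy_strerror(rc) << "\n";
    } else {
        // On success the body is JSON; the reply text sits in
        // choices[0].message.content, as in the OpenAI API.
        std::cout << response << "\n";
    }

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```

Because the request/response shapes follow the OpenAI format, existing OpenAI client libraries can typically be pointed at the server simply by overriding the base URL.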