0xMatthew's Stars
openai/openai-python
The official Python library for the OpenAI API
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inference solution.
NVIDIA/ChatRTX
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define large language models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those TensorRT engines.