tensorrt-llm

There are 27 repositories under the tensorrt-llm topic.

  • Awesome-LLM-Inference

    xlite-dev/Awesome-LLM-Inference

    📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

    Language: Python
  • collabora/WhisperLive

    A nearly-live implementation of OpenAI's Whisper.

    Language: Python
  • shashikg/WhisperS2T

    An Optimized Speech-to-Text Pipeline for the Whisper Model, Supporting Multiple Inference Engines

    Language: Jupyter Notebook
  • coderonion/awesome-cuda-and-hpc

    🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

  • huggingface/optimum-benchmark

    🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

    Language: Python
  • npuichigo/openai_trtllm

    OpenAI compatible API for TensorRT LLM triton backend

    Language: Rust
  • NetEase-Media/grps

    Deep learning deployment framework supporting tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance, and helps users quickly deploy models and serve them through HTTP/RPC interfaces.

    Language: C++
  • NetEase-Media/grps_trtllm

    A pure-C++, high-performance OpenAI-compatible LLM service built on GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat interface; claims higher performance than `vllm serve`.

    Language: Python
  • openhackathons-org/End-to-End-LLM

    AI Bootcamp material consisting of an end-to-end workflow for LLMs.

    Language: Jupyter Notebook
  • vossr/Chat-With-RTX-python-api

    Chat With RTX Python API

    Language: Python
  • guidance-ai/llgtrt

    TensorRT-LLM server with Structured Outputs (JSON) built with Rust

    Language: Rust
  • fgblanch/OutlookLLM

    Add-in for the new Outlook that adds LLM features (composition, summarization, Q&A) using a local LLM served via NVIDIA TensorRT-LLM.

    Language: Python
  • menloresearch/cortex.tensorrt-llm

    Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.

    Language: C++
  • argonne-lcf/LLM-Inference-Bench

    LLM-Inference-Bench

    Language: Jupyter Notebook
  • CactusQ/TensorRT-LLM-Tutorial

    Getting started with TensorRT-LLM using BLOOM as a case study

    Language: Jupyter Notebook
  • lix19937/llm-deploy

    AI infra: LLM inference with TensorRT-LLM and vLLM.

    Language: Python
  • zRzRzRzRzRzRzR/lm-fly

    Accelerating LLM inference frameworks to make LLMs fly.

    Language: Python
  • EdVince/whisper-trtllm

    Whisper in TensorRT-LLM

    Language: C++
  • Delxrius/MiniMax-01

    MiniMax-01 is a simple implementation of the MiniMax algorithm, a widely used strategy for decision-making in two-player turn-based games like Tic-Tac-Toe. The algorithm aims to minimize the maximum possible loss for the player, making it a popular choice for developing AI opponents in various game scenarios.

  • j3soon/LLM-Tutorial

    LLM tutorial materials covering, but not limited to, NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.

    Language: Jupyter Notebook
  • ccyrene/flash_whisper

    Whisper optimization for real-time applications

    Language: Python
  • MustaphaU/Simplify-Documentation-Review-on-Atlassian-Confluence-with-LLAMA2-and-NVIDIA-TensorRT-LLM

    A simple project demonstrating LLM-assisted review of documentation on Atlassian Confluence.

    Language: Python
  • Rahman2001/nim-factory

    A factory for NVIDIA NIM containers in which users and businesses can quantize models and build their own TensorRT-LLM engines for optimized inference.

    Language: Jupyter Notebook
  • YconquestY/cc

    Summary of the call graphs and data structures of the collective communication plugin in NVIDIA TensorRT-LLM.

    Language: D2
  • yui-mhcp/language_models

    A Large Language Models (LLM) oriented project providing easy-to-use features like RAG, translation, summarization, ...

    Language: Python
  • cyanff/nyxt

    Language: TypeScript
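Several of the servers listed above (openai_trtllm, grps_trtllm, llgtrt) expose OpenAI-compatible HTTP APIs, so a standard `/v1/chat/completions` request works against them. The sketch below builds such a request with only the standard library; the endpoint URL and model name are placeholders, not values from any of these projects — check each project's docs for the real ones.

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages):
    """Build an OpenAI-style /v1/chat/completions request (not yet sent)."""
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000",  # placeholder endpoint
    "ensemble",               # placeholder model name
    [{"role": "user", "content": "Hello!"}],
)
# Once a server is running, send it with: urllib.request.urlopen(req)
```

Because the payload follows the OpenAI wire format, the same request shape works whether the backend is TensorRT-LLM, vLLM, or the hosted OpenAI API.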
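The MiniMax-01 entry describes the classic minimax algorithm for two-player turn-based games. As a point of reference, here is a minimal self-contained sketch for Tic-Tac-Toe (the function names are illustrative and not taken from that repository):

```python
# Minimal minimax for Tic-Tac-Toe. A board is a 9-character string,
# indices 0-8 row by row, each cell 'X', 'O', or ' '.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Best achievable score for the side to move: X maximizes (+1 = X wins),
    O minimizes (-1 = O wins), 0 is a draw under perfect play."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if ' ' not in board:
        return 0  # board full, draw
    other = 'O' if player == 'X' else 'X'
    scores = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == ' ']
    return max(scores) if player == 'X' else min(scores)
```

With X to move on `"XX OO    "`, `minimax` returns 1 (X completes the top row), and from the empty board it returns 0, reflecting that perfect play in Tic-Tac-Toe is a draw.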