NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
C++ · Apache-2.0