Pinned Repositories
llama.cpp
Port of Facebook's LLaMA model in C/C++
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
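Since this description highlights TensorRT-LLM's Python API for defining models and running inference, here is a minimal sketch following the project's quick-start pattern with the high-level LLM entry point; the model identifier is an illustrative placeholder, and engine build options are left at their defaults.

```python
from tensorrt_llm import LLM, SamplingParams

# Prompts to run; batching across prompts is handled by the runtime.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# The LLM entry point converts the checkpoint and builds a TensorRT
# engine on first use (the model id below is a placeholder).
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

for output in llm.generate(prompts, sampling_params):
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```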
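Likewise, for the lmdeploy toolkit pinned above, a minimal sketch of its documented pipeline API for in-process serving; the model identifier is again a placeholder, and engine and quantization options are omitted.

```python
from lmdeploy import pipeline

# pipeline() loads the model and serves it in-process; the model id
# below stands in for any supported checkpoint.
pipe = pipeline("internlm/internlm2-chat-7b")

responses = pipe([
    "Hi, please introduce yourself.",
    "Summarize what LLM serving involves.",
])
for response in responses:
    print(response.text)
```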
JidongZhang-THU's Repositories
JidongZhang-THU/llama.cpp
Port of Facebook's LLaMA model in C/C++
JidongZhang-THU/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
JidongZhang-THU/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
JidongZhang-THU/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.