Pinned Repositories
llama.cpp
Port of Facebook's LLaMA model in C/C++
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
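Since this description highlights TensorRT-LLM's Python API for defining models and running inference, here is a minimal sketch following the project's quick-start pattern with the high-level LLM entry point; the model identifier is an illustrative placeholder, and engine build options are left at their defaults.

```python
from tensorrt_llm import LLM, SamplingParams

# Prompts to run; batching across prompts is handled by the runtime.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# The LLM entry point converts the checkpoint and builds a TensorRT
# engine on first use (the model id below is a placeholder).
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

for output in llm.generate(prompts, sampling_params):
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```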
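Likewise, for the lmdeploy toolkit pinned above, a minimal sketch of its documented pipeline API for in-process serving; the model identifier is again a placeholder, and engine and quantization options are omitted.

```python
from lmdeploy import pipeline

# pipeline() loads the model and serves it in-process; the model id
# below stands in for any supported checkpoint.
pipe = pipeline("internlm/internlm2-chat-7b")

responses = pipe([
    "Hi, please introduce yourself.",
    "Summarize what LLM serving involves.",
])
for response in responses:
    print(response.text)
```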
JidongZhang-THU's Repositories
JidongZhang-THU/llama.cpp
Port of Facebook's LLaMA model in C/C++
JidongZhang-THU/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
JidongZhang-THU/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
JidongZhang-THU/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.