Pinned Repositories
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
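A minimal sketch of what open-vocabulary detection looks like in practice, using the Ultralytics integration of YOLO-World rather than this repo's own demo scripts (an assumption; the weights file and image path are placeholders):

```python
# Sketch: zero-shot detection with a prompt-defined vocabulary via the
# Ultralytics YOLO-World wrapper (assumed installed: pip install ultralytics).
from ultralytics import YOLOWorld

model = YOLOWorld("yolov8s-world.pt")            # pretrained open-vocabulary weights
model.set_classes(["person", "bicycle", "dog"])  # custom class list, no retraining
results = model.predict("street.jpg")            # placeholder image path
results[0].show()                                # visualize detected boxes
```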
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
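A minimal quantization sketch following AutoGPTQ's documented quickstart; the model name, calibration text, and output directory are placeholders:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(pretrained, use_fast=True)

# Calibration examples: tokenized text GPTQ uses to calibrate the quantized weights.
examples = [tokenizer("AutoGPTQ is an easy-to-use LLM quantization package.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # 4-bit, 128-column groups
model = AutoGPTQForCausalLM.from_pretrained(pretrained, quantize_config)
model.quantize(examples)               # run GPTQ calibration
model.save_quantized("opt-125m-4bit")  # write quantized weights to disk
```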
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
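A sketch of the AWQ quantization flow as documented in AutoAWQ's README; the model path and output directory are placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # placeholder model
quant_path = "mistral-7b-awq"             # placeholder output dir
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)  # AWQ calibration + 4-bit packing
model.save_quantized(quant_path)                      # quantized weights
tokenizer.save_pretrained(quant_path)                 # keep tokenizer alongside them
```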
llama.cpp
LLM inference in C/C++
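llama.cpp itself is C/C++; to keep these examples in one language, here is a sketch using the separate llama-cpp-python bindings (an assumption: that binding project, not this repo, provides the API shown). The GGUF path is a placeholder:

```python
# pip install llama-cpp-python (separate binding project over llama.cpp)
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-7b.Q4_K_M.gguf", n_ctx=2048)  # placeholder GGUF file
out = llm("Q: What does 4-bit quantization trade off? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```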
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
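A minimal serving sketch with LMDeploy's documented pipeline API; the model name is a placeholder and weights are downloaded on first use:

```python
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")  # placeholder model
responses = pipe(["Summarize what an inference engine does."])
print(responses[0].text)
```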
PaperX
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
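A sketch of the high-level LLM API that recent TensorRT-LLM releases expose (an assumption about the version; older releases require an explicit engine-build step). The model name is a placeholder:

```python
from tensorrt_llm import LLM, SamplingParams

# Placeholder HF model; the high-level API builds a TensorRT engine under the hood.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=32, temperature=0.8)

for out in llm.generate(["Hello, TensorRT-LLM!"], params):
    print(out.outputs[0].text)
```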
evalscope
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Tengine
Tengine is a lightweight, high-performance, modular inference engine for embedded devices.
Jeremy-J-J's Repositories
Jeremy-J-J/PaperX
Jeremy-J-J/TensorRT-LLM