Pinned Repositories
kubernetes: Production-Grade Container Scheduling and Management
convert2onnx: LLM2Onnx
headscale: An open source, self-hosted implementation of the Tailscale control server
jobrunner: Framework for performing work asynchronously, outside of the request flow
juicefs: JuiceFS is a distributed POSIX file system built on top of Redis and S3.
server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
tailscale: The easiest, most secure way to use WireGuard and 2FA.
TensorRT: NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
tensorrtllm_backend: The Triton TensorRT-LLM Backend
shuhaosong's Repositories
shuhaosong/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
shuhaosong/juicefs: JuiceFS is a distributed POSIX file system built on top of Redis and S3.
shuhaosong/tailscale: The easiest, most secure way to use WireGuard and 2FA.
shuhaosong/headscale: An open source, self-hosted implementation of the Tailscale control server
shuhaosong/tensorrtllm_backend: The Triton TensorRT-LLM Backend
shuhaosong/server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
shuhaosong/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
shuhaosong/convert2onnx: LLM2Onnx
shuhaosong/TensorRT: NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
shuhaosong/kubernetes: Production-Grade Container Scheduling and Management
shuhaosong/jobrunner: Framework for performing work asynchronously, outside of the request flow