Pinned Repositories
kubernetes: Production-Grade Container Scheduling and Management
convert2onnx: LLM2Onnx
headscale: An open source, self-hosted implementation of the Tailscale control server
jobrunner: Framework for performing work asynchronously, outside of the request flow
juicefs: JuiceFS is a distributed POSIX file system built on top of Redis and S3.
server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
tailscale: The easiest, most secure way to use WireGuard and 2FA.
TensorRT: NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
tensorrtllm_backend: The Triton TensorRT-LLM Backend
shuhaosong's Repositories
shuhaosong/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
shuhaosong/juicefs: JuiceFS is a distributed POSIX file system built on top of Redis and S3.
shuhaosong/tailscale: The easiest, most secure way to use WireGuard and 2FA.
shuhaosong/headscale: An open source, self-hosted implementation of the Tailscale control server
shuhaosong/tensorrtllm_backend: The Triton TensorRT-LLM Backend
shuhaosong/server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
shuhaosong/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
shuhaosong/convert2onnx: LLM2Onnx
shuhaosong/TensorRT: NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
shuhaosong/kubernetes: Production-Grade Container Scheduling and Management
shuhaosong/jobrunner: Framework for performing work asynchronously, outside of the request flow