Pinned Repositories
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, StreamingLLM, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
calamari
Web-based monitoring and management for Ceph
ceph
Ceph is a distributed object, block, and file storage platform
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
kubernetes
Production-Grade Container Scheduling and Management
LLM-Inference
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
SWE-agent
SWE-agent takes a GitHub issue and tries to fix it automatically, using GPT-4 or your LM of choice. It solves 12.47% of the bugs in the SWE-bench evaluation set and takes about one minute to run.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
chizhang118's Repositories
chizhang118/Awesome-LLM-Inference
chizhang118/calamari
chizhang118/ceph
chizhang118/DeepSpeed
chizhang118/DeepSpeed-MII
chizhang118/kubernetes
chizhang118/LLM-Inference
chizhang118/pytorch
chizhang118/SWE-agent
chizhang118/TensorRT-LLM
chizhang118/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs