Pinned Repositories
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, StreamingLLM, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
calamari
Web-based monitoring and management for Ceph
ceph
Ceph is a distributed object, block, and file storage platform
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
kubernetes
Production-Grade Container Scheduling and Management
LLM-Inference
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
SWE-agent
SWE-agent takes a GitHub issue and tries to fix it automatically, using GPT-4 or your LM of choice. It solves 12.47% of the bugs in the SWE-bench evaluation set and takes about one minute to run.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
chizhang118's Repositories
chizhang118/Awesome-LLM-Inference
chizhang118/calamari
chizhang118/ceph
chizhang118/DeepSpeed
chizhang118/DeepSpeed-MII
chizhang118/kubernetes
chizhang118/LLM-Inference
chizhang118/pytorch
chizhang118/SWE-agent
chizhang118/TensorRT-LLM
chizhang118/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs