WoosukKwon
CS PhD student at UC Berkeley, building @vllm-project
University of California, Berkeley (Berkeley, CA)
Pinned Repositories
alpa
Training and serving large-scale neural networks with auto parallelization.
flashinfer
FlashInfer: Kernel Library for LLM Serving
Transformers4Rec
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation that works with PyTorch.
sky-llama
skypilot
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
nimble
Lightweight and Parallel Deep Learning Framework
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
torch-xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
WoosukKwon's Repositories
WoosukKwon/retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
WoosukKwon/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
WoosukKwon/torch-xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)