WoosukKwon
CS PhD student at UC Berkeley, building @vllm-project
University of California, Berkeley (Berkeley, CA)
Pinned Repositories
alpa
Training and serving large-scale neural networks with auto parallelization.
flashinfer
FlashInfer: Kernel Library for LLM Serving
Transformers4Rec
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation that works with PyTorch.
sky-llama
skypilot
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
nimble
Lightweight and Parallel Deep Learning Framework
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
torch-xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
WoosukKwon's Repositories
WoosukKwon/retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
WoosukKwon/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
WoosukKwon/torch-xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)