Pinned Repositories
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
cutlass
CUDA Templates for Linear Algebra Subroutines
flashinfer
FlashInfer: Kernel Library for LLM Serving
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
punica_triton_kernel
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
triton
Development repository for the Triton language and compiler
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
jeejeelee's Repositories
jeejeelee/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
jeejeelee/punica_triton_kernel
jeejeelee/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
jeejeelee/cutlass
CUDA Templates for Linear Algebra Subroutines
jeejeelee/flashinfer
FlashInfer: Kernel Library for LLM Serving
jeejeelee/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
jeejeelee/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
jeejeelee/triton
Development repository for the Triton language and compiler