jeejeelee

SobeyMIL
Chengdu, China

Pinned Repositories

bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Language: Python 0 0 0 0
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 0 0 0 0
flashinfer
FlashInfer: Kernel Library for LLM Serving
Language: Cuda 0 0 0 0
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Language: Python 0 0
punica_triton_kernel
Language: Python 4 1 0 0
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python 0 0
triton
Development repository for the Triton language and compiler
Language: C++ 0 0 0 0
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python 6 0 0 0
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python 85.2k 1.7k 47.8k 23k
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python 32.4k 264 5.7k 4.9k

jeejeelee's Repositories

jeejeelee/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python 6 0 0 0
jeejeelee/punica_triton_kernel
Language: Python 4 1 0 0
jeejeelee/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Language: Python 0 0 0 0
jeejeelee/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 0 0 0 0
jeejeelee/flashinfer
FlashInfer: Kernel Library for LLM Serving
Language: Cuda 0 0 0 0
jeejeelee/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Language: Python 0 0
jeejeelee/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python 0 0
jeejeelee/triton
Development repository for the Triton language and compiler
Language: C++ 0 0 0 0