LittleQili's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
karpathy/llm.c
LLM training in simple, raw C/CUDA
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inference solution.
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
apple/corenet
CoreNet: A library for training deep neural networks
conda-forge/miniforge
A conda-forge distribution.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
AliyunContainerService/gpushare-scheduler-extender
GPU Sharing Scheduler for Kubernetes Cluster
Azure/AzurePublicDataset
Microsoft Azure Traces
iamhyc/Overleaf-Workshop
Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
ROCm/rccl
ROCm Communication Collectives Library (RCCL)
microsoft/varuna
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
domzilla/Caffeine
Caffeine for macOS 11+
TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
eth-easl/orion
An interference-aware scheduler for fine-grained GPU sharing
Hsword/SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
eniac/paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
Tractables/pyjuice
Scalable training and inference for Probabilistic Circuits
ROCm/rccl-tests
RCCL Performance Benchmark Tests
aichipdesign/chipgptft
Data is all you need: Finetuning LLMs for chip design via an automated design-data augmentation framework (DAC 2024)
TiledTensor/TiledKernel
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
uchuhimo/amanda
Ash-Zheng/RAP-artifacts