LittleQili's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
karpathy/llm.c
LLM training in simple, raw C/CUDA
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inference solution.
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
apple/corenet
CoreNet: A library for training deep neural networks
conda-forge/miniforge
A conda-forge distribution.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
AliyunContainerService/gpushare-scheduler-extender
GPU Sharing Scheduler for Kubernetes Cluster
Azure/AzurePublicDataset
Microsoft Azure Traces
iamhyc/Overleaf-Workshop
Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
ROCm/rccl
ROCm Communication Collectives Library (RCCL)
microsoft/varuna
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
domzilla/Caffeine
Caffeine for macOS 11+
TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
eth-easl/orion
An interference-aware scheduler for fine-grained GPU sharing
Hsword/SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
eniac/paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
Tractables/pyjuice
Scalable training and inference for Probabilistic Circuits
ROCm/rccl-tests
RCCL Performance Benchmark Tests
aichipdesign/chipgptft
Data is all you need: Finetuning LLMs for chip design via an automated design-data augmentation framework (DAC 2024)
TiledTensor/TiledKernel
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
uchuhimo/amanda
Ash-Zheng/RAP-artifacts