DicardoX
Ph.D. candidate @sjtu-epcc. Former intern @Microsoft, Shanghai. Undergrad: CSE, SJTU. Research interests: ML systems, systems for AI, LLM training/fine-tuning.
Shanghai Jiao Tong University
DicardoX's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
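A minimal usage sketch of the pipeline API (the "gpt2" checkpoint is chosen purely for illustration; any causal LM works):

```python
# Text generation via the transformers pipeline API.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("ML systems research is", max_new_tokens=20)
print(out[0]["generated_text"])
```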
PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning).
microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
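A sketch of an NNI trial script, assuming it runs inside an NNI experiment (the training loop is a placeholder):

```python
# NNI trial: the tuner supplies hyperparameters, the trial reports metrics back.
import nni

params = nni.get_next_parameter()       # e.g. {"lr": 0.01} from the tuner
lr = params.get("lr", 0.01)

accuracy = 0.0
for epoch in range(3):
    accuracy = min(1.0, accuracy + lr)  # placeholder for real training
    nni.report_intermediate_result(accuracy)

nni.report_final_result(accuracy)
```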
numba/numba
NumPy aware dynamic Python compiler using LLVM
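A minimal example of Numba's core workflow, JIT-compiling a NumPy loop with @njit (nopython mode):

```python
# Numba compiles this explicit loop to machine code via LLVM.
import numpy as np
from numba import njit

@njit
def rowsum(a):
    out = np.zeros(a.shape[0])
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i] += a[i, j]
    return out

print(rowsum(np.ones((4, 3))))  # [3. 3. 3. 3.]
```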
skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
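A hedged sketch of SkyPilot's Python API (the cluster name is illustrative; this assumes configured cloud credentials or a Kubernetes context):

```python
# Define a task, request a GPU, and launch it on whatever infra is available.
import sky

task = sky.Task(run="python train.py")
task.set_resources(sky.Resources(accelerators="A100:1"))
sky.launch(task, cluster_name="demo-cluster")
```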
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
dvlab-research/LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
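The core trick in LongLoRA is shifted sparse attention (S2-Attn): attend within local token groups, but shift half the heads by half a group so information flows across group boundaries. A toy rendering of the shift step, not the repo's code:

```python
# Toy S2-Attn shift: half the heads use plain groups, the other half use
# groups shifted by half the group size, linking neighboring groups.
import torch

B, T, H, D, G = 1, 8, 4, 2, 4          # batch, tokens, heads, head dim, group size
x = torch.randn(B, T, H, D)

shifted = x.clone()
shifted[:, :, H // 2 :] = x[:, :, H // 2 :].roll(-(G // 2), dims=1)
groups = shifted.view(B, T // G, G, H, D)   # attention then runs within each group
```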
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
cuda-mode/lectures
Material for cuda-mode lectures
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
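The idea behind AWQ in one line: scale up the weight channels that see large activations before quantizing, so the most salient channels lose less precision. A hand-written illustration of that scaling, not the repo's API:

```python
# Toy activation-aware scaling (not llm-awq's code): salient input channels
# are scaled up before integer rounding; the inverse scale is folded back in.
import torch

W = torch.randn(16, 8)                    # [out_features, in_features]
act_scale = torch.rand(8) + 0.5           # per-input-channel activation magnitude
s = act_scale ** 0.5                      # one simple choice of scaling exponent

W_scaled = W * s                          # protect salient channels
q = torch.clamp((W_scaled / W_scaled.abs().max() * 127).round(), -128, 127)
W_deq = q / 127 * W_scaled.abs().max() / s  # dequantize and undo the scale
```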
pytorch/extension-cpp
C++ extensions in PyTorch
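The quickest way to try such an extension is the JIT route from torch.utils.cpp_extension (the source file name here is illustrative):

```python
# JIT-compile and load a C++ extension at runtime.
# "my_op.cpp" stands in for a source file exposing a function via pybind11.
from torch.utils.cpp_extension import load

ext = load(name="my_ext", sources=["my_op.cpp"], verbose=True)
# ext.my_op(...) would then dispatch into the compiled C++ code.
```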
microsoft/MInference
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
chrischoy/pytorch-custom-cuda-tutorial
Tutorial for building a custom CUDA function for PyTorch
jxhe/unify-parameter-efficient-tuning
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)
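The paper's unified view treats adapters, prefix tuning, and LoRA as variations of one pattern: add a learned modification Δh to a frozen sublayer's output, h ← h + s·Δh. A minimal LoRA-style instance of that pattern (my sketch, not the repo's code):

```python
# One instance of the unified pattern h <- h + s * delta_h,
# with a low-rank learned delta beside a frozen linear sublayer.
import torch
import torch.nn as nn

class FrozenLinearWithDelta(nn.Module):
    def __init__(self, d, r=4, s=1.0):
        super().__init__()
        self.base = nn.Linear(d, d)
        self.base.requires_grad_(False)           # frozen pretrained sublayer
        self.down = nn.Linear(d, r, bias=False)   # learned low-rank delta
        self.up = nn.Linear(r, d, bias=False)
        self.s = s                                # scaling factor from the unified view

    def forward(self, x):
        return self.base(x) + self.s * self.up(self.down(x))

y = FrozenLinearWithDelta(16)(torch.randn(2, 16))
```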
hpcaitech/SwiftInfer
Efficient AI Inference & Serving
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
TUDB-Labs/mLoRA
An Efficient "Factory" to Build Multiple LoRA Adapters
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
microsoft/ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
eth-easl/orion
An interference-aware scheduler for fine-grained GPU sharing
TUDB-Labs/MixLoRA
State-of-the-art Parameter-Efficient MoE Fine-tuning Method
dguo98/DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
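Diff pruning learns a sparse task-specific difference vector δ on top of frozen pretrained weights, θ_task = θ_pretrained + z ⊙ δ, with a relaxed L0 penalty pushing the mask z toward zero. A toy rendering of the reparameterization, not the repo's code:

```python
# Task weights = frozen pretrained weights + masked learned difference.
import torch

theta = torch.randn(100)                       # frozen pretrained parameters
delta = torch.zeros(100, requires_grad=True)   # learned difference vector
logits = torch.zeros(100, requires_grad=True)  # mask parameters

z = torch.sigmoid(logits)                      # relaxed binary mask in [0, 1]
theta_task = theta + z * delta                 # only masked entries differ
l0_penalty = z.sum()                           # drives the mask toward sparsity
```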
microsoft/SuperScaler
An experimental parallel training platform
tgale96/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
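From memory, the binding exposes a gmm op that multiplies variable-sized per-expert batches in one launch; treat the exact signature below as an assumption:

```python
# Hedged sketch of the grouped GEMM binding (exact signature assumed).
import torch
import grouped_gemm as gg

a = torch.randn(10, 16, device="cuda", dtype=torch.bfloat16)     # stacked tokens
b = torch.randn(3, 16, 32, device="cuda", dtype=torch.bfloat16)  # one matrix per expert
batch_sizes = torch.tensor([4, 2, 4])     # rows of `a` routed to each expert

out = gg.ops.gmm(a, b, batch_sizes)       # [10, 32] in a single grouped kernel
```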
microsoft/chunk-attention
alibaba/alibaba-lingjun-dataset-2023
awslabs/optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
bertmaher/tf32_gemm
Example of binding a TF32 CUTLASS GEMM kernel to PyTorch
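For context on what the repo binds manually, PyTorch also exposes TF32 through a built-in switch; TF32 keeps FP32 range but truncates the mantissa to 10 bits for matmuls on Ampere and newer GPUs:

```python
# PyTorch's built-in TF32 knob (distinct from the repo's CUTLASS binding).
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # opt in to TF32 matmuls
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b    # executes on TF32 tensor cores when enabled
```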