DicardoX
Ph.D. candidate @sjtu-epcc. Former intern @Microsoft, Shanghai. Undergrad: CSE, SJTU. Research interests: ML systems, systems for AI, LLM training/fine-tuning.
Shanghai Jiao Tong University
DicardoX's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
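A minimal usage sketch of the pipeline API (the "gpt2" checkpoint is chosen purely for illustration; any causal LM works):

```python
# Text generation via the transformers pipeline API.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("ML systems research is", max_new_tokens=20)
print(out[0]["generated_text"])
```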
PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning).
microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
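A sketch of an NNI trial script, assuming it runs inside an NNI experiment (the training loop is a placeholder):

```python
# NNI trial: the tuner supplies hyperparameters, the trial reports metrics back.
import nni

params = nni.get_next_parameter()       # e.g. {"lr": 0.01} from the tuner
lr = params.get("lr", 0.01)

accuracy = 0.0
for epoch in range(3):
    accuracy = min(1.0, accuracy + lr)  # placeholder for real training
    nni.report_intermediate_result(accuracy)

nni.report_final_result(accuracy)
```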
numba/numba
NumPy aware dynamic Python compiler using LLVM
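A minimal example of Numba's core workflow, JIT-compiling a NumPy loop with @njit (nopython mode):

```python
# Numba compiles this explicit loop to machine code via LLVM.
import numpy as np
from numba import njit

@njit
def rowsum(a):
    out = np.zeros(a.shape[0])
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i] += a[i, j]
    return out

print(rowsum(np.ones((4, 3))))  # [3. 3. 3. 3.]
```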
skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
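A hedged sketch of SkyPilot's Python API (the cluster name is illustrative; this assumes configured cloud credentials or a Kubernetes context):

```python
# Define a task, request a GPU, and launch it on whatever infra is available.
import sky

task = sky.Task(run="python train.py")
task.set_resources(sky.Resources(accelerators="A100:1"))
sky.launch(task, cluster_name="demo-cluster")
```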
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
dvlab-research/LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
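The core trick in LongLoRA is shifted sparse attention (S2-Attn): attend within local token groups, but shift half the heads by half a group so information flows across group boundaries. A toy rendering of the shift step, not the repo's code:

```python
# Toy S2-Attn shift: half the heads use plain groups, the other half use
# groups shifted by half the group size, linking neighboring groups.
import torch

B, T, H, D, G = 1, 8, 4, 2, 4          # batch, tokens, heads, head dim, group size
x = torch.randn(B, T, H, D)

shifted = x.clone()
shifted[:, :, H // 2 :] = x[:, :, H // 2 :].roll(-(G // 2), dims=1)
groups = shifted.view(B, T // G, G, H, D)   # attention then runs within each group
```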
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
cuda-mode/lectures
Material for cuda-mode lectures
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
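The idea behind AWQ in one line: scale up the weight channels that see large activations before quantizing, so the most salient channels lose less precision. A hand-written illustration of that scaling, not the repo's API:

```python
# Toy activation-aware scaling (not llm-awq's code): salient input channels
# are scaled up before integer rounding; the inverse scale is folded back in.
import torch

W = torch.randn(16, 8)                    # [out_features, in_features]
act_scale = torch.rand(8) + 0.5           # per-input-channel activation magnitude
s = act_scale ** 0.5                      # one simple choice of scaling exponent

W_scaled = W * s                          # protect salient channels
q = torch.clamp((W_scaled / W_scaled.abs().max() * 127).round(), -128, 127)
W_deq = q / 127 * W_scaled.abs().max() / s  # dequantize and undo the scale
```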
pytorch/extension-cpp
C++ extensions in PyTorch
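The quickest way to try such an extension is the JIT route from torch.utils.cpp_extension (the source file name here is illustrative):

```python
# JIT-compile and load a C++ extension at runtime.
# "my_op.cpp" stands in for a source file exposing a function via pybind11.
from torch.utils.cpp_extension import load

ext = load(name="my_ext", sources=["my_op.cpp"], verbose=True)
# ext.my_op(...) would then dispatch into the compiled C++ code.
```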
microsoft/MInference
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
chrischoy/pytorch-custom-cuda-tutorial
Tutorial for building a custom CUDA function for PyTorch
jxhe/unify-parameter-efficient-tuning
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)
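The paper's unified view treats adapters, prefix tuning, and LoRA as variations of one pattern: add a learned modification Δh to a frozen sublayer's output, h ← h + s·Δh. A minimal LoRA-style instance of that pattern (my sketch, not the repo's code):

```python
# One instance of the unified pattern h <- h + s * delta_h,
# with a low-rank learned delta beside a frozen linear sublayer.
import torch
import torch.nn as nn

class FrozenLinearWithDelta(nn.Module):
    def __init__(self, d, r=4, s=1.0):
        super().__init__()
        self.base = nn.Linear(d, d)
        self.base.requires_grad_(False)           # frozen pretrained sublayer
        self.down = nn.Linear(d, r, bias=False)   # learned low-rank delta
        self.up = nn.Linear(r, d, bias=False)
        self.s = s                                # scaling factor from the unified view

    def forward(self, x):
        return self.base(x) + self.s * self.up(self.down(x))

y = FrozenLinearWithDelta(16)(torch.randn(2, 16))
```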
hpcaitech/SwiftInfer
Efficient AI Inference & Serving
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
TUDB-Labs/mLoRA
An Efficient "Factory" to Build Multiple LoRA Adapters
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
microsoft/ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
eth-easl/orion
An interference-aware scheduler for fine-grained GPU sharing
TUDB-Labs/MixLoRA
State-of-the-art Parameter-Efficient MoE Fine-tuning Method
dguo98/DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
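Diff pruning learns a sparse task-specific difference vector δ on top of frozen pretrained weights, θ_task = θ_pretrained + z ⊙ δ, with a relaxed L0 penalty pushing the mask z toward zero. A toy rendering of the reparameterization, not the repo's code:

```python
# Task weights = frozen pretrained weights + masked learned difference.
import torch

theta = torch.randn(100)                       # frozen pretrained parameters
delta = torch.zeros(100, requires_grad=True)   # learned difference vector
logits = torch.zeros(100, requires_grad=True)  # mask parameters

z = torch.sigmoid(logits)                      # relaxed binary mask in [0, 1]
theta_task = theta + z * delta                 # only masked entries differ
l0_penalty = z.sum()                           # drives the mask toward sparsity
```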
microsoft/SuperScaler
An experimental parallel training platform
tgale96/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
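From memory, the binding exposes a gmm op that multiplies variable-sized per-expert batches in one launch; treat the exact signature below as an assumption:

```python
# Hedged sketch of the grouped GEMM binding (exact signature assumed).
import torch
import grouped_gemm as gg

a = torch.randn(10, 16, device="cuda", dtype=torch.bfloat16)     # stacked tokens
b = torch.randn(3, 16, 32, device="cuda", dtype=torch.bfloat16)  # one matrix per expert
batch_sizes = torch.tensor([4, 2, 4])     # rows of `a` routed to each expert

out = gg.ops.gmm(a, b, batch_sizes)       # [10, 32] in a single grouped kernel
```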
microsoft/chunk-attention
alibaba/alibaba-lingjun-dataset-2023
awslabs/optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
bertmaher/tf32_gemm
Example of binding a TF32 CUTLASS GEMM kernel to PyTorch
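For context on what the repo binds manually, PyTorch also exposes TF32 through a built-in switch; TF32 keeps FP32 range but truncates the mantissa to 10 bits for matmuls on Ampere and newer GPUs:

```python
# PyTorch's built-in TF32 knob (distinct from the repo's CUTLASS binding).
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # opt in to TF32 matmuls
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b    # executes on TF32 tensor cores when enabled
```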