Pinned Repositories
Chimera
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
FlashSparse
FlashSparse significantly reduces computation redundancy for unstructured sparsity (SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy. FlashSparse was accepted to PPoPP 2025.
Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) in deep learning on Tensor Cores.
Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven theoretically and demonstrated empirically.
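As a rough illustration of the top-k gradient sparsification idea underlying Ok-Topk, the minimal NumPy sketch below keeps only the k largest-magnitude gradient entries and rebuilds a dense gradient from them. The function names (topk_sparsify, densify) are illustrative, not the library's API; the actual system additionally implements the communication-optimal distributed selection and sparse allreduce on top of this local step.

```python
# Minimal sketch of local top-k gradient sparsification (assumed simplification;
# Ok-Topk itself adds a distributed, asymptotically optimal sparse allreduce).
import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k magnitudes
    return idx, flat[idx]

def densify(idx, vals, shape):
    """Rebuild a dense gradient from the sparse (index, value) representation."""
    dense = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    dense[idx] = vals
    return dense.reshape(shape)

# Toy usage: each worker would exchange only (idx, vals) instead of the full gradient.
grad = np.random.randn(1024)
idx, vals = topk_sparsify(grad, k=32)
restored = densify(idx, vals, grad.shape)
```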
COMPI
Cache-oblivious MPI all-to-all communications based on Morton order
eager-SGD
Eager-SGD is a decentralized asynchronous SGD algorithm. It utilizes novel partial collective operations to accumulate the gradients across all the processes.
SpMV-on-Many-Core
A cross-platform Sparse Matrix Vector Multiplication (SpMV) framework for many-core architectures (GPUs and Xeon Phi).
WAGMA-SGD
WAGMA-SGD is a decentralized asynchronous SGD algorithm based on wait-avoiding group model averaging. Synchronization is relaxed by making the collectives externally triggerable, namely, a collective can be initiated without requiring that all the processes enter it. WAGMA-SGD partially reduces the data within non-overlapping groups of processes, improving parallel scalability.
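A minimal sketch of the group model averaging idea described above, simulating all workers in a single Python process with NumPy. The names average_in_groups and group_size are illustrative and not the library's API; the real system performs these partial reductions with externally triggerable collectives across distributed processes.

```python
# Minimal sketch of group model averaging: workers are partitioned into
# non-overlapping groups and each group averages its members' model replicas.
import numpy as np

def average_in_groups(models: list, group_size: int) -> list:
    """Average model replicas within consecutive, non-overlapping groups."""
    averaged = []
    for start in range(0, len(models), group_size):
        group = models[start:start + group_size]
        mean = np.mean(group, axis=0)  # partial reduction within one group only
        averaged.extend(mean.copy() for _ in group)
    return averaged

# Toy usage: 8 simulated workers, groups of 4; each group converges to its own average.
workers = [np.random.randn(10) for _ in range(8)]
workers = average_in_groups(workers, group_size=4)
```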
daceml
A Data-Centric Compiler for Machine Learning
deep-weather
Deep Learning for Post-Processing Ensemble Weather Forecasts
Shigangli's Repositories
Shigangli/Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) in deep learning on Tensor Cores.
Shigangli/Chimera
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
Shigangli/Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven theoretically and demonstrated empirically.
Shigangli/eager-SGD
Eager-SGD is a decentralized asynchronous SGD algorithm. It utilizes novel partial collective operations to accumulate the gradients across all the processes.
Shigangli/COMPI
Cache-oblivious MPI all-to-all communications based on Morton order
Shigangli/akg
AKG (Auto Kernel Generator) is an optimizer for operators in deep learning networks; it can automatically fuse ops with specific patterns.
Shigangli/bigbird
Transformers for Longer Sequences
Shigangli/bigbird-1
Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers
Shigangli/brian2
Brian is a free, open source simulator for spiking neural networks.
Shigangli/ColossalAI
Making large AI models cheaper, faster and more accessible
Shigangli/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Shigangli/dace
DaCe - Data Centric Parallel Programming
Shigangli/dlrm
An implementation of a deep learning recommendation model (DLRM)
Shigangli/dlrover
DLRover: An Automatic Distributed Deep Learning System
Shigangli/examples
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
Shigangli/flax
Flax is a neural network library for JAX that is designed for flexibility.
Shigangli/legion
The Legion Parallel Programming System
Shigangli/longformer
Longformer: The Long-Document Transformer
Shigangli/mindspore
MindSpore is a new open source deep learning training/inference framework that can be used for mobile, edge and cloud scenarios.
Shigangli/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Shigangli/Nystromformer
Shigangli/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
Shigangli/p4app-switchML
Switch ML Application
Shigangli/ROCm
ROCm - Open Source Platform for HPC and Ultrascale GPU Computing
Shigangli/shigangli.github.io
Homepage of Shigang Li https://shigangli.github.io/
Shigangli/sparsegpt
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Shigangli/TensorRT_Tutorial
Shigangli/vision_transformer
Shigangli/XiangShan
Open-source high-performance RISC-V processor
Shigangli/yapf
A formatter for Python files