Pinned Repositories
Chimera
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
FlashSparse
FlashSparse significantly reduces computation redundancy for unstructured sparsity (SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy. FlashSparse was accepted to PPoPP 2025.
Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) in deep learning on Tensor Cores.
Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven theoretically and demonstrated empirically.
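As a rough illustration of the top-k gradient sparsification idea underlying Ok-Topk, the minimal NumPy sketch below keeps only the k largest-magnitude gradient entries and rebuilds a dense gradient from them. The function names (topk_sparsify, densify) are illustrative, not the library's API; the actual system additionally implements the communication-optimal distributed selection and sparse allreduce on top of this local step.

```python
# Minimal sketch of local top-k gradient sparsification (assumed simplification;
# Ok-Topk itself adds a distributed, asymptotically optimal sparse allreduce).
import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k magnitudes
    return idx, flat[idx]

def densify(idx, vals, shape):
    """Rebuild a dense gradient from the sparse (index, value) representation."""
    dense = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    dense[idx] = vals
    return dense.reshape(shape)

# Toy usage: each worker would exchange only (idx, vals) instead of the full gradient.
grad = np.random.randn(1024)
idx, vals = topk_sparsify(grad, k=32)
restored = densify(idx, vals, grad.shape)
```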
COMPI
Cache-oblivious MPI all-to-all communications based on Morton order
eager-SGD
Eager-SGD is a decentralized asynchronous SGD algorithm. It utilizes novel partial collective operations to accumulate the gradients across all the processes.
SpMV-on-Many-Core
A cross-platform Sparse Matrix Vector Multiplication (SpMV) framework for many-core architectures (GPUs and Xeon Phi).
WAGMA-SGD
WAGMA-SGD is a decentralized asynchronous SGD algorithm based on wait-avoiding group model averaging. Synchronization is relaxed by making the collectives externally triggerable, namely, a collective can be initiated without requiring that all the processes enter it. WAGMA-SGD partially reduces the data within non-overlapping groups of processes, improving parallel scalability.
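A minimal sketch of the group model averaging idea described above, simulating all workers in a single Python process with NumPy. The names average_in_groups and group_size are illustrative and not the library's API; the real system performs these partial reductions with externally triggerable collectives across distributed processes.

```python
# Minimal sketch of group model averaging: workers are partitioned into
# non-overlapping groups and each group averages its members' model replicas.
import numpy as np

def average_in_groups(models: list, group_size: int) -> list:
    """Average model replicas within consecutive, non-overlapping groups."""
    averaged = []
    for start in range(0, len(models), group_size):
        group = models[start:start + group_size]
        mean = np.mean(group, axis=0)  # partial reduction within one group only
        averaged.extend(mean.copy() for _ in group)
    return averaged

# Toy usage: 8 simulated workers, groups of 4; each group converges to its own average.
workers = [np.random.randn(10) for _ in range(8)]
workers = average_in_groups(workers, group_size=4)
```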
daceml
A Data-Centric Compiler for Machine Learning
deep-weather
Deep Learning for Post-Processing Ensemble Weather Forecasts
Shigangli's Repositories
Shigangli/Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) in deep learning on Tensor Cores.
Shigangli/Chimera
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
Shigangli/Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven theoretically and demonstrated empirically.
Shigangli/eager-SGD
Eager-SGD is a decentralized asynchronous SGD algorithm. It utilizes novel partial collective operations to accumulate the gradients across all the processes.
Shigangli/COMPI
Cache-oblivious MPI all-to-all communications based on Morton order
Shigangli/akg
AKG (Auto Kernel Generator) is an optimizer for operators in deep learning networks; it can automatically fuse ops with specific patterns.
Shigangli/bigbird
Transformers for Longer Sequences
Shigangli/bigbird-1
Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers
Shigangli/brian2
Brian is a free, open source simulator for spiking neural networks.
Shigangli/ColossalAI
Making large AI models cheaper, faster and more accessible
Shigangli/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Shigangli/dace
DaCe - Data Centric Parallel Programming
Shigangli/dlrm
An implementation of a deep learning recommendation model (DLRM)
Shigangli/dlrover
DLRover: An Automatic Distributed Deep Learning System
Shigangli/examples
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
Shigangli/flax
Flax is a neural network library for JAX that is designed for flexibility.
Shigangli/legion
The Legion Parallel Programming System
Shigangli/longformer
Longformer: The Long-Document Transformer
Shigangli/mindspore
MindSpore is a new open source deep learning training/inference framework that can be used for mobile, edge and cloud scenarios.
Shigangli/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Shigangli/Nystromformer
Shigangli/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
Shigangli/p4app-switchML
Switch ML Application
Shigangli/ROCm
ROCm - Open Source Platform for HPC and Ultrascale GPU Computing
Shigangli/shigangli.github.io
Homepage of Shigang Li https://shigangli.github.io/
Shigangli/sparsegpt
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Shigangli/TensorRT_Tutorial
Shigangli/vision_transformer
Shigangli/XiangShan
Open-source high-performance RISC-V processor
Shigangli/yapf
A formatter for Python files