ParCIS Lab, BUPT
Parallel Computing and Intelligent Systems Laboratory (ParCIS Lab), Beijing University of Posts and Telecommunications
China
Pinned Repositories
Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
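A minimal sketch of the stage-placement idea behind bidirectional pipeline parallelism: two pipelines traverse the same devices in opposite directions, so each device hosts one stage from each pipeline and their bubbles can overlap. The names below (`num_stages`, `bidirectional_placement`) are illustrative and not part of Chimera's API.

```python
# Sketch: map pipeline stages to devices for a "down" pipeline and an "up"
# pipeline running in opposite directions over the same devices.

def bidirectional_placement(num_stages: int) -> dict:
    """Return a stage -> device mapping for the two opposing pipelines."""
    down = {stage: stage for stage in range(num_stages)}                  # stage i -> device i
    up = {stage: num_stages - 1 - stage for stage in range(num_stages)}   # reversed order
    return {"down": down, "up": up}

if __name__ == "__main__":
    # With 4 devices: device 0 hosts down-stage 0 and up-stage 3, and so on.
    print(bidirectional_placement(4))
```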
DNN-cpp-proxies
C++/MPI proxies for distributed training of deep neural networks.
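As a rough illustration of the communication pattern such proxies exercise, the mpi4py sketch below performs an allreduce over a gradient-sized buffer, standing in for data-parallel gradient synchronization. The actual proxies are C++/MPI; the buffer size and launch command here are assumptions for the example.

```python
# Sketch: allreduce of a gradient-sized buffer across MPI ranks.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
grad = np.random.rand(1 << 20).astype(np.float32)   # stand-in for a gradient shard
summed = np.empty_like(grad)

comm.Allreduce(grad, summed, op=MPI.SUM)            # sum gradients across ranks
summed /= comm.Get_size()                           # average, as data-parallel SGD would

if comm.Get_rank() == 0:
    print("allreduced", grad.nbytes, "bytes across", comm.Get_size(), "ranks")
```

Run, for example, with `mpiexec -n 4 python proxy_sketch.py`.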
FlashSparse
FlashSparse significantly reduces computation redundancy for unstructured sparsity (in SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy. FlashSparse was accepted at PPoPP 2025.
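A small NumPy sketch of the linear-algebra identity that a swap-and-transpose mapping relies on: computing (Bᵀ Aᵀ)ᵀ instead of A B lets the operands be fed into the multiply in swapped roles while producing the same result. This only illustrates the identity; FlashSparse's actual Tensor Core kernels are CUDA.

```python
# Sketch: swapping operands and transposing the result reproduces A @ B.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((16, 32))            # stand-in for the sparse operand in SpMM
A[rng.random(A.shape) < 0.8] = 0    # make it unstructured-sparse
B = rng.random((32, 8))             # dense operand

direct = A @ B                      # the original SpMM orientation
swapped = (B.T @ A.T).T             # swap operands, then transpose the result

assert np.allclose(direct, swapped)
print("A @ B == (B^T @ A^T)^T holds, shape:", swapped.shape)
```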
Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) in deep learning on Tensor Cores.
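A rough NumPy sketch of the quantized-SpMM idea: operands are quantized to low-precision integers, multiplied with integer arithmetic (the kind Tensor Cores accelerate), and the result is rescaled. The scale handling below is deliberately simplistic and is not Magicube's quantization scheme.

```python
# Sketch: int8-style quantize -> integer matmul -> dequantize.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((64, 64))
A[rng.random(A.shape) < 0.7] = 0            # sparse operand
B = rng.standard_normal((64, 32))           # dense operand

def quantize(x, bits=8):
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

Aq, a_scale = quantize(A)
Bq, b_scale = quantize(B)

C_int = Aq @ Bq                                       # integer matrix multiply
C = C_int.astype(np.float32) * a_scale * b_scale      # rescale back to floats

print("max abs error vs. full precision:", np.abs(C - A @ B).max())
```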
Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven both theoretically and empirically.
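A minimal NumPy sketch of the generic top-k gradient sparsification that such schemes build on: keep only the k largest-magnitude gradient entries per step and carry the residual forward as error feedback. The sparse allreduce and its less-than-6k communication bound are Ok-Topk's contribution and are not reproduced here; the helper below is hypothetical.

```python
# Sketch: top-k gradient sparsification with error feedback.
import numpy as np

def topk_sparsify(grad, k, residual):
    """Return (indices, values, new_residual) for the k largest-magnitude entries."""
    acc = grad + residual                          # fold in the error feedback
    idx = np.argpartition(np.abs(acc), -k)[-k:]    # indices of the k largest magnitudes
    vals = acc[idx]
    new_residual = acc.copy()
    new_residual[idx] = 0.0                        # what was not sent is carried over
    return idx, vals, new_residual

grad = np.random.default_rng(2).standard_normal(1000)
residual = np.zeros_like(grad)
idx, vals, residual = topk_sparsify(grad, k=10, residual=residual)
print("sent", len(idx), "of", grad.size, "gradient entries")
```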