model-parallelism
There are 35 public repositories under the model-parallelism topic.
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
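A minimal, hedged sketch of DeepSpeed's standard entry point, `deepspeed.initialize`; the config values are illustrative only, and the script is assumed to be started with the `deepspeed` launcher:

```python
# Minimal DeepSpeed training-loop sketch (illustrative config; run via the
# `deepspeed` launcher so distributed state is set up).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 2},  # shard gradients + optimizer state
}

# deepspeed.initialize wraps the model into an engine that owns the
# optimizer, mixed precision, and ZeRO partitioning.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(32, 1024, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()
engine.backward(loss)  # engine handles scaling and gradient partitioning
engine.step()
```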
kakaobrain/torchgpipe
A GPipe implementation in PyTorch
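The library's core pattern, following its README: wrap an `nn.Sequential` in `GPipe`, which cuts it into pipeline stages and streams micro-batches through them. A minimal sketch, assuming two visible GPUs:

```python
# GPipe pipeline parallelism with torchgpipe (assumes 2 GPUs).
import torch
from torch import nn
from torchgpipe import GPipe

# The model must be an nn.Sequential so it can be cut into stages.
model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
)

# balance=[2, 2] puts two layers on each GPU; chunks=4 splits every
# mini-batch into 4 micro-batches that flow through the pipeline.
model = GPipe(model, balance=[2, 2], chunks=4)

x = torch.randn(64, 512).to(model.devices[0])  # input lives on the first stage
out = model(x)                                 # output lands on the last stage
```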
PaddlePaddle/PaddleFleetX
PaddleFleetX: PaddlePaddle's large-model development suite, providing an end-to-end toolchain for large language models, cross-modal large models, biocomputing large models, and more.
Oneflow-Inc/libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
kaiyuyue/torchshard
Slicing a PyTorch Tensor Into Parallel Shards
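The idea behind this kind of sharding, sketched in plain PyTorch (this is the concept only, not TorchShard's actual API): slice a linear layer's weight along its output dimension so each device holds one column shard.

```python
# Conceptual column-sharding of a linear layer (not TorchShard's API).
import torch

W = torch.randn(1024, 4096)          # full weight: (in_features, out_features)
shards = torch.chunk(W, 2, dim=1)    # two column shards of shape (1024, 2048)
devices = ["cuda:0", "cuda:1"]
shards = [w.to(d) for w, d in zip(shards, devices)]

x = torch.randn(8, 1024)
# Each device computes its slice of the output; concatenating along the
# feature dimension recovers the unsharded result x @ W.
partials = [x.to(d) @ w for w, d in zip(shards, devices)]
y = torch.cat([p.to("cuda:0") for p in partials], dim=1)  # (8, 4096)
```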
alibaba/EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Shenggan/awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
xrsrke/pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers with Mixture of Experts *(still a work in progress)*
tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
hkproj/pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
vdutts7/dnn-distributed
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
NERSC/sc23-dl-tutorial
SC23 Deep Learning at Scale Tutorial Material
ryantd/veloce
WIP. Veloce is a low-code, Ray-based parallelization library for efficient, heterogeneous machine-learning computation.
AlibabaPAI/FlashModels
Fast and easy distributed model training examples.
atakehiro/3D-U-Net-pytorch-model-parallel
PyTorch implementation of a 3D U-Net with model parallelism across 2 GPUs for large models
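The general two-GPU pattern such repos follow (a sketch of the idiom, not this repo's architecture): place the encoder on one device and the decoder on the other, moving activations explicitly between them.

```python
# Two-GPU model parallelism in PyTorch: encoder on cuda:0, decoder on
# cuda:1, with an explicit cross-device hop for the activations.
import torch
from torch import nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU()).to("cuda:0")
        self.decoder = nn.Conv3d(16, 1, 3, padding=1).to("cuda:1")

    def forward(self, x):
        h = self.encoder(x.to("cuda:0"))
        return self.decoder(h.to("cuda:1"))  # autograd spans both GPUs

net = TwoGPUNet()
out = net(torch.randn(1, 1, 32, 64, 64))  # loss.backward() works as usual
```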
Shenggan/atp
Adaptive Tensor Parallelism for Foundation Models
fanpu/DynPartition
Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks
dlzou/computron
Serving distributed deep learning models with model-parallel swapping.
dscpesu/NetTorrent
A decentralized and distributed framework for training DNNs
garg-aayush/model-parallelism
Model parallelism for NN architectures with skip connections (e.g., ResNets, UNets)
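Skip connections are exactly what makes naive model splitting awkward: a tensor produced on one GPU is consumed again later on another, so it has to be moved to the consumer's device. A hypothetical UNet-style sketch of the issue (not this repo's code):

```python
# A skip tensor produced on cuda:0 must be copied to cuda:1 before the
# decoder concatenates it. Hypothetical illustration only.
import torch
from torch import nn

class SplitSkipBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(3, 8, 3, padding=1).to("cuda:0")
        self.mid = nn.Conv2d(8, 8, 3, padding=1).to("cuda:1")
        self.up = nn.Conv2d(16, 3, 3, padding=1).to("cuda:1")

    def forward(self, x):
        skip = self.down(x.to("cuda:0"))              # produced on cuda:0
        h = self.mid(skip.to("cuda:1"))
        h = torch.cat([h, skip.to("cuda:1")], dim=1)  # skip follows to cuda:1
        return self.up(h)
```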
InternLM/InternEvo-HFModels
Democratizing Hugging Face model training with InternEvo
explcre/pipeDejavu
pipeDejavu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism
LER0ever/HPGO
Development of Project HPGO | Hybrid Parallelism Global Orchestration
AnveshaM/Enhancing-performance-of-big-data-machine-learning-models-on-Google-Cloud-Platform
The project focuses on parallelizing pre-processing, measurement, and machine learning in the cloud, and on evaluating and analyzing cloud performance.
EunjuYang/distributed-tf
Distributed TensorFlow (model parallelism) example repository
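The core TensorFlow idiom for model parallelism is explicit device placement with `tf.device`. A minimal TF2 eager-mode sketch (the layer split is illustrative and assumes two visible GPUs):

```python
# Model parallelism via explicit tf.device placement (TF2, 2 GPUs).
import tensorflow as tf

with tf.device("/GPU:0"):
    dense1 = tf.keras.layers.Dense(256, activation="relu")
with tf.device("/GPU:1"):
    dense2 = tf.keras.layers.Dense(10)

x = tf.random.normal([8, 128])
with tf.device("/GPU:0"):
    h = dense1(x)   # first half of the model runs on GPU 0
with tf.device("/GPU:1"):
    y = dense2(h)   # TF copies h across the device boundary automatically
```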
ngrabaskas/Torch-Automatic-Distributed-Neural-Network
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
sjlee25/legion-readme
Description of Legion, a framework for efficient fused-layer cost estimation (2021)
d4l3k/axe
A simple graph-partitioning algorithm written in Go, designed for partitioning neural networks across multiple devices, where crossing a device boundary incurs an added cost.
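The underlying idea: assign graph nodes (layers) to devices so that expensive cross-device edges are rare. A toy greedy sketch in Python (the repo itself is written in Go; names and the cost model here are illustrative):

```python
# Toy greedy graph partitioner: place each node on the device that
# minimizes the weight of edges to neighbors already placed elsewhere.
def greedy_partition(edges, nodes, n_devices, capacity):
    """edges: {(u, v): transfer_cost}; returns {node: device}."""
    assign, load = {}, [0] * n_devices
    for node in nodes:
        best_dev, best_cost = 0, float("inf")
        for d in range(n_devices):
            if load[d] >= capacity:
                continue
            # Cost = weight of edges to neighbors assigned to other devices.
            cost = sum(w for (u, v), w in edges.items()
                       if (u == node and assign.get(v, d) != d)
                       or (v == node and assign.get(u, d) != d))
            if cost < best_cost:
                best_dev, best_cost = d, cost
        assign[node] = best_dev
        load[best_dev] += 1
    return assign

# A 4-layer chain split across 2 devices cuts only the cheap middle edge.
print(greedy_partition({(0, 1): 5, (1, 2): 1, (2, 3): 5}, [0, 1, 2, 3], 2, 2))
```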
zhuangsc/altsplit
An MPI-based distributed model-parallelism technique for MLPs
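A conceptual sketch of what MPI-based model parallelism for an MLP looks like, using mpi4py and NumPy (this is the general technique, not this repo's code): each rank owns a column shard of a layer's weights and the full activation is rebuilt with an allgather.

```python
# Run with: mpiexec -n 2 python mlp_shard.py
# Each rank holds a column shard of the hidden layer's weight matrix.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

in_dim, hidden = 64, 128
assert hidden % size == 0
local = hidden // size

rng = np.random.default_rng(rank)
W_local = rng.standard_normal((in_dim, local))  # this rank's weight shard

x = np.ones((1, in_dim))              # every rank sees the same input
partial = x @ W_local                 # (1, local) slice of the activation
full = np.empty((size, 1, local))
comm.Allgather(partial, full)         # collect all slices on every rank
h = full.transpose(1, 0, 2).reshape(1, hidden)  # reassembled hidden layer
```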
ankahira/chainermnx
Extended ChainerMN
joelrorseth/HyperTune
A fully distributed hyperparameter optimization tool for PyTorch DNNs
ShashankSubramanian/transformer-perf-estimates
Performance estimates for transformer AI models in science
mzj14/mesh
Mesh TensorFlow: Model Parallelism Made Easier
olk/mnist-performance
Performance test on MNIST handwriting using MXNet + TF