Issues
- Automatic module mapping using torch.fx (#40, opened by xrsrke; see the first sketch after this list)
- Multimodal MoE (#61, opened by xrsrke)
- DiLoCo replication (DiLoCo: Distributed Low-Communication Training of Language Models) (#59, opened by xrsrke)
- Implement new pipeline parallelism technique (#7, opened by xrsrke)
- End-to-end FP8 training (#45, opened by xrsrke)
- Port CUDA Kernels (#8, opened by xrsrke)
- Save and load checkpoints (#29, opened by xrsrke)
- Distributed CLIP (#60, opened by xrsrke)
- Callbacks for Distributed Optimizer (#21, opened by xrsrke)
- Gradient Checkpointing (#4, opened by xrsrke; see the third sketch after this list)
- Mixture of Experts (#19, opened by xrsrke)
- Kernel Fusion using torch.jit (#10, opened by xrsrke; see the second sketch after this list)
- Deparallelize tensor parallelism (#11, opened by xrsrke)
- Deparallelize pipeline parallelism (#34, opened by xrsrke)
- Distributed Logger (#33, opened by xrsrke)
- Tensor Parallelism (#37, opened by 3outeille)
- Lazy initialization of massive models (#25, opened by xrsrke)
- Reproducible in 3D Parallelism (#15, opened by xrsrke)
- Mixed precision training in FP16 (#14, opened by xrsrke)
- Fused Optimizer (#13, opened by xrsrke)
- Model partitioning for pipeline parallelism (#6, opened by xrsrke)
- Dataloader and Sampler for 3D Parallelism (#9, opened by xrsrke)
- Setup documentation (#16, opened by xrsrke)
- Sequence Parallelism (#22, opened by xrsrke)
- Checkpointing (#24, opened by xrsrke)
- Support TPU (#26, opened by xrsrke)
- Implement new tensor parallelism technique (#17, opened by xrsrke)
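
A minimal sketch of the kind of mapping issue #40 points at: tracing a model with torch.fx and walking the resulting graph to relate call sites back to qualified submodule names. The `Tiny` module and the `module_map` dict are illustrative stand-ins, not part of this repo's API.

```python
import torch
import torch.fx

# Toy model used only for illustration; not part of the repo.
class Tiny(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(8, 8)
        self.fc2 = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Symbolically trace the model into a GraphModule, then walk its
# nodes to map each call site to the qualified submodule it invokes.
traced = torch.fx.symbolic_trace(Tiny())
module_map = {
    node.name: node.target
    for node in traced.graph.nodes
    if node.op == "call_module"
}
print(module_map)  # e.g. {'fc1': 'fc1', 'fc2': 'fc2'}
```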
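For issue #10, a self-contained sketch of kernel fusion via torch.jit: scripting a chain of pointwise ops lets TorchScript's fuser combine them into a single kernel on CUDA instead of materializing every intermediate tensor. The bias-GeLU chain here is an illustrative example, not one of the repo's kernels.

```python
import torch

# Scripted pointwise chain: on CUDA, TorchScript can fuse the add,
# multiply, and erf into one kernel, saving memory round-trips.
@torch.jit.script
def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    y = x + bias
    return 0.5 * y * (1.0 + torch.erf(y * 0.7071067811865476))

x = torch.randn(8, 1024)
bias = torch.randn(1024)
out = bias_gelu(x, bias)
```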
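For issue #4, a minimal sketch of gradient checkpointing with torch.utils.checkpoint: activations inside the checkpointed block are discarded during the forward pass and recomputed during backward, trading compute for memory. The toy block and tensor shapes are assumptions for illustration only.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Illustrative block; activations between its layers are not stored.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)
x = torch.randn(4, 512, requires_grad=True)

# Forward runs without caching intermediates; backward reruns the
# block's forward to rebuild them before computing gradients.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
```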