alibaba/EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Python · Apache-2.0
Issues
Running the example resnet_split.py distributed across 2 servers hangs indefinitely
#28 opened by alphabewitch
How should the training step be interpreted in EPL single-machine single-GPU vs. single-machine multi-GPU training?
#30 opened by SueeH
NCCL error in a 2-machine, 2-GPU experiment
#29 opened by wind818
DingTalk QR code is outdated
#25 opened by co63oc
Does DistributedDense only support column-wise splitting?
#22 opened by kuangdao
During training, every worker except the chief worker resets its step to 0 after each checkpoint save, and the whole process hangs after the second checkpoint save
#17 opened by walkingwindy