Why is optimizer.backward_epilogue() not called in the pipeline-parallel fwd/bwd?
Closed this issue · 4 comments
jingjie01ai commented
forward_backward_no_pipelining calls optimizer.backward_epilogue() to accumulate and copy the grads and to reset the buckets. Why is it not called in forward_backward_pipelining_without_interleaving and forward_backward_pipelining_with_interleaving?
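To make the question concrete, here is a toy sketch of the pattern being described. Everything in it is an assumption for illustration: `ToyOptimizer`, its `collect` method, and the choice to call the epilogue once after the last microbatch are hypothetical and are not taken from the project's code. Only the name `backward_epilogue` and its described role (accumulate the grads, reset the buckets) come from the question above.

```python
class ToyOptimizer:
    """Hypothetical stand-in for the real optimizer (illustration only)."""

    def __init__(self):
        self.bucket = []          # grads collected since the last epilogue
        self.main_grad = 0.0      # accumulated gradient
        self.epilogue_calls = 0   # how often the epilogue ran

    def collect(self, grad):
        # Hypothetical hook: stash one microbatch gradient in the bucket.
        self.bucket.append(grad)

    def backward_epilogue(self):
        # Accumulate the bucketed grads, then reset the bucket.
        self.main_grad += sum(self.bucket)
        self.bucket = []
        self.epilogue_calls += 1


def forward_backward_no_pipelining(microbatches, optimizer):
    """Toy schedule for the loss x**2: run every microbatch, then the epilogue."""
    for x in microbatches:
        grad = 2.0 * x            # d(x**2)/dx, standing in for a backward pass
        optimizer.collect(grad)
    # Assumption: the epilogue runs once, after the last backward pass.
    optimizer.backward_epilogue()
    return optimizer.main_grad


opt = ToyOptimizer()
total = forward_backward_no_pipelining([1.0, 2.0, 3.0], opt)
```

In this toy, a schedule that never calls `backward_epilogue()` would leave the grads sitting in `bucket` and `main_grad` untouched, which is the concern raised about the two pipelining schedules.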
li-yi-dong commented
It is actually needed. Pipeline support is still being adapted.
jingjie01ai commented
li-yi-dong commented
Setting CUDA_DEVICE_MAX_CONNECTIONS=1 makes it impossible to run in parallel.
jingjie01ai commented
> Setting CUDA_DEVICE_MAX_CONNECTIONS=1 makes it impossible to run in parallel.
So if overlappedDistOpt is enabled, sequence parallelism cannot be used?
"Using sequence parallelism requires setting the environment variable CUDA_DEVICE_MAX_CONNECTIONS to 1"