NVIDIA/Megatron-LM

[QUESTION] Why should pipeline-model-parallel size be greater than 2 with the interleaved schedule?

[screenshot: assertion error reading "pipeline-model-parallel size should be greater than 2 with interleaved schedule"]

You can't use the interleaved schedule without pipeline parallelism.
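
For reference, the check that raises this error looks roughly like the sketch below. The argument names are assumptions on my part (they follow Megatron-LM's naming conventions but this is not a verbatim copy of the validation code):

```python
# Minimal sketch of the kind of check that raises this error.
# Argument names are assumed, not copied verbatim from Megatron-LM.
def validate_interleaved_schedule(args):
    # The interleaved (virtual pipeline) schedule only makes sense with
    # multiple pipeline stages, and the implementation requires > 2.
    if args.virtual_pipeline_model_parallel_size is not None:
        assert args.pipeline_model_parallel_size > 2, (
            'pipeline-model-parallel size should be greater than 2 '
            'with interleaved schedule')
```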

@ethanhe42 I wonder whether pipeline_model_parallel_size == 2 can be accepted?

@ethanhe42 same question.

I think pipeline_model_parallel_size == 2 could be accepted in practice, but perhaps with reduced or no benefit in shrinking the pipeline bubble?
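
That intuition matches the scheduling analysis in the Megatron-LM paper (Narayanan et al., 2021): the pipeline bubble fraction is roughly (p - 1) / m for the 1F1B schedule and (p - 1) / (v * m) with v interleaved model chunks per stage. A hypothetical back-of-envelope comparison (the microbatch count below is made up for illustration):

```python
def bubble_fraction(p, m, v=1):
    """Approximate ratio of pipeline bubble time to ideal compute time:
    (p - 1) / (v * m), where p is the pipeline-parallel size, m the
    number of microbatches, and v the interleaved chunks per stage."""
    return (p - 1) / (v * m)

m = 8  # hypothetical microbatch count
for p in (2, 4, 8):
    print(f"p={p}: 1F1B ~{bubble_fraction(p, m):.3f}, "
          f"interleaved v=2 ~{bubble_fraction(p, m, v=2):.3f}")
```

With p=2 the bubble is already small, so interleaving has little left to save; the payoff grows with deeper pipelines.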

Marking as stale. No activity in 60 days.

It is because with PP=2 a rank's next and previous pipeline stages are the same rank, so tensor_send_next and tensor_send_prev here are indistinguishable: https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/pipeline_parallel/p2p_communication.py#L586.
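
A conceptual sketch of the ambiguity (illustrative only, not the actual p2p_communication.py code): batched point-to-point ops between a pair of ranks are matched by posting order, so when next_rank == prev_rank the receiver cannot tell the forward-direction tensor from the backward-direction one.

```python
import torch.distributed as dist

def exchange(tensor_send_next, tensor_send_prev,
             recv_next_buf, recv_prev_buf, next_rank, prev_rank):
    # With PP > 2, prev_rank != next_rank, so each send/recv pair
    # targets a distinct peer and the matching is unambiguous.
    ops = [
        dist.P2POp(dist.isend, tensor_send_prev, prev_rank),
        dist.P2POp(dist.irecv, recv_prev_buf, prev_rank),
        dist.P2POp(dist.isend, tensor_send_next, next_rank),
        dist.P2POp(dist.irecv, recv_next_buf, next_rank),
    ]
    # With PP == 2, prev_rank == next_rank: both sends go to the same
    # peer and are matched purely by posting order, so the "forward"
    # and "backward" tensors can land in the wrong buffers.
    for req in dist.batch_isend_irecv(ops):
        req.wait()
```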

This is a non-issue with overlap_p2p_comm since we split forward and backward communication in steady state. We fixed this here: 152c562.
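
A rough sketch of that idea (hypothetical names; see the commit above for the real change): splitting the exchange into separate forward-direction and backward-direction batches leaves at most one send and one recv per peer in each batch, so the matching stays unambiguous even when prev_rank == next_rank.

```python
import torch.distributed as dist

def send_forward_recv_forward(output_tensor, fwd_recv_buf,
                              next_rank, prev_rank):
    # Forward-direction traffic only: one send and one recv per peer.
    return dist.batch_isend_irecv([
        dist.P2POp(dist.isend, output_tensor, next_rank),
        dist.P2POp(dist.irecv, fwd_recv_buf, prev_rank),
    ])

def send_backward_recv_backward(input_grad, bwd_recv_buf,
                                next_rank, prev_rank):
    # Backward-direction traffic, issued as its own batch so it can
    # also be overlapped with compute.
    return dist.batch_isend_irecv([
        dist.P2POp(dist.isend, input_grad, prev_rank),
        dist.P2POp(dist.irecv, bwd_recv_buf, next_rank),
    ])
```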

Going to mark this as closed; feel free to re-open if you have additional questions.