NVIDIA/Megatron-LM

[QUESTION] Why should pipeline-model-parallel size be greater than 2 with the interleaved schedule?

[screenshot: assertion error reading "pipeline-model-parallel size should be greater than 2 with interleaved schedule"]

You can't use the interleaved schedule without pipeline parallelism.
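
For reference, the check that raises this error looks roughly like the sketch below. The argument names are assumptions on my part (they follow Megatron-LM's naming conventions but this is not a verbatim copy of the validation code):

```python
# Minimal sketch of the kind of check that raises this error.
# Argument names are assumed, not copied verbatim from Megatron-LM.
def validate_interleaved_schedule(args):
    # The interleaved (virtual pipeline) schedule only makes sense with
    # multiple pipeline stages, and the implementation requires > 2.
    if args.virtual_pipeline_model_parallel_size is not None:
        assert args.pipeline_model_parallel_size > 2, (
            'pipeline-model-parallel size should be greater than 2 '
            'with interleaved schedule')
```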

@ethanhe42 I wonder whether pipeline_model_parallel_size == 2 can be accepted?

@ethanhe42 same question.

I think pipeline_model_parallel_size == 2 could be accepted in practice, but perhaps with reduced or no benefit in shrinking the pipeline bubble?
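
That intuition matches the scheduling analysis in the Megatron-LM paper (Narayanan et al., 2021): the pipeline bubble fraction is roughly (p - 1) / m for the 1F1B schedule and (p - 1) / (v * m) with v interleaved model chunks per stage. A hypothetical back-of-envelope comparison (the microbatch count below is made up for illustration):

```python
def bubble_fraction(p, m, v=1):
    """Approximate ratio of pipeline bubble time to ideal compute time:
    (p - 1) / (v * m), where p is the pipeline-parallel size, m the
    number of microbatches, and v the interleaved chunks per stage."""
    return (p - 1) / (v * m)

m = 8  # hypothetical microbatch count
for p in (2, 4, 8):
    print(f"p={p}: 1F1B ~{bubble_fraction(p, m):.3f}, "
          f"interleaved v=2 ~{bubble_fraction(p, m, v=2):.3f}")
```

With p=2 the bubble is already small, so interleaving has little left to save; the payoff grows with deeper pipelines.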

Marking as stale. No activity in 60 days.

It is because with PP=2 a rank's next and previous pipeline stages are the same rank, so tensor_send_next and tensor_send_prev here are indistinguishable: https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/pipeline_parallel/p2p_communication.py#L586.
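
A conceptual sketch of the ambiguity (illustrative only, not the actual p2p_communication.py code): batched point-to-point ops between a pair of ranks are matched by posting order, so when next_rank == prev_rank the receiver cannot tell the forward-direction tensor from the backward-direction one.

```python
import torch.distributed as dist

def exchange(tensor_send_next, tensor_send_prev,
             recv_next_buf, recv_prev_buf, next_rank, prev_rank):
    # With PP > 2, prev_rank != next_rank, so each send/recv pair
    # targets a distinct peer and the matching is unambiguous.
    ops = [
        dist.P2POp(dist.isend, tensor_send_prev, prev_rank),
        dist.P2POp(dist.irecv, recv_prev_buf, prev_rank),
        dist.P2POp(dist.isend, tensor_send_next, next_rank),
        dist.P2POp(dist.irecv, recv_next_buf, next_rank),
    ]
    # With PP == 2, prev_rank == next_rank: both sends go to the same
    # peer and are matched purely by posting order, so the "forward"
    # and "backward" tensors can land in the wrong buffers.
    for req in dist.batch_isend_irecv(ops):
        req.wait()
```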

This is a non-issue with overlap_p2p_comm since we split forward and backward communication in steady state. We fixed this here: 152c562.
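
A rough sketch of that idea (hypothetical names; see the commit above for the real change): splitting the exchange into separate forward-direction and backward-direction batches leaves at most one send and one recv per peer in each batch, so the matching stays unambiguous even when prev_rank == next_rank.

```python
import torch.distributed as dist

def send_forward_recv_forward(output_tensor, fwd_recv_buf,
                              next_rank, prev_rank):
    # Forward-direction traffic only: one send and one recv per peer.
    return dist.batch_isend_irecv([
        dist.P2POp(dist.isend, output_tensor, next_rank),
        dist.P2POp(dist.irecv, fwd_recv_buf, prev_rank),
    ])

def send_backward_recv_backward(input_grad, bwd_recv_buf,
                                next_rank, prev_rank):
    # Backward-direction traffic, issued as its own batch so it can
    # also be overlapped with compute.
    return dist.batch_isend_irecv([
        dist.P2POp(dist.isend, input_grad, prev_rank),
        dist.P2POp(dist.irecv, bwd_recv_buf, next_rank),
    ])
```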

Going to mark this as closed; feel free to re-open if you have additional questions.