openxla/xla

[Feature Request] Add more comm op support in gpu_hlo_cost_analysis

Such as allgather/reducescatter/alltoall, which are commonly used in FSDP/MoE models.
My suggestions:

  1. Decouple output_bytes_accessed and bytes_accessed: bytes_accessed should be only the operand size in bytes, instead of lumping operand and output bytes together.
  2. Add allgather/reducescatter/alltoall support.
  3. Support the ops above in gpu_collective_performance_model.cc.

We can make a table showing how many bytes are sent inter-node and intra-node.
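For illustration, here is a minimal sketch of how per-rank bytes on the wire could be estimated for these collectives, assuming ring-based algorithms; the names (`CollectiveKind`, `BytesSentPerRank`, `size_bytes`, `num_ranks`) are hypothetical and not part of the existing cost analysis:

```cpp
#include <cstdint>

enum class CollectiveKind { kAllGather, kReduceScatter, kAllToAll };

// Hypothetical helper: bytes each rank sends for a collective, assuming
// ring-based algorithms. For kAllGather, `size_bytes` is the per-rank
// input shard; for kReduceScatter/kAllToAll it is the full per-rank
// input buffer.
int64_t BytesSentPerRank(CollectiveKind kind, int64_t size_bytes,
                         int64_t num_ranks) {
  switch (kind) {
    case CollectiveKind::kAllGather:
      // Each rank forwards (num_ranks - 1) shards of shard size.
      return (num_ranks - 1) * size_bytes;
    case CollectiveKind::kReduceScatter:
    case CollectiveKind::kAllToAll:
      // Each peer receives one shard of the full input buffer.
      return (num_ranks - 1) * (size_bytes / num_ranks);
  }
  return 0;
}
```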

I have prepared a PR for these changes. Would this feature request be welcome?

An additional question:
How can I find out which GPUs are intra-node (on the same node)? Via the NVML API?
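For example, a minimal sketch of the NVML route, assuming the NVML headers and the -lnvidia-ml library are available. NVML only enumerates GPUs on the local machine, so every device it returns is intra-node by definition; ranks could exchange the local UUIDs to tell intra-node peers from inter-node ones:

```cpp
#include <cstdio>
#include <nvml.h>

int main() {
  if (nvmlInit_v2() != NVML_SUCCESS) return 1;

  // NVML sees only local GPUs, so this lists the intra-node devices.
  unsigned int count = 0;
  nvmlDeviceGetCount_v2(&count);
  for (unsigned int i = 0; i < count; ++i) {
    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex_v2(i, &device) != NVML_SUCCESS) continue;
    char uuid[NVML_DEVICE_UUID_BUFFER_SIZE];
    if (nvmlDeviceGetUUID(device, uuid, sizeof(uuid)) == NVML_SUCCESS) {
      std::printf("local GPU %u: %s\n", i, uuid);
    }
  }
  nvmlShutdown();
  return 0;
}
```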

Could you please add more details on why the current model doesn't work for comm ops?

> Decouple output_bytes_accessed and bytes_accessed: bytes_accessed should be only the operand size in bytes, instead of lumping operand and output bytes together.

There is also operand_bytes_accessed. And bytes_accessed is just sum(operand_bytes_accessed) + output_bytes_accessed.
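As a concrete example (hypothetical numbers): for an all-gather over 8 ranks with a 4 MiB operand per rank, operand_bytes_accessed would be 4 MiB, output_bytes_accessed 32 MiB, and bytes_accessed their sum, 36 MiB.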

We don't use bytes_accessed in gpu_performance_model.cc, but I see uses in other parts of the codebase, and I can't predict all the implications if we change the semantics of the field.

@olegshyshkov OK, I understand~
So for the second feature, do we need more op support, like allgather/reducescatter/alltoall cost analysis? I have finished a PR to do this.

Yes, having more support for those ops would be good if you can add it.