Hsword/Hetu

Question about reverse_layout_transform_kernel

Fragile-azalea opened this issue · 1 comment

Hi authors,

When I simulate the configuration num_local_gpus=2, num_nodes=2, samples=8, hidden=1, as in Figure 6 of the paper (https://arxiv.org/pdf/2203.14685.pdf), I find that ha2a_reverse_layout_transform_kernel may not produce the expected result.

On worker 0, the output of ha2a_reverse_layout_transform_kernel may be 00 01 20 21 10 11 30 31 instead of the expected 00 10 20 30 01 11 21 31.

Should this line of code:

output_data[(gpu_id*data_size_per_gpu+target_node_id*data_size_per_gpu_per_node+target_gpu_id*data_size_per_gpu_per_gpu+offset) * (hidden) + j]=input_data[i * (hidden) + j];

be changed to the following?

output_data[(target_gpu_id*data_size_per_gpu+target_node_id*data_size_per_gpu_per_node+gpu_id*data_size_per_gpu_per_gpu+offset) * (hidden) + j]=input_data[i * (hidden) + j];
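
The difference between the two index expressions can be checked on the host with a small C++ model. This is only a sketch under stated assumptions: `reverse_layout` is a hypothetical helper and not part of Hetu, the way `gpu_id`, `target_node_id`, `target_gpu_id`, and `offset` are derived from `i` is assumed (the kernel's own definitions are not quoted in this issue), and each row is modeled as a single label because hidden=1.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Host-side model of the per-row index mapping (hypothetical helper, not Hetu code).
// ASSUMPTION: the input row index i decomposes as
//   i = gpu_id * per_gpu + target_node_id * per_node
//     + target_gpu_id * per_gpu_gpu + offset
std::vector<std::string> reverse_layout(const std::vector<std::string>& input,
                                        int num_local_gpus, int num_nodes,
                                        bool use_proposed_fix) {
    const int samples = static_cast<int>(input.size());
    // The three-level split below only works if samples divides evenly.
    assert(samples % (num_local_gpus * num_nodes * num_local_gpus) == 0);

    const int per_gpu = samples / num_local_gpus;       // data_size_per_gpu
    const int per_node = per_gpu / num_nodes;           // data_size_per_gpu_per_node
    const int per_gpu_gpu = per_node / num_local_gpus;  // data_size_per_gpu_per_gpu

    std::vector<std::string> output(samples);
    for (int i = 0; i < samples; ++i) {
        const int gpu_id = i / per_gpu;
        const int target_node_id = (i % per_gpu) / per_node;
        const int target_gpu_id = (i % per_node) / per_gpu_gpu;
        const int offset = i % per_gpu_gpu;

        // Original expression: gpu_id is the outermost index.
        // Proposed expression: gpu_id and target_gpu_id swap places.
        const int outer = use_proposed_fix ? target_gpu_id : gpu_id;
        const int inner = use_proposed_fix ? gpu_id : target_gpu_id;

        const int dst = outer * per_gpu + target_node_id * per_node +
                        inner * per_gpu_gpu + offset;
        output[dst] = input[i];  // hidden == 1, so one element per row
    }
    return output;
}
```

Under these assumptions, calling `reverse_layout` with the rows 00 01 20 21 10 11 30 31, num_local_gpus=2, and num_nodes=2 leaves the order unchanged for the original expression, while the proposed expression yields 00 10 20 30 01 11 21 31.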

Sorry for my poor English.

Hi, Fragile-azalea.

Thanks for your interest in our project and for pointing out this problem!

I have re-checked this function and fixed the bug as you suggested. Please see our commit #53.

Sorry for this mistake and thanks for your help.

Best Regards.