Question about reverse_layout_transform_kernel
Fragile-azalea opened this issue · 1 comment
Hi authors,
When I simulate a configuration with num_local_gpus=2, num_nodes=2, samples=8, hidden=1, as in Figure 6 of the paper (https://arxiv.org/pdf/2203.14685.pdf), I find that ha2a_reverse_layout_transform_kernel may not produce the expected result.
On worker 0, the output of ha2a_reverse_layout_transform_kernel may be 00 01 20 21 10 11 30 31 instead of the expected 00 10 20 30 01 11 21 31.
Should this line of code:
Hetu/src/ops/H_A2A_LayoutTransform.cu
Line 44 in 15209eb
be changed to
output_data[(target_gpu_id*data_size_per_gpu+target_node_id*data_size_per_gpu_per_node+gpu_id*data_size_per_gpu_per_gpu+offset) * (hidden) + j]=input_data[i * (hidden) + j];
?
Sorry for my poor English.
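The two orderings reported above can be compared with a small host-side check. This is only a sketch, not code from the Hetu repository: the helper names `expected_order` and `first_mismatch` are hypothetical, and it assumes that a label such as `20` means "sample 0 contributed by worker 2", which is one reading of the labels in the issue. Under that assumption the expected layout is sample-major (all workers' sample 0, then all workers' sample 1).

```cpp
#include <cassert>   // used by the usage check described below
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the expected label order on one worker.
// Assumption: label "sk" = sample k from source worker s (single digits only).
std::vector<std::string> expected_order(int num_workers, int samples_per_worker) {
    std::vector<std::string> out;
    for (int k = 0; k < samples_per_worker; ++k) {      // sample index is the outer loop
        for (int s = 0; s < num_workers; ++s) {         // source worker is the inner loop
            out.push_back(std::to_string(s) + std::to_string(k));
        }
    }
    return out;
}

// Hypothetical helper: index of the first position where the two
// sequences differ, or -1 if they are identical.
int first_mismatch(const std::vector<std::string>& got,
                   const std::vector<std::string>& want) {
    std::size_t n = got.size() < want.size() ? got.size() : want.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (got[i] != want[i]) return static_cast<int>(i);
    }
    // Equal on the common prefix: differ only if the lengths differ.
    return got.size() == want.size() ? -1 : static_cast<int>(n);
}
```

With 4 workers (num_nodes=2 times num_local_gpus=2) and 2 samples per worker, `expected_order(4, 2)` yields the sequence the issue expects, and `first_mismatch` applied to the observed sequence 00 01 20 21 10 11 30 31 reports position 1 as the first row written to the wrong place.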
Hi, Fragile-azalea.
Thanks for your interest in our project and for pointing out this problem!
I have verified this function again and fixed the bug as you suggested. Please see our commit #53.
Sorry for the mistake, and thanks for your help.
Best Regards.