MPI fails when graph size larger than INT_MAX
JingchengYu94 opened this issue · 1 comment
JingchengYu94 commented
We are using DGL + ParMETIS to run experiments on very large datasets, with 400 million nodes and 3.7 billion edges, but MPI communication fails on this line.
The reason is that the global graph size is larger than INT_MAX, so rdispls contains elements larger than INT_MAX. There is a direct cast from the 64-bit idx_t array rdispls to the 32-bit int array lrdispls here, which makes lrdispls overflow.
Is there any easy solution to handle this?
karypis commented
This is an issue with MPI prior to 4.0. ParMetis has been updated to use MPI 4.0's APIs that resolve this problem.
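For context, the fix relies on the "large-count" variants MPI 4.0 added. A sketch of the signature difference (illustrative only, not the actual ParMetis call site):

```c
#include <mpi.h>

/* Pre-4.0 MPI_Alltoallv takes `int` counts and displacements, so any
   displacement past INT_MAX simply cannot be expressed:

     int MPI_Alltoallv(const void *sendbuf, const int sendcounts[],
                       const int sdispls[], MPI_Datatype sendtype,
                       void *recvbuf, const int recvcounts[],
                       const int rdispls[], MPI_Datatype recvtype,
                       MPI_Comm comm);

   MPI 4.0 adds `_c` variants that take MPI_Count counts and MPI_Aint
   displacements (both wide enough for 64-bit values on LP64 platforms),
   so 64-bit idx_t displacements can be passed without truncation:

     int MPI_Alltoallv_c(const void *sendbuf, const MPI_Count sendcounts[],
                         const MPI_Aint sdispls[], MPI_Datatype sendtype,
                         void *recvbuf, const MPI_Count recvcounts[],
                         const MPI_Aint rdispls[], MPI_Datatype recvtype,
                         MPI_Comm comm);
*/
```

Using the updated ParMetis therefore requires an MPI implementation that supports the 4.0 standard.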