idefix-code/idefix

BUG: No warning/check when resolution is not a power of 2 in 1D with auto-decomposition

Opened this issue · 5 comments

nscepi commented

Describe the issue:

There is no warning/check when the resolution is not a power of 2 in 1D problem and when using auto-decomposition, see here https://github.com/idefix-code/idefix/blob/d0e82202ca90d0b664c71582962c2ae255bc6528/src/grid.cpp#L166C7-L167C32. Apparently, there used to be a warning/check but it was changed in the commit dbe3405. @neutrinoceros do you remember why?

Error message:

No response

runtime information:

I am using the version d0e8220 of Idefix.

If I recall correctly, there is no ambiguity because resources do not need to be spread across multiple dimensions, so automatic decomposition is perfectly achievable in 1D regarless of box size, so I made sure it worked (or so I hope) and removed the warning. Is it causing you any trouble in production ?

(since I don't think there's a bug for now, I'm going to re-triage this as a question, but feel free to overwrite my judgment if there's an actual problem here)

nscepi commented

I think this is a bug! If you take, for example, the sod HD test problem and run it with a grid of 500 you get exactly the same result whether or not you use MPI, as you would expect. If you now take a grid of 501 you get an error of 2.5e-4 on the density (measured using the L2 norm as in idfx_test.py) between the serial and MPI run... When plotting the two solutions I see that the mismatch between serial and mpi is coming from the last cell of the (total) domain. The issue appears with other test problems as well.

I haven't got into the details of the MPI implementation so I don't know if IDEFIX is supposed to be able to handle subdomains of different sizes or not. If it's not, there should be a flag in 1D as well. If it is, there is something wrong.

Can you provide more detail on how to reproduce ? If I understand correctly, you are implying that you are using automatic domain decomposition in all your tests, but I'm not sure that's what you mean. Did you try the same test with manual domain decomposition too ?

I think the issue is that Idefix is unable to generate correct unbalanced domain decomposition, even in 1d. It not a problem of knowing how to balance the number of processes in each direction, it's how to decompose a grid that can't be divided by the number of processes.

It doesn't mean that your commit @neutrinoceros need to be reverted, it's just that we need to implement the possibility to have unbalanced domain decomposition (this takes place in the datablock constructor)

After talking with @neutrinoceros I will clarify my previous posts.

There are two separate but related issues:

  1. Idefix cannot handle for now unbalanced domain decomposition as @glesur pointed out.
  2. As a result, there should be an error popping up when the resolution of the grid is not a multiple of the number of mpi processes. This condition is checked for manual decomposition but not for automatic decompostion. I suggest to do it for both. I will make a pull request with a potential fix.

Note that 2) will have no reason to be once 1) is fixed.