Can we remove `nprocs` from `ice_in`?
anton-seaice opened this issue
In this line, all that happens is we check that nprocs from the namelist matches the result of get_num_procs().
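The check looks roughly like this (a paraphrased sketch of the logic in ice_domain.F90, not the exact source):

```fortran
      ! paraphrased sketch of the existing consistency check (names and
      ! message text approximate)
      if (nprocs /= get_num_procs()) then
         write(nu_diag,*) subname,' ERROR: nprocs, get_num_procs = ', &
                          nprocs, get_num_procs()
         call abort_ice(subname//' ERROR: namelist nprocs does not match communicator size')
      endif
```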
Can we just assign nprocs = get_num_procs()?
We would have to move it a little bit earlier in the code, before:
This would make model configuration a little easier (i.e. one less setting that has to be kept consistent across multiple places).
I've considered this a number of times, but have put off a change. I guess there are a few issues. First, is there ever a case where nprocs (MPI_COMM_SIZE) is not the number of processes you want to run on? Could the CICE model be part of a multi-model system where somehow the MPI communicator doesn't match the number of processors the ice model should run on? Second, this feature provides a check. Even in standalone mode, the size of the communicator is going to depend on how the model was launched (via mpiexec or similar). Is it helpful to have an internal check that the number of processors specified by the user in the namelist actually matches the number of processors in the communicator, especially since block size and many other things depend on the number of processors in play? Third, does the fact that MPI and OpenMP are supported together affect the usefulness of this feature?
Having said all that, I'm open to changing this. In that case, I assume you're setting the block size and max_blocks so they will accommodate a range of processor counts. And if max_blocks is too big, you just live with it (or maybe max_blocks is set internally?).
Maybe I'll suggest adding support for setting nprocs=-1, which would mean ignore it and use get_num_procs instead. I think that provides backwards compatibility and support for cases where it may be useful, but allows users to rely on get_num_procs if they want.
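Roughly what that could look like (a sketch only; placement and exact names are assumptions, not the final change):

```fortran
      ! hypothetical handling of the nprocs=-1 sentinel, early in domain
      ! initialization, before nprocs is used anywhere else
      if (nprocs == -1) then
         ! ignore the namelist value and take the size of the MPI communicator
         nprocs = get_num_procs()
      elseif (nprocs /= get_num_procs()) then
         ! keep the existing check for users who still set nprocs explicitly
         call abort_ice(subname//' ERROR: namelist nprocs does not match communicator size')
      endif
```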
There are many possibilities, but the code aborts if nprocs != get_num_procs, so any use case where nprocs from the namelist differs from the system number of processors would need code modifications anyway.
We think we can improve the automated max_blocks estimate too, but it requires calling proc_decomposition twice, which is a bit messy.
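For context, the automated estimate is along these lines today (a paraphrased sketch; the exact expression may differ), which is why proc_decomposition ends up being needed before the blocks are actually distributed:

```fortran
      ! paraphrased sketch of the max_blocks=-1 estimate: decompose the
      ! processor count into an nprocs_x by nprocs_y grid, then assume the
      ! blocks are spread evenly across it
      if (max_blocks < 1) then
         call proc_decomposition(nprocs, nprocs_x, nprocs_y)
         max_blocks = ((nx_global-1)/(block_size_x*nprocs_x)+1) * &
                      ((ny_global-1)/(block_size_y*nprocs_y)+1)
      endif
```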
p.s. Your suggestion makes sense. It's hard to know; we're only guessing at what the possible use cases are.
p.p.s. We discovered that setting max_blocks too big gives memory problems, and the automated setting of max_blocks wasn't working because nprocs hadn't been set yet at that point in the code. So at a minimum I'd like to move this (the CESMCOUPLED line) before the max_blocks estimate:
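Something like this ordering, sketched rather than quoted from the actual file:

```fortran
#ifdef CESMCOUPLED
      ! move this assignment ahead of the max_blocks estimate so that the
      ! estimate sees a valid nprocs in the coupled case
      nprocs = get_num_procs()
#endif
      ! ... automated max_blocks estimate (which uses nprocs) follows here ...
```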
The max_blocks setting is also problematic. I've looked at that too. The current estimate when max_blocks=-1 is not correct for some decompositions. It would be nice for the code to set it automatically and properly for the grid, pe count, and decomposition. My assessment is the same: it would require some serious code changes to compute max_blocks correctly. But it's possible. Maybe I'll look at that again. In my experience, setting max_blocks too large doesn't affect performance, but, of course, it does use memory.
I will implement the "nprocs=-1" option. Part of that will be to move the "nprocs=get_num_procs()" call to before nprocs is used, and I'll make sure it works for CESMCOUPLED too. I'll also take another look at the max_blocks calc.
I've done some work on this already; I can make a PR and you can review/edit?
Sure, would love to see what you've got. I've got a sandbox going too. I'm trying to see if I can get max_blocks computed on the fly so that each task has the appropriate value. I think I'm close with a reasonable set of modifications. Feel free to create a PR or point me to your branch. Thanks!
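Not claiming this is what either branch does, but one way a per-task value could be derived once the distribution is known (a hypothetical sketch; the distrb_info%blockLocation usage and task numbering are assumptions):

```fortran
      ! hypothetical sketch: count the blocks the distribution assigned to this
      ! task and use that as the per-task max_blocks (assumes blockLocation
      ! holds 1-based processor numbers while my_task is 0-based)
      integer (int_kind) :: n, nblocks_local

      nblocks_local = 0
      do n = 1, nblocks_tot
         if (distrb_info%blockLocation(n) == my_task+1) nblocks_local = nblocks_local + 1
      enddo
      max_blocks = nblocks_local
```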