NOAA-EMC/WW3

Restart does not work for some cases using the unstructured scheme

Closed this issue · 9 comments

Describe the bug
We found that restart is not working in the global unstructured 1deg model (ww3_ufs1.1). The bug is that not all subdomains within the decomposition read the restart file properly, which results in zero wave action over most of the global domain.

To Reproduce
Run ww3_ufs1.1 using a restart file generated in a separate run.

Expected behavior
The restart functionality should read the restart file correctly and distribute it among all tasks.

Screenshots
obvious when reproduced

Additional context
none

We found that in the global unstructured 1deg model the restart is not working (ww3_ufs1.1)

@aronroland this has always been my experience. Related to #1134?

Hi @MatthewMasarik-NOAA
This is not the same issue, but it may be related. I know that some variables, like ustar, ... are written in the restart, causing non-b4b results.
What we are experiencing in our work is that the model reads a restart but the fields are zero. This is the implicit case. B4B is not our concern.
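
One way to confirm the symptom described here is to check, per MPI task, whether the wave action held after the restart read is entirely zero. The following is only a minimal sketch under stated assumptions: it is not WW3 code, and the function name `ranks_with_zero_action` and the dictionary layout (rank id mapped to a list of wave-action values) are hypothetical.

```python
def ranks_with_zero_action(action_by_rank, tol=0.0):
    """Return the sorted rank ids whose wave-action values are all zero.

    action_by_rank: hypothetical mapping {rank id: list of wave-action values}
                    collected after the restart has been read.
    tol:            magnitudes at or below this value count as zero.
    A rank with an empty list is also reported, since it holds no data.
    """
    return sorted(
        rank
        for rank, values in action_by_rank.items()
        if all(abs(v) <= tol for v in values)
    )


if __name__ == "__main__":
    # Made-up illustration: rank 0 read the restart, ranks 1 and 2 did not.
    sample = {0: [0.12, 0.0, 0.30], 1: [0.0, 0.0, 0.0], 2: [0.0]}
    print("ranks with all-zero wave action:", ranks_with_zero_action(sample))
```

If most ranks are reported, the restart was probably not distributed across the decomposition, which matches the behaviour described in this issue.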

@aronroland @aliabdolali, would you mind providing more details about which ufs1.1 case you are running and having issues with? From @aliabdolali's comments, it seems this may be solver dependent.

In ww3_ufs1.1/unstr, I put three cases (see info):

# grid_a: Domain Decomposition (PDLIB) and Explicit solver

# grid_b: Domain Decomposition (PDLIB) and Block Explicit solver

# grid_c: Domain Decomposition (PDLIB) and Implicit solver

We are testing grid_c, which uses the implicit solver with domain decomposition; you can test the others as well.
We don't need B4B for grid_c, and it is different from the Block Explicit case you are testing at your end (grid_b).

Thanks for that added context @aliabdolali.

OK, so the solution was easy for this one. In some setups IOSTYP had been set to 0 manually, a setting we never actually used ourselves. Looking more closely at the code, I can see that IOSTYP = 0 was never developed for the restart feature within the domain decomposition approach, and therefore it does not work with PDLIB. Anyway, we spent most of our efforts on IOSTYP = 1, which is the default in WW3. From my point of view, we should abort execution of the code when IOSTYP = 0 is used together with PDLIB.
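
The proposed guard could look roughly like the sketch below. It is written in Python purely for illustration and is not the WW3 implementation (which is Fortran); the names `check_restart_io`, `UnsupportedIOConfiguration`, and the `uses_pdlib` flag are hypothetical.

```python
class UnsupportedIOConfiguration(RuntimeError):
    """Raised when the requested I/O type cannot be used with the chosen parallelization."""


def check_restart_io(iostyp, uses_pdlib):
    """Stop early if IOSTYP = 0 is combined with PDLIB domain decomposition.

    iostyp:     integer I/O type selected for the run.
    uses_pdlib: True when the run uses PDLIB domain decomposition
                (hypothetical flag, for illustration only).
    """
    if uses_pdlib and iostyp == 0:
        raise UnsupportedIOConfiguration(
            "IOSTYP = 0 does not support restart with PDLIB; use IOSTYP = 1."
        )


if __name__ == "__main__":
    check_restart_io(iostyp=1, uses_pdlib=True)  # accepted, returns silently
    try:
        check_restart_io(iostyp=0, uses_pdlib=True)
    except UnsupportedIOConfiguration as err:
        print("aborting:", err)
```

Failing at startup like this would turn the silent all-zero fields into an explicit configuration error.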

@aronroland, thank you for posting your findings on this. At the EMC we have still been looking at IOSTYP=0 in some experiments, so this is helpful information for us. We were curious whether there are any more details, one question being whether this is only related to the implicit solver. @JessicaMeixner-NOAA was also particularly interested in whether certain parts of w3iors have been implemented for IOSTYP=0 with PDLIB, or whether the whole routine needs to be implemented for that case. Thank you.

@MatthewMasarik-NOAA the problem is that I had never used the restart functionality until we started the downscaling efforts. There we improved the memory footprint and streamlined this part so that it adopts the domain decomposition. This issue is only related to the domain decomposition (PDLIB) and is the same for any solver on unstructured grids. As far as I can see, IOSTYP = 0 only works for the mosaic parallelization scheme. Ideally we would have only one output type, which works.

Okay, understood. Thanks @aronroland. Fyi @JessicaMeixner-NOAA