NOAA-GFDL/icebergs

Out of bounds index in save_restart (fms_io)

adcroft opened this issue · 6 comments

It looks like restarts do get written when in production mode, but there is clearly a bug somewhere since we get an out-of-bounds error with a debug executable:

forrtl: severe (408): fort: (2): Subscript #1 of the array SBUF has value 1 which is greater than the upper bound of 0

Image PC Routine Line Source
MOM6 00000000055B6FAA Unknown Unknown Unknown
MOM6 00000000055B5B25 Unknown Unknown Unknown
MOM6 0000000005572786 Unknown Unknown Unknown
MOM6 000000000550EE95 Unknown Unknown Unknown
MOM6 000000000550F2E9 Unknown Unknown Unknown
MOM6 0000000004EDA0D1 mpp_mod_mp_mpp_ga 77 mpp_gather.h
MOM6 0000000004BDD30C mpp_io_mod_mp_mpp 30 mpp_write_unlimited_axis.h
MOM6 0000000003D210AA fms_io_mod_mp_sav 2516 fms_io.F90
MOM6 0000000003D0A533 fms_io_mod_mp_sav 2114 fms_io.F90
MOM6 0000000000D7E44A ice_bergs_io_mp_w 211 icebergs_io.F90
MOM6 0000000003517354 ice_bergs_mp_iceb 2152 icebergs.F90
MOM6 000000000338FC72 ice_type_mod_mp_i 967 ice_type.F90
MOM6 000000000168E1D0 ice_model_mod_mp_ 4071 ice_model.F90
MOM6 0000000001930DB7 coupler_main_IP_c 1662 coupler_main.F90
MOM6 000000000192A4A3 MAIN__ 887 coupler_main.F90
MOM6 00000000004006AC Unknown Unknown Unknown
MOM6 00000000055C58D4 Unknown Unknown Unknown
MOM6 000000000040057D Unknown Unknown Unknown
[NID 00252] 2015-07-16 13:59:37 Apid 105491249: initiated application termination

Alistair,
I'll take a look at this when I'm back next week. But we use
integer (aka Cray) pointers to "reshape" arrays to a different rank. It
may also be the case that there are no elements to be transferred.
In this case, MPI does the right thing and doesn't send anything.

The net is that I don't think anything bad really happens. But I'll check.


OK, Thanks.

I came across it because we're trying to debug trajectory I/O and of course are hitting this problem instead.

I just realized this is the same problem as issue #4.

On 2015.07.29 Jeff Durachta wrote:
"Well there are a couple of things in the current implementation of mpp_gatherV that are definitely in a gray area (Fortran treatment of 0 size arrays). While the 0 size arrays are legal in F90 and are not actually "touched" in the sense of trying to put data into them, it's clear why the compiler would think something is going on.

I've made some changes that avoid executing any code when the array size is 0 (i.e. when there's no iceberg data) and these pass a run of the debug build.

Niki: See /lustre/f1/unswept/Jeffrey.Durachta/TMP_icebergs/MOM6-examples/src/FMS/mpp/include/mpp_gather.h

It should receive thorough testing with cases where there are icebergs, especially a layout where there's a mix of some and 0 icebergs within an io_peset.
"
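
For readers unfamiliar with the failure mode, here is a minimal sketch of the guard described above. It is an illustrative Python analogue and not the actual change to mpp_gather.h: the function name `gather_variable` and the list-of-lists input are hypothetical stand-ins for a variable-length gather across the ranks of an I/O PE set. Touching the first element of an empty buffer plays the role of referencing `SBUF(1)` when the upper bound is 0.

```python
def gather_variable(send_bufs):
    """Gather per-rank buffers of varying length into one list on the root.

    send_bufs: one list per rank; a rank with no icebergs contributes [].
    Returns (recv, counts, displs), mirroring the count/displacement
    bookkeeping of a variable-length gather.
    """
    counts = [len(buf) for buf in send_bufs]

    # Displacement of each rank's data in the receive buffer.
    displs = []
    offset = 0
    for n in counts:
        displs.append(offset)
        offset += n

    recv = [None] * offset
    for rank, buf in enumerate(send_bufs):
        if counts[rank] == 0:
            # Guard: do nothing at all for an empty contribution, so the
            # first element of a zero-size buffer is never referenced.
            continue
        _ = buf[0]  # analogue of referencing sbuf(1); IndexError if empty
        start = displs[rank]
        recv[start:start + counts[rank]] = buf
    return recv, counts, displs


# A layout with a mix of some and zero icebergs per rank.
recv, counts, displs = gather_variable([[1.0, 2.0], [], [3.0]])
```

Without the `continue`, the middle rank would raise `IndexError`, which is the Python counterpart of the bounds-check abort in the debug build; a production build without bounds checking would simply transfer zero elements.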

I tested this update with the Intel compiler and Alistair tested it with GNU, and it seems to have fixed the issue. I have tagged it with user/nnz/fix_debug_mode_crash_in_new_icebergs_io_jwd in GitLab and am going to ask @underwoo to move the testing tag onto it for extensive testing.

I just encountered this again after switching to the 4d restarts branch which was based on ulm. Will the FMS patch be in the unofficial ulm patch?

Yes, and the tag will be created once we work through a few more issues.