NOAA-EMC/WW3

grid_imask created with SCRIPNC contains garbage values

Opened this issue · 29 comments

Describe the bug
When generating the SCRIPNC file in w3gridmd, the grid_imask variable is defined with dimension size GRID1_SIZE. When allocated and filled, it is only valid for GRID1_DIMS(1). It should be allocated and filled to size GRID1_SIZE. This bug results in a grid_imask field which is valid only on the first row of the domain.

For the mesh cap, having non-valid values in the grid_imask propagates to the ESMF Unstructured mesh. For structured grids, the mesh cap will replace the mask obtained from the unstructured ESMF mesh if it does not currently contain a mask (ie, if the current values are ==1 everywhere). In this case however, the meshmin is neither 0 or 1 (it is in fact -2147483647 for the glo_025.SCRIP.nc). This means that destination and source masking in CMEPS is not working correctly since it relies on a mask value of either 0 or 1 and mask in the mesh sent to CMEPS has a single row (j=1) which can be used as a mask.

Note: The code segment below needs to be read knowing that for structured meshes, the mesh that WW3 sends to the mediator (ie, the mesh that the export fields are built on) has the land points at the end of the distribution. So, the entire mask is set 0, and then the points which are WW3 "sea" points (ie, the values 1:nseal_cpl) are set to 1.

if (.not. unstr_mesh) then
! obtain the mesh mask and find the minimum value across all PEs
call ESMF_MeshGet(EMesh, elementDistgrid=Distgrid, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
call ESMF_DistGridGet(Distgrid, localDe=0, elementCount=ncnt, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
allocate(meshmask(ncnt))
elemMaskArray = ESMF_ArrayCreate(Distgrid, farrayPtr=meshmask, rc=rc)
if (chkerr(rc,__LINE__,u_FILE_u)) return
call ESMF_MeshGet(Emesh, elemMaskArray=elemMaskArray, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
call ESMF_VMAllFullReduce(vm, sendData=meshmask, recvData=maskmin, count=ncnt, &
reduceflag=ESMF_REDUCE_MIN, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
if (maskmin == 1) then
! replace mesh mask with internal mask
meshmask(:) = 0
meshmask(1:nseal_cpl) = 1
call ESMF_MeshSet(mesh=EMesh, elementMask=meshmask, rc=rc)
if (chkerr(rc,__LINE__,u_FILE_u)) return
end if
)
This is not an issue with rectilinear SCRIP files created from NCO (https://ufs-weather-model.readthedocs.io/en/develop/InputsOutputs.html#mesh-generation).

To Reproduce

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

To clarify - as long as we've created the structured meshes used with NUOPC to be via NCO as described here: https://ufs-weather-model.readthedocs.io/en/develop/InputsOutputs.html#mesh-generation we would not run into this issue?

Yes, that should work. The SCRIP file can be generated like so

 ncremap -g test.SCRIP.nc -G latlon=721,1440#snwe=-90.0,-90.0,-0.125,359.875#lat_typ=uni#lat_drc=s2n

EDIT: typo in above...should be -90:90, not -90:-90

Okay I believe that's how we have been generating things before so hopefully most things are okay as of now - but should definitely get this fixed!

In UFS, the meshes for HAFS were generated in this way, the only other rectilinear structured mesh is used in the atmwav_control_noaero_p8 test. I'm pretty sure I added that myself, and I would have generated the SCRIP/mesh using NCO.

Note also that generating via NCO will put the grid centers at the min/max values of +/- 89.875 and the corners will now be at +/- 90.

With the current mesh.glo_025.nc, the domain has (incorrectly, I believe) the centers at +/- 90.0 and the corners at +/- 90.125.

It turns out you cannot create an evenly divided rectilinear 0.25 resolution w/ 721 latitudes. If you try, the bottom left grid box corners are at -90.00000, -90.00000,-89.75035,-89.75035 and the center is at -89.87517.

Only by using 720 latitudes can you generate an exact 0.25 resolution, giving for the bottom left grid box corners -90.00000, -90.00000, -89.75000, -89.75000 and a center at -89.875.

The thing to remember when generating w/ NCO is that the snwe values are the values of the domain's bounding box---so the outer-most corners of the entire domain.

When specifying rectilinear in the inp file, the values appear to be the center lat and lon of the first (lower left) grid box.

@JessicaMeixner-NOAA Maybe GEFS (EP6) needs fix for glo_025 mesh? Running GEFS RT with:

diff --git a/model/src/wav_comp_nuopc.F90 b/model/src/wav_comp_nuopc.F90
index 4280b3b1..9382e54a 100644
--- a/model/src/wav_comp_nuopc.F90
+++ b/model/src/wav_comp_nuopc.F90
@@ -925,6 +925,9 @@ contains
       call ESMF_VMAllFullReduce(vm, sendData=meshmask, recvData=maskmin, count=ncnt, &
            reduceflag=ESMF_REDUCE_MIN, rc=rc)
       if (ChkErr(rc,__LINE__,u_FILE_u)) return
+      ! debug
+      maskmin = 1

changes baselines.

Tagging @sbanihash as she is the the waves GEFS POC

I'm not sure who (or how) the mesh.glo_025 mesh was created. I'm fairly sure I created the HAFS ones, but I don't remember for the GEFS. But if we need an update, we should do that!

Also @NickSzapiro-NOAA - to get the new mesh, I"m assuming this bug needs fixed asap?

Denise knows better but I think we need to replace glo_025 with a new one with different dimensions (and SCRIP made via nco commands)

Do you want @sbanihash and I to recreate a mesh file and test?

I think so. Right @DeniseWorthen?

Job script: /scratch1/NCEPDEV/climate/Jessica.Meixner/MakeMeshesWW3CMEPS/makemeshww3_rocky8_025.job

#!/bin/sh --login

#SBATCH --nodes=1
#SBATCH -q batch
#SBATCH -t 08:00:00
#SBATCH -A marine-cpu
#SBATCH -J ww3hafsmesh
#SBATCH -o makemeshrocky8.out

  module purge
  module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/unified-env-rocky8/install/modulefiles/Core
  module load stack-intel/2021.5.0
  module load stack-intel-oneapi-mpi/2021.5.1
  module load cmake/3.23.1
  module load libpng/1.6.37
  module load zlib/1.2.13
  module load jasper/2.0.32
  module load hdf5/1.14.0
  module load netcdf-c/4.9.2
  module load netcdf-fortran/4.6.1
  module load bacio/2.4.1
  module load g2/3.4.5
  module load w3emc/2.10.0
  module load esmf/8.5.0

  module load nco
 

echo "ncremap:"
ncremap -g glo_025.SCRIP.nc -G latlon=721,1440#snwe=-90.0,90.0,-0.125,359.875#lat_typ=uni#lat_drc=s2n


echo "esmf script:" 
srun -n 1 ESMF_Scrip2Unstruct glo_025.SCRIP.nc mesh.glo_025.nc 0

scrip inbetween file: /scratch1/NCEPDEV/climate/Jessica.Meixner/MakeMeshesWW3CMEPS/glo_025.SCRIP.nc

mesh file: /scratch1/NCEPDEV/climate/Jessica.Meixner/MakeMeshesWW3CMEPS/mesh.glo_025.nc

Haven't tested yet... but sharing now all the same

@DeniseWorthen - Okay i think i need to recreate the glo_900 grid that I have made recently... I don't have the scripts to check the HAFS, but why did i think you had to do latitudes: (first point)-dy/2:(last point)-dy/2? similar to how the longitudes are done?

The correct, 025 degree rectilinear grid should be created with only 720 latitudes via the NCO command.

It turns out you cannot create an evenly divided rectilinear 0.25 resolution w/ 721 latitudes. If you try, the bottom left grid box corners are at -90.00000, -90.00000,-89.75035,-89.75035 and the center is at -89.87517.

Only by using 720 latitudes can you generate an exact 0.25 resolution, giving for the bottom left grid box corners -90.00000, -90.00000, -89.75000, -89.75000 and a center at -89.875.

The thing to remember when generating w/ NCO is that the snwe values are the values of the domain's bounding box---so the outer-most corners of the entire domain.

When specifying rectilinear in the inp file, the values appear to be the center lat and lon of the first (lower left) grid box.

However, using 720 will change the dimensions of the mesh and then the question is how much disruption that causes for the case Nick is trying to replicate (ie, the CMEPS restarts will not work, since now the WW3 domain will be different.)

@JessicaMeixner-NOAA The HAFS meshes I believe were all created by me, since they are rectlinear. Let me try to dig up the SCRIP file (amazed if it still exists....)

@DeniseWorthen that makes sense why I can't find the script to create them.

One option for "fixing" the mesh but maintaining (I think) the current ICs/warmstarts etc is to use NCO on the existing mesh and simply replace all values of mask with 1. That doesn't require any code changes, but would change all answers (I think).

See /scratch1/NCEPDEV/nems/Denise.Worthen/WORK/meshes

Thanks @DeniseWorthen - I plan to look at this tomorrow morning with fresh eyes.

Okay - looking through things again today and one thing I want to do is make sure all of the in-use mesh files are okay, so I've created a list of all known mesh files And now attempting to look into if these are consistently made, which ones have issues, etc.

I've looked quite a bit at the other meshes - and I think I can confirm that the only mesh that was created and has an issue is the glo_025 mesh. @DeniseWorthen your comment of:

One option for "fixing" the mesh but maintaining (I think) the current ICs/warmstarts etc is to use NCO on the existing mesh and simply replace all values of mask with 1. That doesn't require any code changes, but would change all answers (I think).

Based on looking at the other grids, it looks like this would be the output of code changes? And so might be worth pursuing anyways? Especially since these answer changes would be the same either way? Or - are we saying that this is a poorly defined grid and we should perhaps go back to step 1 with this? (Which would have a list of challenges as well). I've looked at what needs to be done code change wise and can try to implement next week?

I've also looked at a few other the others in the spreadsheet and only the glo_025 is bad.

However, if I use an identical ww3_grid.inp.glo_100 and the "create moddefs" script, I produce a SCRIP and mesh file which a) has bad grid_imask value in the SCRIP and b) does not contain an elementArea in the mesh. So something is different about how these files are getting created but w/o the actual SCRIP file that was used for the one in the "fix" directory, I don't know what it is.

Three things:

  1. I think that just printing out the sizes in w3gridmd indicates that the grid1_imask isn't dimensioned correctly so I think a code fix needs to be made in any case. It would be good to get that confirmed.

  2. My suggestion about using NCO to "fix" the grid_imask in the current grid would resolve Nick's reproducibility issue even w/o 1). I don't know whether it matters to gefs that the grid would change and they might see answer changes from that. I'm not sure they would though, because the reproducibility problem Nick was seeing was arising over what should have been land on the wave grid...

  3. I personally would not have designed the grid this way, but it apparently has run w/o issue (?). The disruption of making a new one might just be too much. That's a call I can't make.

An elementMask=1 version of the mesh.glo_025.nc file is at /work2/noaa/stmp/dworthen/mesh.glo_025.nc. I generated it using

ncap2 -s 'tmp=0*elementMask+1;elementMask=tmp' mesh.glo_025.nc
ncks -O -x -v tmp mesh.glo_025.nc mesh.glo_025.nc

@NickSzapiro-NOAA Please try this one in your RT test.

With the updated wave element mask, cpld_nothr_gefs test passes on hercules (as well as control, restart, dcp as before).

Answers change across components vs. the current mesh file. I will look at maps of diffs but runs are available at:
/work/noaa/nems/nszapiro/tasks/GEFS_RT/elementMask/ufs-weather-model/tests where
run_dir/cpld_{control,dcp,nothr,restart}_gefs_intel use the updated mask
run_dir/cpld_wavmask_ep5_gefs_intel is like control but with the original mesh file as in g-w

Some quick ncviews of ncdiffs of new-old elementMask tests seem to suggest that this is worth fixing (with caveat that this is only one case). Please let me know if proper quality figures, more depth,... would be helpful.

Differences in ww3 mapsta are from land and North of 82N. Differences in FV3 fields after its first timestep are evident along these domain boundaries (for surface T,u,v,...). These differences grow over the forecast, including a coherent impact over the Black Sea.

ww3 mapsta:
dwavemask_mapsta_ww3 hi 2021-03-25-32400

FV3 near-surface T,u after first timestep:
dwavemask_tmpsfc_sfcf000

dwavemask_u10_sfcf000

MOM6 SST after 3 hours:
dwavemask_sst_ocn_2021_03_25_09

FV3 Tsfc after 12 hours:
dwavemask_tmpsfc_sfcf012

We see differences in surface roughness which is likely directly impacted by the mask issue and causing all the other changes:

Screenshot from 2024-12-19 13-24-42

I agree this looks like this needs to be fixed.

It's a mess---I think it is a combination of the WAV mask being bad + we're using a srcMask=1 on the ATM side.

Debugging the grid imprint issue with bilinear mapping, tells me we'll need to switch the scrMask=undefined for ATM (in other words, the ATM fields are valid everywhere when used as a source). See the figure in NOAA-EMC/CMEPS#129