JRA55 tx1 forcing is unphysical (JRA55do is OK)
phil-blain opened this issue · 17 comments
In the process of testing the C grid code with the new areafact computation in a long tx1
simulation, I found that the current tx1 JRA55 forcing is unphysical, likely due to the interpolation procedure used to create it.
For example, here are the x-ward winds at the first time slice of year 2005:
Similar patterns can be found for the other years and the rest of the forcing fields. The longitude field in the forcing files is also wrong:
compared to the correct tx1 longitude field (from the tx1 bathymetry file):
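For illustration, a minimal check along these lines confirms the coordinate mismatch (a sketch only; the file paths and variable names "LON"/"TLON" are assumptions, so adjust to the actual datasets):

```python
# Minimal sketch: compare the longitude field in a tx1 JRA55 forcing file
# against the tx1 grid from the bathymetry file. File names and variable
# names ("LON", "TLON") are placeholders/assumptions for illustration.
import numpy as np
import xarray as xr

forcing = xr.open_dataset("JRA55_tx1_03hr_forcing_2005.nc")  # placeholder path
grid = xr.open_dataset("tx1_bathymetry.nc")                  # placeholder path

lon_forcing = np.asarray(forcing["LON"])  # assumed variable name
lon_grid = np.asarray(grid["TLON"])       # assumed variable name

# Wrap both to [0, 360) before comparing, and account for the 0/360 seam.
diff = np.abs(np.mod(lon_forcing, 360.0) - np.mod(lon_grid, 360.0))
diff = np.minimum(diff, 360.0 - diff)
print("max longitude mismatch (degrees):", float(diff.max()))
```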
Fortunately it seems the newer "JRA55do" data is correctly interpolated.
So this raises a few points:
- What script was used to create the JRA55 data? To create the JRA55do data? I found https://github.com/CICE-Consortium/CICE/blob/main/configuration/tools/jra55_datasets/interp_jra55_ncdf_bilinear.py, which dates back to the original addition of JRA55 in e204fb8, but it's unclear whether this same script was used to create the JRA55do dataset.
- What should we do with the current JRA55 dataset, which is on Zenodo? Should we create a new, correctly interpolated version, or just deprecate it and use JRA55do for the tx1 grid?
/cc list based on #435: @daveh150 @rallard77 @dabail10 @apcraig
Wow. Good catch. I would vote for switching to the JRA55do myself.
I thought that we (that would be the "royal we") had done simulations using JRA55 before releasing the code and posting the data. Why didn't they show problems? Has the data been corrupted somehow?
I think we didn't check the tx1 grid very carefully.
I did re-download from Zenodo yesterday and got the same bad fields, and I verified that the md5 checksum matches the one posted on Zenodo.
This is definitely weird. Not sure where in the process this was introduced.
Thanks Philippe. Yes, strange indeed! Do you have the other years? Is this only in the 2005 file?
It's for all the years, all the fields.
Thank you Philippe!
Not sure what is happening. A while ago I started running out of local storage space, so I removed all the JRA55 files. I can't say for sure whether this came from the way I generated the yearly files or from something else. Anyway, I will re-download the JRA55 data and regenerate the tx1 forcing for 2005-2009 so we do not have a corrupt set on Zenodo.
Sorry for whatever is causing this.
Best I can tell, when I started to make the tx1 files I moved to another working directory and must have accidentally still used the gx1 regridding weight file (either I forgot to change the name or made a typo).
I have fixed this and reprocessed the data. The new dataset is uploading now and might be done by tomorrow; I'll send the new Zenodo link for you to verify.
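For what it's worth, this kind of mix-up can be caught before regridding by checking the destination grid size recorded in the weight file against the target grid. A minimal sketch, assuming an ESMF/SCRIP-style weight file that stores a dst_grid_dims variable; the file and variable names are placeholders:

```python
# Sketch: sanity-check that a regridding weight file targets the expected
# destination grid before using it. Assumes an ESMF/SCRIP-style weight file
# containing "dst_grid_dims"; file and variable names are placeholders.
import xarray as xr

weights = xr.open_dataset("map_JRA55_to_tx1_bilinear.nc")  # placeholder name
target = xr.open_dataset("tx1_bathymetry.nc")              # placeholder name

dst_dims = tuple(int(n) for n in weights["dst_grid_dims"].values)
# dst_grid_dims is typically stored fastest-varying first (ni, nj), while the
# 2D variable shape is (nj, ni), hence the reversal below (an assumption).
grid_dims = tuple(target["TLON"].shape[::-1])              # assumed variable name

if dst_dims != grid_dims:
    raise SystemExit(
        f"weight file destination grid {dst_dims} does not match "
        f"target grid {grid_dims} -- wrong weight file?"
    )
print("weight file destination grid matches the target grid")
```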
The new files should have new names and we need to modify CICE so it picks up the new filenames. Otherwise, there is no way to trap use of the old files and no obvious way to know whether the old or new files are on any given filesystem. This needs to be standard practice when we update a set of files.
In general, I sort of wish each forcing filename had a date string in it to indicate when it was created; that would also provide easy versioning. Short of that, I don't love "fix"; my preference would be "v2". But I'm open to hearing other options. We'll need a PR to CICE too.
I like adding a date to the dataset as well. I don't think we should change the atm_data_type. This might also argue for softlinks again, where the name JRA55_tx1_03hr_forcing_2005.nc would point to a file like JRA55_tx1_03hr_forcing_2005_20230927.nc. I can update the dataset on Zenodo when we are ready.
@apcraig @dabail10 Having the date in the filename is certainly a good idea. It would have to come before the '2005', though, so as not to break the 'file_year' function. Maybe
JRA55_tx1_20230927_03hr_forcing_2005.nc
Also, we might be able to avoid symlinks if we specify atm_data_dir to include the date, such as atm_data_dir = 'tx1_20230927'. We'd still need to add some logic to check for the date in the JRA55_files subroutine, but it might not be too bad, and it would be clearer in the long run. What do y'all think about this? A rough sketch of the idea follows below.
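Purely for illustration (this is not CICE code, and the helper name is hypothetical), the naming scheme above keeps the year as the last token so the per-year substitution still works, and it can fail fast when the dated file is missing:

```python
# Illustrative only: build a dated forcing filename with the version date
# before the year, so the trailing year token can still be swapped per
# forcing year, and abort if the expected file is absent.
import os

def jra55_filename(atm_data_dir, grid, version_date, year):
    # e.g. JRA55_tx1_20230927_03hr_forcing_2005.nc
    fname = f"JRA55_{grid}_{version_date}_03hr_forcing_{year}.nc"
    path = os.path.join(atm_data_dir, fname)
    if not os.path.exists(path):
        raise FileNotFoundError(f"expected forcing file not found: {path}")
    return path

# Usage: atm_data_dir could itself carry the date, e.g. '.../tx1_20230927'
# print(jra55_filename("/forcing/tx1_20230927", "tx1", "20230927", 2005))
```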
I like this idea, but maybe still have a symbolic link tx1 -> tx1_20230927. Then we just update the symbolic link if needed.
Softlinks don't really solve the problem of preventing use of older data. With softlinks, you use the same filename all the time in the model and just point to different datasets as they evolve over time; is that the idea? You can't trap the use of an older file with that approach.
I think we need something like what @daveh150 proposes. The implementation has to support a "unique string" (i.e. a date) in the filename via an explicit namelist setting. That won't prevent folks from setting the namelist to something that reads old data, but if the default in the code/scripts points to the latest data, the code will abort if that data is not there, which is what we want to happen.

I don't care (too much) how the filenames or the namelist are set up; if the date string has to be in the middle, OK. What we want is an implementation that allows the forcing files to evolve without having to update the code all the time. We need an extensible implementation, and we should document it in the user guide: we want to be able to add new grids, new years of data, and new versions of datasets without having to modify the code. If we're not there yet with the implementation, we should fix it when we update the dataset.