NOAA-GFDL/MDTF-diagnostics

Error running example diagnostic

Closed this issue · 12 comments

Hello, I am a graduate student at FSU working with my advisor Allison Wing to incorporate our TC POD into the framework. We are at the stage of just trying to run the example diagnostic on the sample model data and are running into an issue - this probably reflects some error we have made, but we are a bit stuck on figuring it out. We downloaded the sample input_data and installed the MDTF code. Per the instructions, we edited the template src/default_tests.jsonc file to set the paths for MODEL_DATA_ROOT etc. and saved it as config.jsonc in our MDTF-diagnostics directory. When we run it for the example POD with the NCAR-CAM5.timeslice casename (./mdtf -f config.jsonc), we get the following error (see the attached file for the whole log):

16:10:00 ERROR: Caught exception processing <#2.example:tas (=tas) @ 1mo> with data_key=(3,): DataPreprocessError(Data preprocessing error for data in tas: Preprocessing on <#2.example:tas> failed at CropDateRangeFunction.)

We get this error for each variable (and also when we try to run other existing PODs), and it prevents the POD from running.

Can you provide any guidance on why this might be happening and how to fix it? We haven't changed anything anywhere except editing that overall framework jsonc file to set the paths and choose which case/POD we are running. We have read all the documentation and support pages and aren't sure what could be wrong. Sorry if this is a very naive question, but thanks for any help you can give!
mdtf.log

@jcstarr The timeslice experiment is still under development by our NCAR collaborators, and I don't know that it is expected to work with the current main branch (I'll check with them at this Wednesday's group meeting). If the QBOi.EXP1.AMIP.001 and GFDL.CM4.c96L32.am4g10r8 experiments run with the original sample PODs provided in the src/default_tests.jsonc file, you have installed the package correctly and can proceed with your development. If you have had, or are having, issues running these test experiments, please provide the log as well as the config file you are using so I can replicate the problem.

Thank you @wrongkindofdoctor for the update on the timeslice experiment. Unfortunately, the same issue is coming up for the other experiments. I have attached the config file that I have been using, pasted into Word since the dropbox is not allowing me to upload a .jsonc file. I also have the mdtf.log files that are output when trying to run the experiments using the QBOi.EXP1.AMIP.001 and GFDL.CM4.c96L32.am4g10r8 data. I attached copies of the logs with the experiment name in front, separated by an underscore, to distinguish the two logs. Thank you for the continued help.
GFDL_mdtf.log
QBOi_mdtf.log
jsonc_file_used.docx

@jcstarr The logs say that the framework can't find your config file, and the jsonc file you sent shows that all experiments are commented out. The framework is trying to run everything based on the default locations in sites/defaults.jsonc, and failing. If you have not done so, remove the double forward slashes // in front of the code blocks containing the settings for the desired experiment and POD(s) you wish to run. For example, to run a subset of the default PODs (Wheeler_Kiladis, EOF_500hPa, etc.) in the QBOi experiment, the POD section of your config file would look something like the block below. Notice that there are commas separating each POD in the pod_list.

If you wish to run more PODs for the experiment, make sure to add a comma after each POD in the list except the last one; otherwise, you'll get errors that the framework can't parse the jsonc. Also, note that the last couple of PODs are commented out and will not run in this example:


  "case_list" : [
    // The cases below correspond to the different sample model data sets. Note
    // that the MDTF package does not currently support analyzing multiple
    // models in a single invocation. Comment out or delete the first entry and
    // uncomment the second to run NOAA-GFDL-AM4 only for the MJO_prop_amp POD,
    // and likewise for the SM_ET_coupling POD.
    {
      "CASENAME" : "QBOi.EXP1.AMIP.001",
      "model" : "CESM",
      "convention" : "CESM",
      "FIRSTYR" : 1977,
      "LASTYR" : 1981,
      "pod_list": [
          // Optional: PODs to run for this model only (defaults to all)
          "Wheeler_Kiladis",
          "EOF_500hPa",
          "MJO_suite",
          "MJO_teleconnection"
          //"convective_transition_diag"
          //"precip_diurnal_cycle"
      ]
    }

    // {
    //   "CASENAME" : "GFDL.CM4.c96L32.am4g10r8",
    //   "model" : "AM4",
    //   "convention" : "GFDL",
    //   "FIRSTYR" : 1,
    //   "LASTYR" : 10,
    //   "pod_list" : ["MJO_prop_amp"]
    // }
  ],

The framework will only run one experiment at a time in its current state, so to run the GFDL.CM4 experiment in the next block, you'd need to comment out the QBOi section and uncomment the GFDL CM4 section:

  "case_list" : [
    // The cases below correspond to the different sample model data sets. Note
    // that the MDTF package does not currently support analyzing multiple
    // models in a single invocation. Comment out or delete the first entry and
    // uncomment the second to run NOAA-GFDL-AM4 only for the MJO_prop_amp POD,
    // and likewise for the SM_ET_coupling POD.
    // {
    //   "CASENAME" : "QBOi.EXP1.AMIP.001",
    //   "model" : "CESM",
    //   "convention" : "CESM",
    //   "FIRSTYR" : 1977,
    //   "LASTYR" : 1981,
    //   "pod_list": [
    //       // Optional: PODs to run for this model only (defaults to all)
    //       "Wheeler_Kiladis",
    //       "EOF_500hPa",
    //       "MJO_suite",
    //       "MJO_teleconnection"
    //       //"convective_transition_diag"
    //       //"precip_diurnal_cycle"
    //   ]
    // }
    {
      "CASENAME" : "GFDL.CM4.c96L32.am4g10r8",
      "model" : "AM4",
      "convention" : "GFDL",
      "FIRSTYR" : 1,
      "LASTYR" : 10,
      "pod_list" : ["MJO_prop_amp"]
    }
  ],

I'm not sure what command options you are using based on the logs, but please let me know exactly what you are running and from which directory if the following is not working:

To run the framework, cd to the mdtf/MDTF-diagnostics directory on your machine and run ./mdtf -f [path to your jsonc config file]/[config file name].jsonc -v
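For example, if the repository were cloned to a hypothetical ~/mdtf/MDTF-diagnostics and the config file kept in that same directory, the invocation would be:

# paths here are examples only; substitute your own clone and config locations
cd ~/mdtf/MDTF-diagnostics
./mdtf -f config.jsonc -v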

In the config.jsonc file that I am using (sent again here, for running the MJO_prop_amp diagnostic), I have indeed commented out the cases that I do not want to run. I just attempted to run the MJO_prop_amp diagnostic again with the GFDL.CM4.c96L32.am4g10r8 case and did exactly as you copied and pasted above. To run it, I cd'd to the mdtf/MDTF-diagnostics directory and then ran exactly as shown with
./mdtf -f config.jsonc -v
since my edited config.jsonc file is already in the MDTF-diagnostics directory.

I agree with you that, from the logs, the framework is trying to run everything based on the default locations in sites/defaults.jsonc. But I thought that specifying the path to my jsonc config file in the call to ./mdtf would tell it where to run things from? As I said, I have not moved or deleted any other files from the latest version of the framework that I downloaded from the GFDL GitHub. It seems like there is no defaults.jsonc file in the sites sub-directory as viewed on GitHub either (see screenshot below), so that would be why there isn't one in my version. Was I supposed to put my config file in that directory instead?

(screenshot: listing of the sites sub-directory on GitHub, showing no defaults.jsonc file)

[mdtf (1).log](https://github.com/NOAA-GFDL/MDTF-diagnostics/files/9549720/mdtf.1.log)
[GFDL_MJO_prop_amp_jsonc_copy.docx](https://github.com/NOAA-GFDL/MDTF-diagnostics/files/9549724/GFDL_MJO_prop_amp_jsonc_copy.docx)

@jcstarr Ah, I see. It looks like the pod settings in your config file are okay, and I'm rather stumped at the moment. You can try a couple of things:

-There may be some trouble with the framework handling relative paths if the config file is not in the src directory. Try moving your config file to the src directory and running ./mdtf -f src/config.jsonc -v (see the sketch after this list).
-Your machine may not have enough resources to run MJO_prop_amp with the test dataset (likely not an issue on a decent workstation, but my gov-issued laptop can't handle it, so it's possible). If so, just stick to running the QBOi tests. I'd start with the EOF_500hPa POD in the QBOi experiment and go from there, since it is one of the simplest diagnostics aside from the example POD.
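A sketch of the first suggestion, assuming the clone lives at a hypothetical ~/mdtf/MDTF-diagnostics (adjust the paths to your setup):

cd ~/mdtf/MDTF-diagnostics
mv config.jsonc src/config.jsonc    # move the config into the src directory
./mdtf -f src/config.jsonc -v       # run with the relocated config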

As an aside, I noticed that the paths to your conda installation and conda envs directories in the config file are probably not correct. They should look something like:

"conda_env_root": "[path to anaconda/miniconda3 directory]/miniconda3/envs",

"conda_root": "[path to anaconda/miniconda3 directory]/miniconda3"

I have now attempted all of the possible fixes above and have unfortunately still come away with the same errors, even after changing the conda roots. I also tried to re-download all the necessary environments, and it still had issues with the CropDateRangeFunction. I have attached my updated log for the EOF_500hPa POD and the corresponding config.jsonc file, which I did move to the src directory. Thank you for the continued help on this.
EOF500_hpa_mdtf.log
config.jsonc_EOF500hpa.docx

Just chiming in to add that @jcstarr is running this on my 16-core 256 GB RAM server, so resource availability is not an issue. But I'm sort of glad that @wrongkindofdoctor is stumped as well, as I thought we were doing everything right! Do you think it is some sort of conda environment issue or something with the framework files? We can try a fresh install, perhaps.

@allison-wing @jcstarr Okay, thank you for verifying that you have sufficient resources. Unless there are permission issues or space constraints in @jcstarr's home directory that are preventing the framework from copying and/or writing temp files in the wkdir location, the only other apparent issue is the package version you are working with.

The commit hash in the log looks like it points to an older package version. I see that you have your own fork of the package, and I assume you are working with that. Your main branch is behind the NOAA-GFDL main branch by several commits, so try the following (a consolidated command sketch follows this list):

  • in the web interface, update your main branch by clicking the "Sync fork" button
  • in your local fork, if you are not on the main branch, stash your modifications locally (git stash), or commit and push your current working branch to your remote repo
  • check out the main branch: git checkout main
  • run:

git fetch
git pull

to update your local main branch

  • try running the package again if you were modifying your main branch, OR, if you were working on a different branch:

  • check out the branch you were working on: git checkout [branch name]

  • restore the stashed changes with git stash pop if you ran git stash before switching to main; if you committed and pushed instead, the previous changes will still be there

  • merge the updates from the main branch into your local branch: git merge main

  • try running the package again
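Put together as shell commands, the local part of the sequence might look something like this (a sketch only: the branch name feature/tc_pod is hypothetical, and it assumes you already clicked "Sync fork" in the web interface):

git stash                     # set aside local modifications (skip if you committed and pushed instead)
git checkout main
git fetch
git pull                      # update the local main branch from your synced fork
git checkout feature/tc_pod   # return to your working branch (substitute its real name)
git stash pop                 # restore the stashed modifications, if you stashed
git merge main                # merge the main-branch updates into your working branch
# then try running the package again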

If this does not work, I can work one-on-one via video chat with @jcstarr to walk through the issue with some screen sharing, if you are allowed to do so.

There shouldn't be any permission or space issues, unless it writes more than 600 GB. How frequently is the NOAA-GFDL main branch updated? I think Jarrett only forked his version a few weeks ago. Anyway, we will try updating things!

I'm also trying it on my desktop computer, in case there is something weird about the python install or filesystem on my server, but predictably I'm struggling with that too (just because I forget how I set up my git authentication, not because of an issue with the framework).

I will certainly try all of the updates above; if there's still no success, I will send an email to set up a video chat to go over things. Thanks for the continued help.

Great news: after updating the conda envs again, I got messages saying the PODs exited normally, and it looks like all the expected files were output. Thank you very much for meeting with me and solving this issue @wrongkindofdoctor!
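In case it helps anyone who hits the same error: the fix on our end amounted to rebuilding the conda environments with the installer script in the repository, roughly as below (the paths are placeholders for our miniconda install; check src/conda/ in your clone for the exact script name and options):

./src/conda/conda_env_setup.sh --all --conda_root ~/miniconda3 --env_dir ~/miniconda3/envs
./mdtf -f src/config.jsonc -v    # then re-run the framework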

@jcstarr That's great news! I'm glad that the environment updates fixed the issue, and hope the rest of your dev work goes smoothly.