NOAA-GFDL/MDTF-diagnostics

Add support for "development mode" to skip preprocessing and copying files

Closed this issue · 2 comments

Developers are requesting a "bare-bones" mode for the framework that runs the PODs on raw data files without copying them. This differs slightly from the --disable-preprocessor option, which still copies files to WK_DIR before launching the PODs. In this mode, the user needs to ensure that the file paths and names adhere to the Local_file specifications; the framework only has to check that the required files are present in MODEL_DATA_ROOT and/or OBS_DATA_ROOT, and symlink the directories in WK_DIR.
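To make the request concrete, here is a rough sketch of the check-then-symlink step. It is an illustration only, not framework code: `link_raw_data` and its arguments are hypothetical names, and `required_files` is assumed to be a list of paths relative to the data root that already follow the Local_file convention.

```python
from pathlib import Path


def link_raw_data(data_root, wk_dir, required_files):
    """Verify required files exist under data_root, then symlink it into wk_dir.

    Hypothetical helper: data_root stands in for MODEL_DATA_ROOT or
    OBS_DATA_ROOT, and required_files are paths relative to data_root.
    Returns the path of the created symlink.
    """
    data_root = Path(data_root)
    wk_dir = Path(wk_dir)

    # Fail early, listing every missing file rather than only the first one.
    missing = [f for f in required_files if not (data_root / f).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing required files under {data_root}: {missing}"
        )

    wk_dir.mkdir(parents=True, exist_ok=True)
    link = wk_dir / data_root.name

    # Replace a stale symlink from a previous run. A real directory at this
    # path is left alone, so symlink_to raises FileExistsError in that case.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(data_root.resolve(), target_is_directory=True)
    return link
```

No data is read or written here; the only filesystem changes are creating WK_DIR if needed and adding one symlink.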

Just to clarify what I think would be the quick and (hopefully) easy fix: have the preprocessor run, but give it an option to check whether a file already exists. If a flag is set to not overwrite, it could skip the reading/writing, which probably accounts for most of the run time.
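Something along these lines is what I have in mind. This is a sketch under assumptions, not the preprocessor's actual interface: `preprocess_if_needed`, the `overwrite` argument, and the `process` callable (standing in for whatever does the read/transform/write) are all hypothetical.

```python
from pathlib import Path


def preprocess_if_needed(src, dest, process, overwrite=False):
    """Run process(src, dest) unless dest already exists and overwrite is False.

    Hypothetical helper: process is any callable that reads src and writes
    dest. Returns True if the work was done, False if it was skipped.
    """
    src = Path(src)
    dest = Path(dest)

    # The existence check is cheap compared with reading and rewriting data.
    if dest.is_file() and not overwrite:
        return False

    dest.parent.mkdir(parents=True, exist_ok=True)
    process(src, dest)
    return True
```

Note that this only checks that the destination file exists, not that it is complete or up to date, so a partially written file from an interrupted run would be reused as-is.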

@bitterbark What you are describing sounds like a situation where you run the framework with a specific POD/dataset combination with the --keep-temp flag enabled (and the --disable-preprocessor flag, if you don't want any of the variable name/metadata modifications) to retain the local copies of the files. Next, you want to re-run the same configuration using the saved local files. With a new flag (e.g., --no-file-rw), the framework would have to search any pre-existing working directories with the desired CASENAME for saved files. If it finds the files, it sets the file path and variable name environment variables to point to the old wk_dir, and skips the preprocessing. If it doesn't find those files, it would run the standard configuration with the preprocessor.
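A minimal sketch of that lookup might look like the following. Everything here is an assumption for illustration: `find_saved_case` is a hypothetical name, and it assumes that working directories sit directly under one output root and carry the CASENAME somewhere in their directory name.

```python
from pathlib import Path


def find_saved_case(output_root, casename, required_files):
    """Return the newest saved working directory for casename, or None.

    Hypothetical helper: a directory qualifies if its name contains casename
    and every path in required_files (relative to it) is an existing file.
    """
    root = Path(output_root)
    if not root.is_dir():
        return None

    candidates = [
        d for d in root.iterdir() if d.is_dir() and casename in d.name
    ]

    # Prefer the most recently modified directory that has all the files.
    for d in sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True):
        if all((d / f).is_file() for f in required_files):
            return d
    return None
```

The caller would point the file path variables at the returned directory when it is not None, and fall back to the standard preprocessor run otherwise.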

This differs somewhat from what Yihung described, which was a step to bypass any preprocessing and simply softlink directly to the desired output in MODEL_ROOT_DIR and OBS_ROOT_DIR. I discussed this with @aradhakrishnanGFDL, and she reminded me that the preprocessed data check was proposed some time ago as part of the framework redesign. She also suggests that this would be a good use case for incorporating intake_esm catalogs.

What I'm going to do is open a separate issue for this preprocessed file checking capability with an eye toward making this the next new feature, possibly in conjunction with the overall preprocessor redesign.