tsutterley/pyTMD

Allow Glob patterns when specifying the model files in definition files

Closed this issue ยท 8 comments

Hi, I see you made some fundamental changes in handling the model parameters in the last few months. I principle I like these changes. As I'm not a big fan of hard coded paths (even if they are just subpaths under tide_dir) I really like the possibility to use a definition file as an alternative. Here I have one suggestion to make the use of definition files easier.
In the current version under model_file the user needs to specify a long list of each single constituent file. I would suggest to allow to use a Glob pattern here as well. So instead of

model_file ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/2n2.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/eps2.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/j1.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/k1.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/k2.nc, ... + ~30 more files 

one could simply write

model_file ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/*.nc

To allow this (and at the same time allow the old behavior as well) I suggest to import glob and modify l.1518ff in model.py as follows:

        model_files = []
        for pattern in re.split(r'[\s\,]+',temp.model_file):
            for model_file1 in sorted(glob.glob(os.path.expanduser(pattern))):
                model_files.append(model_file1)
        temp.model_file = model_files

Furthermore, there is no need to search for the separator r'[\s\,]+' and make a difference between the cases. If no separator is found, the loop runs only once.

BTW: In l.1470 in model.py the print(test) was surely ment for test purposes and can be removed now. ;)

Thanks @DerLude, this is a good idea I just need to think about how I want to implement it.
The print test bug is fixed in the dev branch. Hopefully I finish what I'm working on and can get that bug squashed. ๐Ÿคฆ

ok. the fix is in #186.
still need to think about how to implement the glob strings

ok. I'm working on implementing the ability to use glob patterns in the latest dev branch. Going to refactor a bit of the code base to use pathlib to define, expand and operate on paths. This'll be for the next release 2.0.4.

I've noticed another issue with model definition files which is somehow connected to this issue (therefore I'm just adding this point here).
For OTIS/ATLAS/GOT-type models, the respective extract_constants function returns a list of constituents names. This is not the case for FES-type models. Therefore, you define model.constituents manually in model.py for FES-type models. However, if the model is defined by definition file, model.constituents is None and pyTMD.predict.drift fails.
I think one possibility would be to define the constituents list in the definition file as well (but this would conflict with the Glob patterns). I suggest another way of handling this:
For the other type of models, you return the constituents directly from extract_constants. Maybe this could be adapted for FES-type models as well. As the constituent name is not saved in the file (e.g. as a nc-attribute), it needs to be extracted from the filename. These are:

  • for FES: '2n2.nc','eps2.nc','j1.nc',...
  • for EOT: '2N2_ocean_eot20.nc','J1_ocean_eot20.nc','K1_ocean_eot20.nc','K2_ocean_eot20.nc',...
  • for HAMTIDE: '2n.hamtide11a.nc','k1.hamtide11a.nc','k2.hamtide11a.nc','m2.hamtide11a.nc','n2.hamtide11a.nc',...
    For FES and EOT, this does the job for me:
        if(model.version=="FES2014"):
            model.constituents = [pathlib.PurePath(file_now).stem for file_now in model.model_file]
        elif(model.version=="EOT20"):
            model.constituents = [pathlib.PurePath(file_now).stem.replace("_ocean_eot20","").lower() for file_now in model.model_file]
        for i,cons in enumerate(model.constituents):
            if(cons=="la2"): model.constituents[i] = "lambda2"

Maybe this could be integrated in extract_constants in FES.py? For the sake of robustness, you could allow to still define the constituents manually but if they are None, those extracted from the filenames will be used.

BTW thanks for the hint with pathlib! Especially stem is very nice, as it avoids the need for sebsequent calls of os.path.basename and os.path.splitext ;)

Alright. I think I have this all working now in main. The constituents parsing bit did require some remapping functionality for the lambda2 case as you said (there's also a 2n2 case for HAMTIDE models).

To use globs with definition files:

  1. Define a base path in io.model to search for the files
  2. Set the glob string as model_file in your model definition file. This can include subdirectories if wanting to use a set base directory for all tide models.

For FES2014 elevations, this would look like

model_file    fes2014/ocean_tide/*.nc.gz

For FES2014 currents, this would look like

model_file    fes2014/eastward_velocity/*.nc.gz;fes2014/northward_velocity/*.nc.gz

The python call for reading the definition file would look like

model = pyTMD.io.model(DIRECTORY).from_file(DEFINITION_FILE)

where DIRECTORY is not None. Having DIRECTORY be None uses fully defined paths in the definition file (as it was previously for all definition file cases).

Happy to work on this more, but I figured I could get this initial capability out for you to tinker with! Let me know what you think! :)

Amazing how fast you implemented that! It works almost like a charm but I have two remaining suggestions:

  1. If the globbing fails, either in case of typos or if the user uses DIRECTORY + full path in the DEFINITION_FILE (as I did at first :) ) no files are found and you get this error message:
  File "pyTMD/io/model.py", line 1358, in from_file
    temp.model_directory = temp.model_file[0].parent
IndexError: list index out of range

Maybe you could check this case and send an error message that makes it easier to understand whats going on. I suggest:

                globstr = temp.directory.joinpath(temp.model_file)    # need to save this as temp.model_file gets overwritten
                temp.model_file = list(temp.directory.glob(temp.model_file))
                if(len(temp.model_file)<1): print("No model files found for "+str(globstr)); exit()
  1. For consistency, prepending the DIRECTORY to the files specified in the DEFINITION_FILE should be done for the grid_file (e.g. for TPXO) as well.

I like those changes! They're in #190

Perfect! Thanks!