Allow Glob patterns when specifying the model files in definition files
Closed this issue ยท 8 comments
Hi, I see you made some fundamental changes in handling the model parameters in the last few months. I principle I like these changes. As I'm not a big fan of hard coded paths (even if they are just subpaths under tide_dir
) I really like the possibility to use a definition file as an alternative. Here I have one suggestion to make the use of definition files easier.
In the current version under model_file
the user needs to specify a long list of each single constituent file. I would suggest to allow to use a Glob pattern here as well. So instead of
model_file ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/2n2.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/eps2.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/j1.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/k1.nc, ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/k2.nc, ... + ~30 more files
one could simply write
model_file ~/data/TIDE_MODEL/fes/fes2014_elevations_and_load/fes2014b_elevations/*.nc
To allow this (and at the same time allow the old behavior as well) I suggest to import glob
and modify l.1518ff in model.py
as follows:
model_files = []
for pattern in re.split(r'[\s\,]+',temp.model_file):
for model_file1 in sorted(glob.glob(os.path.expanduser(pattern))):
model_files.append(model_file1)
temp.model_file = model_files
Furthermore, there is no need to search for the separator r'[\s\,]+'
and make a difference between the cases. If no separator is found, the loop runs only once.
BTW: In l.1470 in model.py
the print(test)
was surely ment for test purposes and can be removed now. ;)
Thanks @DerLude, this is a good idea I just need to think about how I want to implement it.
The print test bug is fixed in the dev
branch. Hopefully I finish what I'm working on and can get that bug squashed. ๐คฆ
ok. the fix is in #186.
still need to think about how to implement the glob strings
ok. I'm working on implementing the ability to use glob patterns in the latest dev
branch. Going to refactor a bit of the code base to use pathlib
to define, expand and operate on paths. This'll be for the next release 2.0.4
.
I've noticed another issue with model definition files which is somehow connected to this issue (therefore I'm just adding this point here).
For OTIS/ATLAS/GOT-type models, the respective extract_constants
function returns a list of constituents names. This is not the case for FES-type models. Therefore, you define model.constituents
manually in model.py for FES-type models. However, if the model is defined by definition file, model.constituents
is None
and pyTMD.predict.drift
fails.
I think one possibility would be to define the constituents list in the definition file as well (but this would conflict with the Glob patterns). I suggest another way of handling this:
For the other type of models, you return the constituents directly from extract_constants
. Maybe this could be adapted for FES-type models as well. As the constituent name is not saved in the file (e.g. as a nc-attribute), it needs to be extracted from the filename. These are:
- for FES:
'2n2.nc','eps2.nc','j1.nc',...
- for EOT:
'2N2_ocean_eot20.nc','J1_ocean_eot20.nc','K1_ocean_eot20.nc','K2_ocean_eot20.nc',...
- for HAMTIDE:
'2n.hamtide11a.nc','k1.hamtide11a.nc','k2.hamtide11a.nc','m2.hamtide11a.nc','n2.hamtide11a.nc',...
For FES and EOT, this does the job for me:
if(model.version=="FES2014"):
model.constituents = [pathlib.PurePath(file_now).stem for file_now in model.model_file]
elif(model.version=="EOT20"):
model.constituents = [pathlib.PurePath(file_now).stem.replace("_ocean_eot20","").lower() for file_now in model.model_file]
for i,cons in enumerate(model.constituents):
if(cons=="la2"): model.constituents[i] = "lambda2"
Maybe this could be integrated in extract_constants
in FES.py
? For the sake of robustness, you could allow to still define the constituents manually but if they are None
, those extracted from the filenames will be used.
BTW thanks for the hint with pathlib
! Especially stem
is very nice, as it avoids the need for sebsequent calls of os.path.basename
and os.path.splitext
;)
Alright. I think I have this all working now in main. The constituents parsing bit did require some remapping functionality for the lambda2 case as you said (there's also a 2n2 case for HAMTIDE models).
To use globs with definition files:
- Define a base path in
io.model
to search for the files - Set the glob string as
model_file
in your model definition file. This can include subdirectories if wanting to use a set base directory for all tide models.
For FES2014 elevations, this would look like
model_file fes2014/ocean_tide/*.nc.gz
For FES2014 currents, this would look like
model_file fes2014/eastward_velocity/*.nc.gz;fes2014/northward_velocity/*.nc.gz
The python call for reading the definition file would look like
model = pyTMD.io.model(DIRECTORY).from_file(DEFINITION_FILE)
where DIRECTORY
is not None
. Having DIRECTORY
be None
uses fully defined paths in the definition file (as it was previously for all definition file cases).
Happy to work on this more, but I figured I could get this initial capability out for you to tinker with! Let me know what you think! :)
Amazing how fast you implemented that! It works almost like a charm but I have two remaining suggestions:
- If the globbing fails, either in case of typos or if the user uses
DIRECTORY
+ full path in theDEFINITION_FILE
(as I did at first :) ) no files are found and you get this error message:
File "pyTMD/io/model.py", line 1358, in from_file
temp.model_directory = temp.model_file[0].parent
IndexError: list index out of range
Maybe you could check this case and send an error message that makes it easier to understand whats going on. I suggest:
globstr = temp.directory.joinpath(temp.model_file) # need to save this as temp.model_file gets overwritten
temp.model_file = list(temp.directory.glob(temp.model_file))
if(len(temp.model_file)<1): print("No model files found for "+str(globstr)); exit()
- For consistency, prepending the
DIRECTORY
to the files specified in theDEFINITION_FILE
should be done for thegrid_file
(e.g. for TPXO) as well.
I like those changes! They're in #190
Perfect! Thanks!