ESMValGroup/ESMValCore

~/.esmvaltool/config-user.yml is always read even when --config-file is specified

Closed this issue · 6 comments

Describe the bug

If a user specifies a custom user configuration file via --config-file, the default user configuration file at ~/.esmvaltool/config-user.yml is still read. This can lead to very confusing results: for example, if an invalid option is present in ~/.esmvaltool/config-user.yml but not in the user-defined file, an error is raised that is very hard to resolve for users without deep knowledge of the tool.

It would be really nice if only the user-defined configuration file is read.

I guess this is a side effect of how the configuration is provided for the (experimental) API. @ESMValGroup/technical-lead-development-team does anyone have an idea how to solve this in a backwards-compatible way without too many changes? I can't think of an easy solution to this at the moment, so this might be something to consider when we rewrite our configuration file/code (see #795).

Do you think many people depend on the current behaviour? It may be fine to just change this and add a note that this is backward incompatible in the release notes.

To make it easier to debug the issue, you could expand the text of the InvalidConfigParameter exception messages raised here

    def __setitem__(self, key, val):
        """Map key to value."""
        if key not in self._validate:
            raise InvalidConfigParameter(
                f"`{key}` is not a valid config parameter."
            )
        try:
            cval = self._validate[key](val)
        except ValidationError as verr:
            raise InvalidConfigParameter(f"Key `{key}`: {verr}") from None
        if key in self._deprecate:
            self._deprecate[key](self, val, cval)
        self._mapping[key] = cval

to include the self['filename'] that the configuration was loaded from (if present).

On second thought, that may not be very accurate if filename was set earlier and the mistake is then made when updating the configuration from the API. Maybe add a try/except InvalidConfigParameter around the bits of code that actually update the configuration from file, e.g. here:
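To make the try/except idea concrete, here is a minimal sketch. It is not the ESMValCore code: `update_from_file`, `StrictConfig` and the stand-in exception class are hypothetical names used only for illustration, and the file content is assumed to be parsed into a dict already.

```python
class InvalidConfigParameter(Exception):
    """Stand-in for the exception raised by __setitem__ in the snippet above."""


class StrictConfig(dict):
    """Hypothetical mapping that rejects unknown keys, mimicking the validation."""

    _valid = {"output_dir"}

    def __setitem__(self, key, val):
        if key not in self._valid:
            raise InvalidConfigParameter(
                f"`{key}` is not a valid config parameter."
            )
        super().__setitem__(key, val)


def update_from_file(cfg, filename, new_values):
    """Update cfg with values that were read from filename.

    Hypothetical helper: new_values is the already-parsed file content.
    """
    try:
        for key, val in new_values.items():
            cfg[key] = val
    except InvalidConfigParameter as exc:
        # Name the offending file in the message and keep the original
        # exception as the cause so the traceback stays informative.
        raise InvalidConfigParameter(
            f"Failed to parse user configuration file {filename}: {exc}"
        ) from exc
```

Because the file name is added where the file is read, and not inside `__setitem__`, updates made through the API would keep their original message.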

Is this a duplicate of the issue I reported in #2113? In this issue I describe a use case for fixing this 😊

I think the main problem here is that the default user config file ~/.esmvaltool/config-user.yml is always read when importing esmvalcore.config:

CFG = Config._load_user_config(USER_CONFIG, raise_exception=False)

This makes sense when you use ESMValCore within a Jupyter notebook, since you can easily do a from esmvalcore.config import CFG and get your desired configuration. However, since esmvalcore.config is imported dozens of times in our code base, running the tool will always load ~/.esmvaltool/config-user.yml even when the user explicitly selected a different config file.

The only practical solution I can think of right now is to somehow make the esmvalcore.config module aware of the user-defined configuration here:

USER_CONFIG_DIR = Path.home() / '.esmvaltool'
USER_CONFIG = USER_CONFIG_DIR / 'config-user.yml'

The relevant (i.e., first) import of that module happens in our main function, which actually knows the user-defined config location:

USER_CONFIG_DIR = Path.home() / '.esmvaltool'
USER_CONFIG = USER_CONFIG_DIR / 'config-user.yml'

How would we pass the user defined location to the module? Maybe with an environment variable similar to dask?

https://github.com/dask/dask/blob/d45cf2a1e66d8645c6a594d90b180012d3fc62c6/dask/config.py#L42C1-L46C68

This could also be useful in the Jupyter notebook setting, where you could specify a custom config file BEFORE loading the configuration.

I really like the idea of an environment variable as it's done in Dask, but could you please make it point to the directory with configuration files instead of just the config-user.yml file? As discussed at the last workshop, we're planning to organize our configuration more like it's done for Dask (docs) in the future, so it would be nice if we don't end up having to deprecate the feature because we want to change it. Even now there are other files to consider besides config-user.yml, i.e. dask.yml and esgf-pyclient.yml. Would ESMVALTOOL_CONFIG, with a default value of ~/.esmvaltool work? Or would no default be even better, considering this issue?

> but could you please make it point to the directory with configuration files instead of just the config-user.yml file?

Sure!

> Would ESMVALTOOL_CONFIG, with a default value of ~/.esmvaltool work? Or would no default be even better, considering this issue?

I think it's good to have a default. If people call esmvaltool run without a --config-file, the default file in ~/.esmvaltool should be used if it's present. This is done in Dask, matplotlib, every shell I know, etc., so it's probably straightforward if we do that, too. The problem described in this issue is that the file in ~/.esmvaltool is ALWAYS read, even when a custom --config-file is specified.
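A rough sketch of how the lookup order discussed above could work, assuming the ESMVALTOOL_CONFIG variable proposed earlier points to a directory. The function names `get_config_dir` and `select_config_file` are made up for illustration and are not part of ESMValCore.

```python
import os
from pathlib import Path

# Default location, as in the current code.
DEFAULT_CONFIG_DIR = Path.home() / ".esmvaltool"


def get_config_dir(environ=None):
    """Return $ESMVALTOOL_CONFIG if it is set, else the default directory."""
    environ = os.environ if environ is None else environ
    custom = environ.get("ESMVALTOOL_CONFIG")
    return Path(custom).expanduser() if custom else DEFAULT_CONFIG_DIR


def select_config_file(config_file=None, environ=None):
    """Return the single user config file to read, or None if there is none.

    An explicit config_file (e.g. from --config-file) takes precedence and
    the default location is then not consulted at all.
    """
    if config_file is not None:
        return Path(config_file).expanduser()
    candidate = get_config_dir(environ) / "config-user.yml"
    return candidate if candidate.is_file() else None
```

With this ordering, `select_config_file("my-config.yml")` never touches ~/.esmvaltool, while a plain `select_config_file()` falls back to the directory from the environment variable or the default.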