nismod/smif

Add `smif prepare` command line argument for variability

willu47 opened this issue · 10 comments

  • Add the subcommand smif prepare and arguments [modelrun_template] [scenario_name] [variant range], with some useful helpers for specifying a range of scenario variant names like --start weather_001 --end weather_100
  • Connect up to functionality from #363

Within the variability branch, the smif command line interface now has a prepare option.
smif prepare takes three arguments:

  1. A model run name (Ex: energy_central)
  2. A scenario name (Ex: weather_at_home)
  3. Two integers indicating the lower and upper bound of the variants range (Ex: 3 6)

What does smif prepare energy_central weather_at_home 3 6 do?

  1. It looks in config/scenarios/ for a template file weather_at_home_template.yml and generate a new weather_at_home.yml containing variants replicate 3, replicate 4,...,replicate 6.
  2. It reads config/model_runs/energy_central.yml and generate energy_central_3.yml,energy_central_4.yml,...,energy_central_6.yml. For each of these the scenario variant for weather_at_home in the original energy_central.yml is replace by the corresponding replicate.

The scenario template weather_at_home_template.yml must be provided as follows

$ cat weather_at_home_template.yml
name: weather_at_home
description: UK weather
provides:
- name: maximum_temperature
 ...

variants:
- name: my_dummy_variant
  description: ''
  data:
    maximum_temperature: t_max__NF.csv
    minimum_temperature: t_min__NF.csv
    average_radiation: rsds__NF.csv
    average_wind_speed: wss__NF.csv

In the resulting weather_at_home.yml, variant x will be defined by

...
- name: replicate_x
  description: ''
  data:
  maximum_temperature: t_max__NFx.csv
  minimum_temperature: t_min__NFx.csv
  average_radiation: rsds__NFx.csv
  average_wind_speed: wss__NFx.csv
...

At the moment, the scripts of issues #362 and #363 (write_scenario_variants.py and write_variant_model_runs_from_template.py) have been turned into functions in smif/cli/__init__.py

TODOS

  • Sanity cheks (Do the prescribed scenario and model runs exist ?)

  • Enhance the way the range of variants is precribed, by implementing helpers helpers for specifying a range of scenario variant names like --start weather_001 --end weather_100 (see issue)

Hi @tlestang - I was able to use the functionality you described and the sample project.

I think we should split the functionality into two separate pieces:

  1. generate the scenario variants in the config file
  2. generate the model runs from a template model run referencing a scenario with multiple variants

Note that the latter would also provide us with the ability to programmatically generate all model runs for the variants in a scenario - that's a bonus, even if the source scenario isn't one that is inherently about variability.

Hi @willu47 and @tomalrussell ,
I split the smif prepare command into two commands: smif prepare-scenario and smif prepare-run.
Examples:

  • smif prepare-scenario weather_at_home 0 99
    looks for a file config/scenarios/weather_at_home_template.yml where the shape of the data files name is specified. For instance:
$ cat /config/scenarios/weather_at_home_template.yml
...
variants:
- name: my_dummy_variant
  description: ''"
  data:
    maximum_temperature: t_max__NF.csv
    minimum_temperature: t_min__NF.csv
    average_radiation: rsds__NF.csv
    average_wind_speed: wss__NF.csv

The resulting weather_at_home.yml specifies 100 variants, for instance

....
- name: weather_at_home_56
  description: ''"
  data:
    maximum_temperature: t_max__NF56.csv
    minimum_temperature: t_min__NF56.csv
    average_radiation: rsds__NF56.csv
    average_wind_speed: wss__NF56.csv
  • smif prepare-run energy_central weather_at_home will now generate multiple energy_central_X.yml model runs, one for each variant specified in the weather_at_home.yml config file (100 in this example).

  • One can restrict the range of variants:
    smif prepare-run energy_central weather_at_home --start=0 --end=9 will only create 10 model runs referencing the first 10 variants specified in weather_at_home.yml.

Remark 1 Variants are therefore specified by
their position in the scenario file. Another approach would be to specify a particular variant, for instance
--start=weather_at_home_56,10 to select replicates 56 to 66. It could also be possible to get non-consecutive variants: --variants=population_low,population_high... or specified in a file ? --variants=path_to_file with

$ cat path_to_file
weather_at_home_2
weather_at_home_74
weather_at_home_67
...

Let me know if you think those features are worth implementing now.
Remark 2 I just realize I forgot to pad the indices (_1 --> _001). Will do :)

Hi @tlestang - this looks promising, splitting the command is a useful improvement.

Had a quick look at the implementation - do you think it can be reworked only to use the store (and not to touch the filesystem directly)? If so, it would be good to maintain the separation.

Remark 1 - these features all sound surplus to requirements; can be covered by running smif prepare-run a few times as you describe it above.

What does smif prepare-run energy_central weather_at_home do with other scenarios in the model run which have variants?

Remark 2 - Sounds like a good default. I started thinking about passing in a format string then wondered if smif prepare-scenario would be simpler and more flexible with a single argument for the variant string - what do you reckon?

$ smif prepare-scenario weather_at_home <template_value>
$ smif prepare-scenario weather_at_home 027
$ seq -f "NF%02g" 1 10 | xargs -n 1  smif prepare-scenario weather_at_home $1

What does smif prepare-run energy_central weather_at_home do with other scenarios in the model run which have variants?

It should do nothing. This command considers only variants of the scenario that is passer as an argument.

wondered if smif prepare-scenario would be simpler and more flexible with a single argument for the variant string

Do you mean 1 call to prepare-scenario for 1 variant ? So it is up to the user to iterate over the variants ?

@tomalrussell I modified prepare_scenario and prepare_run to use only the store
and not the underlying file store. I think I did not fully grasp how and why the file store is encapsulated into the upper level Store class... sorry about that!
However, some of the checks for existing config files have been removed because
they relied on methods of the file store. I detail the changes below.

  • prepare_scenario

    • Old behavior
      The scenario_template.yml file is renamed into scenario.yml first.
      This ia required before updating the scenario using update_scenario_variants and
      add_scenario_variant.
      However, if scenario.yml is found in the config/scenarios/ dir., the user is warned and
      asked for permission to overwrite scenario.yml.

    • New behavior
      The scenario file scenario.yml is directly edited and a scenario_template.yml file
      is no longer required.
      A smif exception is raised if scenario file is not found (from the file store).
      One potential caveat is the call of prepare_scenario on a scenario file that is not
      a template file, but an existing well defined scenario file. In this case the scenario file would be altered and would be probably contain rubbish.
      As a safeguard, prepare_scenario checks that the prescribed scenario file does not already define
      multiple variants.

  • prepare_run

    • Old behavior
      If the model run file model_run_X.yml for variant X already exists, warns the user and
      ask permission to user to overwrite. If yes, the mdel run file is erased.
      Because write_model_run does not overwrite the model run file, but instead raises
      a SmifDataExists exception.

    • New behavior
      If one of the variant model runs model_run_X.yml exists already, a SmifDataExists exception is
      raised and program is stopped. Exception is raised from the file store.

Hey @tlestang that all sounds reasonable. If you're happy with this, go ahead and open a pull request.

While testing the store.prepare_scenario() method I noticed that some tests would fail when using the memory interface, although the functionality works fine using the file store.

MemoryConfigStore.update_scenario_variant() does not modify the key by which the updated variant is indexed in the attribute _scenarios[scenario_name]['variants'] dictionary.
So if the initial name of the variant is 'low' for scenario 'mortality' and one uses update_scenario_variant(scenario_name, 'low', updated_variant) with updated_variant['name']='new_name' the new variant is still indexed by 'low':

>>> print(store.config_store._scenarios['mortatlity']['variants']['low']['name'])
'new_name'

I think this remark is only relevant for the memory interface, in which the configuration file structure is mocked using dictionaries.

As a consequence, if you use update_scenario_variant(scenario_name, variant_name, variant) or write_scenario_variant(scenario_name, variant), then modifying the variant dictionary afterwards will also modify the underlying MemoryConfigStore._scenarios dictionary, since assignments in Python do not copy objects.. (I still have a hard time with that!)
On the other hand, using the file store, modifying the variant dict. will have no effect on the scenario config file itself.

For instance say you want to add two variants based on the 'mortality' scenario variant 'low'.

# Get the variant dict. for template variant 'low'
variant = self.read_scenario_variant('mortatlity', 'low')
for ivar in [1,3]:
           variant['name'] = 'mortatlity_additional_variant_{:03d}'.format(ivar)
           self.write_scenario_variant(scenario_name, variant)

Using the YamlConfigStore, mortality.yml will contain 3 variants low, mortality_additional_variant_001 and mortality_additional_variant_002. That is the expected behavior.
However using the memory interface, after the loop the additional variants added to the scenario 'mortality' will have the same name mortality_additional_variant_002.

To recover the expected behavior with the memory interface, it suffices to copy the variant dict

# Get the variant dict. for template variant 'low'
variant = self.read_scenario_variant('mortatlity', 'low')
for ivar in [1,3]:
           variant_copy = deepcopy(variant)
           variant_copy['name'] = 'mortality_additional_variant_{:03d}'.format(ivar)
           self.write_scenario_variant(scenario_name, variant_copy)

However, the test would pass fine without the copy if using the file store, which is the relevant implementation of the Store in practice. The copy is therefore only required for the testing using the memory interface...

Closed by #381