nismod/smif

Write a script which generates one scenario variant for each replicate in the ensemble dataset

Closed this issue · 3 comments

Child of #353

The first step is to get the data into the correct format.

  • Using the current file store, one file should be generated per variant (here, one per ensemble replicate), with one column per output and a column for year
  • The files should be named with the replicate identifier, and then added to the scenario configuration with a meaningful variant name

When generating a batch model run (see issue #363), each model run in the ensemble will include data from the relevant variant in the scenario.

I'd like to summarize my understanding of the issue, because it is not totally clear to me yet.

Let the ensemble dataset be composed of 99 .csv files weather_at_home/temperature_energy_demand/t_max__NF[1-99].csv

$ head -10 t_max__NF1.csv
region,t_max,timestep,yearday
E06000001,13.2,2015,0
E06000001,11.8,2015,1
E06000001,5.7,2015,2
E06000001,5.0,2015,3
E06000001,8.5,2015,4
E06000001,7.6,2015,5
E06000001,10.1,2015,6
E06000001,9.0,2015,7
E06000001,12.8,2015,8

each file describing a specific scenario variant for the output temp_max of the scenario weather_at_home.

Prior to the execution of a model run with timesteps [2015,2020,2025], we aim to generate a new ensemble of 99 .csv files of the type

$ cat data/scenarios/t_max_01.csv
timestep,region,t_max
2015,E06000001,x
2015,E06000002,x
...
2020,E06000001,x
2020,E06000002,x
...

that feed into a scenario of the type

$ cat weather_at_home.yml
name: weather_at_home
description: The weather over the UK
provides:
  - name: temp_min
    ...
  - name: temp_max
    ...
  - name: solar_radiation
    ...
  - name: wind_speed
    ...
variants:
  - name: replicate_01
    description: 
    data:
      temp_min: t_min_01.csv
      temp_max: t_max_01.csv
      solar_radiation: rsds_01.csv
      wind_speed: wss_01.csv

  - name: replicate_02
    ...
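
The variants list above could be generated rather than written by hand. The following is only a minimal sketch, not smif's own tooling: `build_variants` and `OUTPUT_PREFIXES` are hypothetical names, the output-to-file-prefix mapping is copied from the example above, and the description text is a placeholder, since the example leaves it empty.

```python
# Hypothetical helper, not part of smif: builds the `variants` list
# of a scenario configuration, one entry per ensemble replicate.

# Assumption: output name -> file prefix, as used in the example above.
OUTPUT_PREFIXES = {
    "temp_min": "t_min",
    "temp_max": "t_max",
    "solar_radiation": "rsds",
    "wind_speed": "wss",
}


def build_variants(n_replicates, output_prefixes=OUTPUT_PREFIXES):
    """Return one variant dict per replicate, numbered from 1."""
    variants = []
    for i in range(1, n_replicates + 1):
        suffix = f"{i:02d}"  # zero-padded replicate identifier, e.g. "01"
        variants.append({
            "name": f"replicate_{suffix}",
            # Placeholder description; the example above leaves it empty.
            "description": f"Replicate {i} of {n_replicates}",
            "data": {
                output: f"{prefix}_{suffix}.csv"
                for output, prefix in output_prefixes.items()
            },
        })
    return variants


variants = build_variants(99)
```

The resulting list of dicts can be written under the `variants:` key of the scenario file with whichever YAML writer the project already uses.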

Each of these variants is then involved in a specific model run file of the type

$ cat my_model_run_01.yml
name: my_model_run_01
description:  Energy demand under weather_at_home scenario, variant 01 of 99
stamp: "2017-09-18T12:53:23+00:00"
timesteps:
- 2015
- 2020
- 2025
sos_model: my_sos_model
scenarios:
  ...
  weather_at_home: replicate_01
  ...
narratives: {}
<snip>
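
One model run per variant could be produced in the same way. Again, this is a sketch under assumptions: `build_model_runs` is a hypothetical helper, the keys simply mirror the example file above, and any fields hidden behind the elided parts of that file are not reproduced.

```python
# Hypothetical helper, not part of smif: builds one model run
# configuration (as a dict) per scenario variant.

def build_model_runs(variant_names, timesteps, sos_model, stamp):
    """Return one model run dict per variant, numbered from 1."""
    total = len(variant_names)
    runs = []
    for index, variant in enumerate(variant_names, start=1):
        runs.append({
            "name": f"my_model_run_{index:02d}",
            "description": (
                "Energy demand under weather_at_home scenario, "
                f"variant {index:02d} of {total}"
            ),
            "stamp": stamp,
            "timesteps": list(timesteps),
            "sos_model": sos_model,
            # Only the weather_at_home scenario is set here; other
            # scenarios are elided in the example and so omitted.
            "scenarios": {"weather_at_home": variant},
            "narratives": {},
        })
    return runs


variant_names = [f"replicate_{i:02d}" for i in range(1, 100)]
model_runs = build_model_runs(
    variant_names,
    timesteps=[2015, 2020, 2025],
    sos_model="my_sos_model",
    stamp="2017-09-18T12:53:23+00:00",
)
```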

We would then have to reduce the data in the ensemble dataset (which comes at a one-day resolution) to a per-year value. For instance, would we take the maximum of temp_max over the year as the maximum temperature?

Hi Thibault. The dataset in weather_at_home/temperature_energy_demand/t_max__NF[1-99].csv is in the correct format, so no need to reshape the files. We (@eggimasv @tomalrussell) have developed a script to get the weather@home data into the csv format that smif requires. Otherwise, your terminology is spot on - each numbered file contains one scenario variant. Each column in the file represents a scenario output. And the scenario is weather@home for RCP 8.5.

The rest of your description is correct, except for the need for a reduction. The one-day resolution data is passed into the system of systems as is; no extra processing is required.
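
Since the files are used as they are, a quick header check before registering them as variants may still be worthwhile. This is a minimal sketch, assuming every replicate file carries the header shown in the `head` output earlier in the thread; `check_header` and `EXPECTED_COLUMNS` are hypothetical names, and the sample data is inlined so that the snippet runs without the dataset.

```python
import csv
import io

# Assumption: column order as shown for t_max__NF1.csv in this thread.
EXPECTED_COLUMNS = ["region", "t_max", "timestep", "yearday"]


def check_header(csv_file, expected=EXPECTED_COLUMNS):
    """Return True if the first row of the open file matches `expected`."""
    reader = csv.reader(csv_file)
    header = next(reader, None)  # None for an empty file
    return header == expected


# Inline sample standing in for one replicate file.
sample = io.StringIO(
    "region,t_max,timestep,yearday\n"
    "E06000001,13.2,2015,0\n"
)
header_ok = check_header(sample)
```

For the real dataset, the same function could be called on each file opened with `open(path, newline="")`.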

Closed by #381