Read scenario variant data for multiple timesteps

Question

Closed this issue 5 years ago · 6 comments

The current store.read_scenario_variant_data method was designed to read scenario variant data for a single timestep.
If it is called without a timestep argument, a SmifDataMismatchError is raised (see issue #370).
It was never designed to return the data for multiple timesteps.

We therefore want an additional method that returns a DataArray object, or a pandas DataFrame, containing the whole scenario variant data, i.e. the data for all timesteps.

This function will be useful to:

plot the scenario data (see issue nismod/nismod2#105)
convert the scenario variant data from one format to another (see issue #319)

scenario_name = 'population'
variant_name = 'population_high'
variable = 'population_density'
timesteps = [2010, 2020]
# The function in the Store class returns a DataArray
da = store.read_scenario_variant_data_multiple_timesteps(scenario_name, variant_name, variable, timesteps)

# Wrapper within results API that returns a pandas DataFrame
df = results.get_scenario_data(scenario_name, variant_name, variable, timesteps)

Modify current store.read_scenario_variant_data() to only accept one timestep. That fixes #370
Implement read_scenario_variant_data_multiple_timesteps at the level of the store
Implement wrapper get_scenario_data for Results API

Answer 1 · 2019-05-14T09:18:17.000Z

Just to clarify:

Results.get_scenario_data() should accept a list of one or more timesteps
If any of the requested timesteps do not exist in the scenario data, raise a SmifDataNotFoundError
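
A minimal sketch of the two checks above, assuming the wrapper knows which timesteps are available. The helper name check_timesteps and the local DataNotFoundError class are hypothetical stand-ins, not smif's own API:

```python
class DataNotFoundError(Exception):
    """Local stand-in for smif's SmifDataNotFoundError (assumption)."""


def check_timesteps(requested, available):
    """Return the requested timesteps as a list, or raise if any are unavailable."""
    requested = list(requested)
    if not requested:
        raise ValueError("Expected a list of one or more timesteps")
    known = set(available)
    missing = [t for t in requested if t not in known]
    if missing:
        raise DataNotFoundError(
            "Scenario data not found for timesteps {}".format(missing)
        )
    return requested


available = [2010, 2015, 2020]  # illustrative values only
print(check_timesteps([2010, 2020], available))
try:
    check_timesteps([2010, 2040], available)
except DataNotFoundError as err:
    print(err)
```

Collecting every missing timestep before raising is a design choice of this sketch; raising on the first missing one would satisfy the requirement equally well.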

Answer 2 · 2019-05-14T12:00:57.000Z

store.read_scenario_variant_data_multiple_timesteps returns a DataArray containing the spec and data for scenario output data.
It is very similar to store._get_result_darray_internal() that is used to read model outputs.

store.read_scenario_variant_data_multiple_timesteps builds on store.read_scenario_variant_data(), which reads scenario output for a single timestep, in the same way that store._get_result_darray_internal() builds on store.read_results().

Example:

$ cat test_read_scenario_data.py 
scenario_name = 'population'
variant_name = 'population_high'
variable = 'population'
timesteps = [2010, 2020]
results_darray = store.read_scenario_variant_data_multiple_timesteps(scenario_name, variant_name, variable, timesteps)

print(results_darray.as_df())
print('###################')

# Requesting an invalid timestep causes store.read_scenario_variant_data to raise a
# SmifDataNotFoundError
timesteps = [2040]
results_darray = store.read_scenario_variant_data_multiple_timesteps(scenario_name, variant_name, variable, timesteps)

$ python test_read_scenario_data.py
                    population
country  timesteps            
Scotland 2010          5100000
         2020          5500000
England  2010         52000000
         2020         54000000
Wales    2010          2900000
         2020          3200000
###################
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    variable, timesteps)
  File "/home/tlestang/projects/dl_smif/smif/src/smif/data_layer/store.py", line 918, in read_scenario_variant_data_multiple_timesteps
    ...
smif.exception.SmifDataNotFoundError: Data for 'population' not found for timestep 2040
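
The layering described in this comment (a multi-timestep read built on a single-timestep read, with the single-timestep error propagating) could be sketched roughly as follows. This is not the code in store.py: the in-memory dictionary, both function names and the local exception class are hypothetical, and plain dicts stand in for DataArray objects. The values are the ones shown in the output above.

```python
class DataNotFoundError(Exception):
    """Local stand-in for smif's SmifDataNotFoundError (assumption)."""


# Illustrative in-memory data: {timestep: {region: value}}
FAKE_DATA = {
    2010: {"Scotland": 5100000, "England": 52000000, "Wales": 2900000},
    2020: {"Scotland": 5500000, "England": 54000000, "Wales": 3200000},
}


def read_single_timestep(variable, timestep):
    """Hypothetical single-timestep reader; raises if the timestep is missing."""
    try:
        return FAKE_DATA[timestep]
    except KeyError:
        raise DataNotFoundError(
            "Data for '{}' not found for timestep {}".format(variable, timestep)
        ) from None


def read_multiple_timesteps(variable, timesteps):
    """Collect per-timestep data into one mapping keyed by (region, timestep)."""
    combined = {}
    for timestep in timesteps:
        # Any error from the single-timestep reader propagates unchanged
        for region, value in read_single_timestep(variable, timestep).items():
            combined[(region, timestep)] = value
    return combined


data = read_multiple_timesteps("population", [2010, 2020])
print(data[("Wales", 2020)])
```

Requesting [2040] with this sketch raises the stand-in error from the single-timestep reader, which mirrors the traceback above.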

Answer 3 · 2019-05-14T13:10:01.000Z

@willu47 I hit something of interest on this one. The store methods that read and write scenario variant data are tested (with some fixtures) like so:

# write
store.write_scenario_variant_data(
    scenario_name, variant_name, scenario_variant_data
)
# read
actual = store.read_scenario_variant_data(
    scenario_name, variant_name, variable
)
assert actual == scenario_variant_data

So, here, timestep is omitted because the underlying data spec doesn't have a timestep. I'm assuming that there will in fact always be a timestep dimension in scenario variant data?

If that's right, then I'll try to change the fixtures to reflect that, and enforce the existence of a timestep. Does that make sense?

Answer 4 · 2019-05-14T13:17:14.000Z

Hi @fcooper8472 - thanks for catching this. Yes, there should always be a timestep associated with scenario data.

Answer 5 · 2019-05-14T15:00:55.000Z

Just a remark: the function must be given a list of timesteps as an argument.
This is because, in order to read the data, one must be able to construct the corresponding Spec object, and therefore have knowledge of the timesteps.
A scenario itself does not define timesteps.
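
To illustrate why the timesteps are needed up front, here is a rough sketch of extending a variable's dimensions with a timestep dimension. Plain lists and dicts stand in for smif's Spec object; the function name spec_with_timesteps and the choice to append the dimension last are assumptions made for this example only:

```python
def spec_with_timesteps(dims, coords, timesteps):
    """Return (dims, coords, shape) extended with a trailing 'timesteps' dimension."""
    if "timesteps" in dims:
        raise ValueError("Spec already has a 'timesteps' dimension")
    new_dims = list(dims) + ["timesteps"]
    new_coords = dict(coords)
    new_coords["timesteps"] = list(timesteps)
    # The shape of the combined array depends on how many timesteps are requested
    shape = tuple(len(new_coords[dim]) for dim in new_dims)
    return new_dims, new_coords, shape


dims, coords, shape = spec_with_timesteps(
    ["country"],
    {"country": ["Scotland", "England", "Wales"]},
    [2010, 2020],
)
print(dims, shape)
```

Without the list of timesteps, the length of the extra dimension, and so the shape of the combined data, cannot be determined.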

To get the scenario data for all the timesteps in a model run:

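# model_run_name, scenario_name, variant_name and variable_name are assumed to be defined
model_run = store.read_model_run(model_run_name)
da = store.read_scenario_variant_data_multiple_timesteps(
    scenario_name, variant_name, variable_name, model_run['timesteps']
)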

Answer 6 · 2019-05-14T15:37:24.000Z

Closed by #375