nismod/smif

Parent issue for results API

fcooper8472 opened this issue · 3 comments

Goal

Improve the ability to query results from model runs within smif by developing a suitable API.

Problem

  • At present, results are stored using the file store in a logical, but granular, folder structure
  • Results can be accessed using the methods in smif.data_layer.store such as
    • read_results, write_results, available_results
  • The aim of this parent issue is to provide more usable access to the results. Practically, this involves accessing, exposing, collating, aggregating and filtering the results and serving them to the user

Core functionality

  • Find out for which model runs results are available #351

  • Check that the results for a model run are complete, and if not, which are missing #352

  • Programmatically query the available model results (#359) across various levels in the hierarchy of

    • modelrun
    • timestep <- these are defined in a model run
    • decision_iteration <- iterations may or may not exist, depending on the decision module, and may change from run to run. Note also that the number of iterations per timestep may change.
    • model_name <- this is a model within a system of systems
    • output_name <- these are defined in the model_name configuration
    • dimensions within the output_name <- these are defined in an output's Spec, also in config
  • Users should be able to fix one or more of the above levels and receive a multi-dimensional array of data spanning the levels left unfixed. For example, given:

modelruns: ['first_model_run', 'test_model_run', 'ensemble_0000', 'ensemble_0001', 'ensemble_0002']
timesteps: [2010, 2015, 2020]
iterations: [0]

models

name: 'water_supply'
outputs:
- name: cost
  dims:
  - local_authority_districts
  dtype: float
  unit: million GBP
- name: energy_demand
  dims:
  - local_authority_districts
  dtype: float
  unit: kWh

Something like:

  • get_results(timestep=[2010], models=['water_supply']) should return results for both outputs (cost and energy_demand) for the year 2010 for the single iteration 0
  • get_results(models=['water_supply'], outputs=['cost']) should return a timeseries of costs for the output cost for the single iteration 0
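
The filtering behaviour described above could be sketched roughly as follows. This is only an illustration, not smif's API: the in-memory `store` dictionary, the `build_dummy_store` helper, the placeholder values and the keyword names (taken from the hierarchy levels listed earlier rather than from the `timestep`/`models`/`outputs` spelling in the bullets) are all assumptions. It returns a plain dictionary of matching entries; a real implementation would more likely assemble them into a labelled multi-dimensional array.

```python
from itertools import product

# Hypothetical in-memory stand-in for the file store: one entry per
# (modelrun, timestep, decision_iteration, model_name, output_name).
LEVELS = ("modelrun", "timestep", "decision_iteration", "model_name", "output_name")


def build_dummy_store():
    """Create placeholder results for part of the example configuration."""
    modelruns = ["first_model_run", "test_model_run"]
    timesteps = [2010, 2015, 2020]
    iterations = [0]
    outputs = {"water_supply": ["cost", "energy_demand"]}
    store = {}
    for run, year, iteration in product(modelruns, timesteps, iterations):
        for model, names in outputs.items():
            for name in names:
                # Two placeholder values, one per (made-up) district.
                store[(run, year, iteration, model, name)] = [0.0, 0.0]
    return store


def get_results(store, **fixed):
    """Return the entries of `store` that match every fixed level.

    Each keyword names a level in LEVELS and gives a list of accepted
    values; levels that are not passed stay unfixed.
    """
    unknown = set(fixed) - set(LEVELS)
    if unknown:
        raise ValueError(f"Unknown levels: {sorted(unknown)}")
    selected = {}
    for key, values in store.items():
        labelled = dict(zip(LEVELS, key))
        if all(labelled[level] in accepted for level, accepted in fixed.items()):
            selected[key] = values
    return selected


store = build_dummy_store()

# Both outputs of water_supply for 2010 in a single model run.
at_2010 = get_results(
    store,
    modelrun=["first_model_run"],
    timestep=[2010],
    model_name=["water_supply"],
)

# A timeseries of cost for the same model run.
cost_series = get_results(
    store,
    modelrun=["first_model_run"],
    model_name=["water_supply"],
    output_name=["cost"],
)
```

Accepting a list for each level keeps the call shape close to the examples in this issue, and leaving a keyword out is what marks a level as unfixed.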

@willu47 question about the proposed get_results behaviour:

Let's say I supply models=['water_supply'], but water_supply is run in a number of different model runs... are you thinking that get_results only gets results for a specific model run?

We need both cases - we should add model run as a result dimension above. However... it does not make sense to compare results from model runs that contain a different system-of-systems, where a model can exist in multiple system-of-systems.
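
One way to respect that constraint would be to group model runs by their system-of-systems before comparing results across them. The sketch below assumes each model run configuration is a dictionary with `name` and `sos_model` keys; that shape, the function names and the system-of-systems names used here are hypothetical.

```python
from collections import defaultdict


def group_comparable_runs(modelrun_configs):
    """Group model run names by the system-of-systems they use."""
    groups = defaultdict(list)
    for config in modelrun_configs:
        groups[config["sos_model"]].append(config["name"])
    return dict(groups)


def check_comparable(modelrun_configs, names):
    """Raise if the named model runs span more than one system-of-systems."""
    sos_models = {
        config["sos_model"]
        for config in modelrun_configs
        if config["name"] in names
    }
    if len(sos_models) > 1:
        raise ValueError(
            f"Model runs use different system-of-systems: {sorted(sos_models)}"
        )


# Hypothetical configurations; the sos_model names are made up.
configs = [
    {"name": "first_model_run", "sos_model": "energy_water"},
    {"name": "test_model_run", "sos_model": "water_only"},
    {"name": "ensemble_0000", "sos_model": "energy_water"},
    {"name": "ensemble_0001", "sos_model": "energy_water"},
]

groups = group_comparable_runs(configs)
check_comparable(configs, ["ensemble_0000", "ensemble_0001"])  # passes silently
```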

Closed by #367