eWaterCycle/ewatercycle

Simplify adding models

Closed this issue · 6 comments

Currently, adding a model to eWaterCycle is cumbersome:

  • eWaterCycle-specific model code lives inside the eWaterCycle repo
  • Different aspects of models (setup, forcing data, etc.) live in different modules (ewatercycle.models, ewatercycle.forcing, ...)
  • Models, parameter sets, and forcing data all have versions that are supposed to match

Our current structure is as follows:

- ewatercycle 
  - models
    - modelA
    - modelB
    - ...
  - forcing
    - modelA
    - modelB
    - ...
  - ...

Alternatively, we could use something like

- ewatercycle
  - modelA
    - model
    - forcing
    - ...
  - modelB
    - model
    - forcing
    - ...
  - ...

Then, all model-specific quirks are contained within one folder/module/repo, which would make it easier to add new models. It might also make sense to move all specific model implementations out of the main eWaterCycle package and support them as plugins instead. A plugin structure would have several advantages:

  • A new release of the plugin has to ensure that model code, forcing, and parameter sets are compatible
  • Using a specific version of a model is as simple as changing the version of the plugin
  • Ownership of specific models/plugins lies with the model/plugin developer
  • The main eWaterCycle package would reduce to simply defining the interface, plus perhaps one or two "defaults" (lumped and distributed)
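One common way to support plugins in Python is through package entry points. The sketch below shows how the core package could discover installed model plugins; it is only an illustration, and the group name `ewatercycle.models` and the function `discover_models` are assumptions, not names taken from this issue.

```python
from importlib.metadata import entry_points


def discover_models(group="ewatercycle.models"):
    """Return {plugin name: loaded object} for every entry point in `group`.

    The default group name is hypothetical; a plugin package would have to
    declare its model class under the same group in its packaging metadata.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict of lists
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


if __name__ == "__main__":
    # Prints an empty dict when no plugin is installed
    print(discover_models())
```

With this approach the core package never imports a specific model; installing or upgrading a plugin package is enough to change which models (and which versions) are available.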

This requires some work on the package architecture:

  • Enable plugins (see #336, #335, #340)
  • Change package structure (#341, #347)
  • Make a default model (#354, #360, #359)
  • Make default forcing (also see #337)
  • Move model-specific implementations out of eWaterCycle

Another feature that would be very helpful for development purposes is enabling models without containers. We could make a distinction between a LocalModel and a ContainerizedModel. Both would have to adhere to the eWaterCycle model interface, i.e. provide a setup function that attaches a BMI object (even if this is not strictly necessary for the LocalModel). As such, the eWaterCycle LocalModel is an intermediate step between a 'normal' BMI model and a containerized eWaterCycle model.
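The distinction could look roughly like the sketch below. It assumes a shared base class whose setup function attaches a BMI object; `AbstractModel`, `ToyBmi`, and `_make_bmi` are hypothetical names used for illustration, and the container start-up is deliberately left out.

```python
from abc import ABC, abstractmethod


class ToyBmi:
    """Stand-in for a BMI implementation; only the calls used below."""

    def __init__(self):
        self.time = 0.0
        self.config_file = None

    def initialize(self, config_file):
        self.config_file = config_file

    def update(self):
        self.time += 1.0

    def get_current_time(self):
        return self.time


class AbstractModel(ABC):
    """Hypothetical shared interface: setup() attaches a BMI object."""

    def __init__(self):
        self.bmi = None

    @abstractmethod
    def _make_bmi(self):
        """Return an object exposing the BMI methods."""

    def setup(self, config_file="config.yaml"):
        self.bmi = self._make_bmi()
        self.bmi.initialize(config_file)
        return config_file


class LocalModel(AbstractModel):
    """Runs the BMI class directly in the current Python process."""

    def __init__(self, bmi_class):
        super().__init__()
        self.bmi_class = bmi_class

    def _make_bmi(self):
        return self.bmi_class()


class ContainerizedModel(AbstractModel):
    """Would start a container and attach a remote BMI client."""

    def __init__(self, image):
        super().__init__()
        self.image = image

    def _make_bmi(self):
        # Starting the container is out of scope for this sketch.
        raise NotImplementedError("container start-up omitted")


model = LocalModel(ToyBmi)
model.setup()
model.bmi.update()
```

Because both classes share `setup`, code that drives a model through `model.bmi` does not need to know whether the model runs locally or in a container.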

Different constructors.

Currently our models are initialized with a version tag. While this is good for checking compatibility between model, forcing, and parameter set, it would also be very useful to be able to start a model directly from a container URI or image filename. One way to approach this is by having multiple constructors:

ContainerizedModel.from_version(version="2020.10", ...)

# One option is separate constructors for docker/apptainer
ContainerizedModel.from_docker(docker_uri="ewatercycle/wflow-grpc4bmi:2020.10", ...)
ContainerizedModel.from_apptainer(sif_file="wflow_grpc4bmi_2020-10.sif") 

# Or derive everything from the docker URI:
ContainerizedModel.from_image(image="ewatercycle/wflow-grpc4bmi:2020.10")
# if config.container_engine == "singularity": derive the image filename from the image URI

The public API currently consists of:

  • Generate forcing for a model
  • Load forcing for a model
  • Load parameter set for a model
  • Run a model with parameter set and forcing
  • List available models
  • List available parameter sets
  • Download example parameter set for a model
  • Non-model-specific public API:
    • Download observation data from GRDC or USGS
    • Configuration for container engine, root dir for parameter sets

By moving to a plugin architecture the public API can be refactored.
Some choices:

  1. Each model has its own public API
    • ewatercycle public API does not know about models
    • ewatercycle public API is used to construct model
  2. A model-specific thing can be made available via the ewatercycle public API
  3. Each model has its own public API, which is re-exported in the ewatercycle public API

All tasks have been completed. Models are much simpler to add now.