
Wave2Web Hack

H2Ox is a team of Oxford University PhD students and researchers who won first prize in the Wave2Web Hackathon, September 2021, organised by the World Resources Institute and sponsored by Microsoft and Blackrock. In the Wave2Web hackathon, teams competed to predict reservoir levels in four reservoirs in the Kaveri basin west of Bengaluru: Kabini, Krishnaraja Sagar, Harangi, and Hemavathy. H2Ox used sequence-to-sequence models with meteorological and forecast forcing data to predict reservoir levels up to 90 days in the future.

The H2Ox dashboard can be found at https://h2ox.org. The data API can be accessed at https://api.h2ox.org. All code and repos can be found at https://github.com/H2Oxford. Our Prototype Submission Slides are here. The H2Ox team is Lucas Kruitwagen, Chris Arderne, Tommy Lees, and Lisa Thalheimer.

H2Ox - AI

This repo is for training and deploying the winning models of the h2ox team for the wave2web hackathon. In total, 11 models were trained: one for each of 10 basin networks, plus one 'extras' model covering singleton reservoirs.

Installation

This repo can be pip installed directly from GitHub:

pip install git+https://github.com/H2Oxford/h2ox-ai.git

For development, the repo can be pip installed with the -e flag and [dev] options:

git clone https://github.com/H2Oxford/h2ox-ai.git
cd h2ox-ai
pip install -e ".[dev]"

Don't forget to install the pre-commit configuration for nice tidy code!

pre-commit install

For containerised deployment, a docker container can be built from this repo:

docker build -t <my-tag> .

Cloud Build container registry services can also be targeted at forks of this repository.

Description

Machine Learning Problem

The machine learning problem approached in this repository is the prediction of reservoir volume changes up to 90 days in the future for sixty-six reservoirs across ten river basin networks in India.

The complication is that reservoir volume changes are autocorrelated and networked: reservoirs cannot be filled past a certain volume, and may be filled or emptied depending on the volume available in upstream or downstream basins. The machine learning model needs to be able to capture this context.

[Figure: the ten basin networks and their hydrological adjacency]

The primary drivers of rainfall-runoff models are meteorological data, primarily precipitation and evapotranspiration. In this problem, we have collected precipitation and temperature (as a proxy for evapotranspiration) data for the unique upstream area of every reservoir. We also have the hydrological adjacency between reservoirs based on their upstream-downstream relationships. Can we predict reservoir volume changes up to 90 days in the future to help planners avoid water shortage events?

Experiment Control and Entrypoint

Training is managed using the Sacred experiment framework. The default configuration is located at conf.yaml. Changes made to the configuration will be logged in the sacred experiment directory: experiments/sacred/{_run._id}.

The main entrypoint for model training is at run.py, which can be invoked directly from the commandline:

python run.py
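
run.py follows the standard Sacred experiment pattern; a minimal sketch of that wiring (the training function here is a stand-in, not the repo's actual routine):

from sacred import Experiment
from sacred.observers import FileStorageObserver

ex = Experiment("h2ox-ai")
ex.add_config("conf.yaml")  # default configuration
ex.observers.append(FileStorageObserver("experiments/sacred"))  # -> experiments/sacred/{_run._id}

@ex.automain
def main(_config):
    # _config is conf.yaml merged with any commandline overrides
    train_model(_config)  # stand-in for the repo's training routine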

Data

This repo contains everything needed to serve the h2ox models. To retrain the models, a DatasetFactory is provided to rebuild the data archives for each basin network. The final configuration .yaml files are also provided. The data archives used to train the basin networks are available at gs://oxeo-public/wave2web/h2ox-ai. Full experiment logs and artefacts can be provided on request.

Dataset Factory

An abstract DatasetFactory class is used for caching and version controlling the datasets used to train the various basin networks. The DatasetFactory class builds a NetCDF4 .nc data archive that can be used as the input for the FcastDataset, a subclass of a PyTorch Dataset. This .nc archive is accompanied by a data.yaml file which is used to certify version control of the data archive. In this way, the expensive dataset construction steps can be cached, while soft data rules and hyperparameters are left to Dataset construction at runtime. The DatasetFactory uses the data_parameters field of the experiment conf.yaml to construct the data archive and verify version control.

Custom DataUnits are passed to the DatasetFactory. These can be found in DataUnits and allow data archives of varying parameters to be built and cached. DataUnits contain the logic for accessing, for example, zarr archives built in h2ox-data, h2ox-chirps, and h2ox-forecast, as well as BigQuery tables built in h2ox-w2w. Data archives include ECMWF-ERA5Land historical data, ECMWF-TIGGE ex-ante forecast data, CHIRPS precipitation data, WRIS reservoir volumes, and day-of-the-year trigonometric data.
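
As a rough sketch of the intended flow; the import path, signature, and method names below are assumptions rather than the exact API:

import xarray as xr
import yaml
from h2ox.ai.dataset import DatasetFactory  # assumed import path

# the factory consumes the data_parameters field of the experiment config
cfg = yaml.safe_load(open("conf.yaml"))
factory = DatasetFactory(cfg["data_parameters"])  # assumed signature

# build (or re-use, if data.yaml certifies a match) the cached .nc archive
archive_path = factory.build()  # assumed method name

# the cached archive is the runtime input to the FcastDataset
ds = xr.open_dataset(archive_path)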

[Figure: sample dataframe (Indira Sagar)]

Pretrained Models

As the models are relatively small, they are shipped with this repository. They can be found in /models: both the .pt files and the .mar archives, along with accompanying config files, dummy items, reservoir lists, edge lists, and preprocessing parameters (means, standard deviations, etc.) for de-normalising reservoir volume changes.
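
For example, normalised predictions can be mapped back to physical volume changes with the stored means and standard deviations; a sketch, assuming a flat mean/std structure in the preprocessing JSON (inspect the files in /models for the actual schema):

# sketch: de-normalise predicted reservoir volume changes
# the JSON keys below are assumptions; inspect models/*_preprocessing.json
import json

import numpy as np

with open("models/kaveri_preprocessing.json") as f:
    params = json.load(f)

mean = np.array(params["mean"])  # assumed key: per-reservoir target means
std = np.array(params["std"])    # assumed key: per-reservoir target stds

normalised_preds = np.zeros((90, mean.shape[0]))  # e.g. 90 steps x n reservoirs
volume_changes = normalised_preds * std + mean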

Model

H2Ox developed a sequence-to-sequence-to-sequence LSTM model: a three-stage LSTM which 1) encodes historic meteorological forcing data into a latent hydrological state for each reservoir; 2) decodes over a forecast period, while continuing to condition on forecast data; and 3) decodes further over a future period, with only trigonometric day-of-the-year features as input.

[Figure: Bayesian seq2seq2seq LSTM with graph convolution header]
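
A minimal sketch of the three-stage wiring (hidden sizes and the output head are illustrative; the Bayesian and graph-convolution variants described below build on this skeleton):

import torch
import torch.nn as nn

class Seq2Seq2Seq(nn.Module):
    def __init__(self, n_hist_feats, n_fcast_feats, n_future_feats, hidden=64):
        super().__init__()
        # 1) encode historic forcing into a latent hydrological state
        self.encoder = nn.LSTM(n_hist_feats, hidden, batch_first=True)
        # 2) decode over the forecast period, conditioned on forecast data
        self.decoder_fcast = nn.LSTM(n_fcast_feats, hidden, batch_first=True)
        # 3) decode the future period from day-of-year features only
        self.decoder_future = nn.LSTM(n_future_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x_hist, x_fcast, x_future):
        _, state = self.encoder(x_hist)                       # latent state
        out_fcast, state = self.decoder_fcast(x_fcast, state)
        out_future, _ = self.decoder_future(x_future, state)
        out = torch.cat([out_fcast, out_future], dim=1)       # full horizon
        return self.head(out).squeeze(-1)  # per-step volume-change predictions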

Several configurations are available for training this model. Model configuration is controlled by the model_parameters field of the conf.yaml file.

Baseline

A baseline configuration has the following model_parameters:

model_str: s2s2s
graph_conv: false
bayesian_lstm: false

Two variations are available: one-hot encoding (ohe) or multi-site encoding (multi). One-hot encoding concatenates the input data so that it is indexed by <date>-<reservoir>, adding an input feature to flag which reservoir is the target. The intention of this variation is to make the training dataframe larger and learn generalisable information about the hydrological system. The multi-target configuration is designed to allow the model to learn a parameterisation of how each reservoir in the dataset impacts each other reservoir (without injecting heuristic knowledge of how this might be so).

The variation is chosen by setting dataset_parameters.ohe_or_multi to either ohe or multi.
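
Schematically, the two layouts differ as follows; a toy pandas illustration, not the repo's actual feature construction:

import pandas as pd

wide = pd.DataFrame(
    {"kabini": [0.10, 0.20], "harangi": [0.30, 0.40]},
    index=pd.to_datetime(["2020-10-01", "2020-10-02"]),
)

# 'multi': one row per date, one target column per reservoir
print(wide)

# 'ohe': one row per <date>-<reservoir>, with one-hot flags for the target site
long = wide.stack().rename_axis(["date", "site"]).reset_index(name="target")
ohe = pd.concat([long, pd.get_dummies(long["site"])], axis=1)
print(ohe)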

Bayesian

A bayesian configuration has been added which makes encoder and decoder weights probabilistic, sampled from a normal distribution. This configuration uses the blitz bayesian layer implementations. To enable the bayesian configuration, change the model_parameters to:

model_str: bayesian
graph_conv: false
bayesian_lstm: true

The bayesian configuration makes use of the following hyperparameters which control starting assumptions of the distribution of model weight sampling:

lstm_params:
    prior_pi: 1.
    prior_sigma_1: 0.1
    prior_sigma_2: 0.002
    posterior_mu_init: 0
    posterior_rho_init: -3

dataset_parameters.ohe_or_multi should be set to sitewise.
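
These hyperparameters map directly onto the blitz layer arguments; a sketch of one such layer (the input and hidden sizes are illustrative, and the surrounding model wiring is omitted):

from blitz.modules import BayesianLSTM

# illustrative sizes; the priors mirror lstm_params in conf.yaml
bayes_lstm = BayesianLSTM(
    in_features=16,
    out_features=64,
    prior_pi=1.0,             # mixture weight between the two prior gaussians
    prior_sigma_1=0.1,        # std of the first prior gaussian
    prior_sigma_2=0.002,      # std of the second prior gaussian
    posterior_mu_init=0.0,    # initial mean of the weight posteriors
    posterior_rho_init=-3.0,  # rho parameterises the posterior std via softplus
)

Because the weights are re-sampled at each forward pass, repeated inference yields a predictive distribution over reservoir volume changes rather than a point estimate.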

GNN

The final configuration adds a graph neural network (GNN) header on top of the bayesian LSTM encoder and decoders. The graph convolutional layer passes information between the reservoirs in a basin network along the hydrological adjacency graph. To enable this configuration, set model_parameters to:

model_str: gnn
graph_conv: true
bayesian_lstm: true

The parameters diag and digraph control properties of the adjacency matrix. diag adds ones along the diagonal of the adjacency matrix. digraph adds ones to the opposite quadrant of the adjacency matrix.
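
One reading of these options, as a toy numpy illustration (the repo builds the real adjacency from the edge lists in /models; check the model code for the exact construction):

import numpy as np

edges = [(0, 1), (1, 2)]  # upstream -> downstream reservoir indices
A = np.zeros((3, 3))
for i, j in edges:
    A[i, j] = 1.0

A_diag = A + np.eye(3)  # diag: ones along the diagonal (self-loops)
A_digraph = A + A.T     # digraph: ones in the opposite quadrant (mirrored edges)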

Sample Data

A sample cached dataset test_data.nc and config record test_data.yaml are available at gs://oxeo-public/wave2web/h2ox-ai. Final cached datasets for training each basin network are also available. The whole data folder can be copied to a local data/ folder using the gsutil command line utility:

gsutil -m cp gs://oxeo-public/wave2web/h2ox-ai/data/* ./data

With this data in data/, you can begin experimenting with the repository without regenerating the full dataset. Remember to change the cache path in both conf.yaml and data/test_data.yaml! basins.geojson and all_edges.json must also be copied to the data folder.

Usage

Training

The main entrypoint for model training is at run.py, which can be invoked directly from the commandline:

python run.py

Sacred also allows configuration overrides via the commandline. Use dotted-path notation for nested parameters, separating multiple overrides with spaces. For example:

python run.py with "training_parameters.n_epochs=50" "model_parameters.dropout=0.5"

You can also copy the entire config and make a new version to pass to the experiment object:

python run.py with my_new_config.yaml

Training writes to a tensorboard SummaryWriter. A tensorboard server can be initiated to monitor your experiments:

tensorboard --logdir experiments/tensorboard

On your local machine, you can SSH tunnel into the tensorboard server and view your experiment progress in your browser. By default, tensorboard serves on port 6006. For example, on GCP:

gcloud compute ssh --zone "<your-instance-zone>" "<your-instance-name>"  --project "<your-instance-project>" -- -L 6006:localhost:6006

And then view your running tensorboard by directing your browser to localhost:6006.

Serving

This repo is also set up for production as a dockerised TorchServe instance. serve.py provides a custom TorchServe handler to receive, preprocess, predict, and postprocess requests.
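
A custom handler typically subclasses TorchServe's BaseHandler; a simplified sketch of the pattern serve.py follows (the real pre- and postprocessing is more involved, and the payload unpacking here is an assumption):

import torch
from ts.torch_handler.base_handler import BaseHandler

class H2OxHandler(BaseHandler):
    """Simplified sketch of a custom handler like serve.py."""

    def preprocess(self, data):
        # unpack the JSON payload; the 'instances' key matches the sample
        # request shown below (payload structure is an assumption)
        payload = data[0].get("body") or data[0].get("data") or {}
        instances = payload.get("instances", [])
        return torch.tensor(instances, dtype=torch.float32)

    def postprocess(self, inference_output):
        # de-normalise / serialise predictions back to JSON-friendly lists
        return inference_output.tolist()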

To serve any new or retrained models, first package them with torch-model-archiver:

torch-model-archiver --model-name kaveri --version 0.1 --serialized-file models/kaveri.pt --extra-files models/kaveri_preprocessing.json,models/all_edges.json,models/kaveri_sites.json,models/kaveri_cfg.json,models/kaveri_dummy_item.pkl --handler serve.py

To serve a specific model:

torchserve --model-store models/ --start --ncs --models kaveri=kaveri.mar

To serve all models:

torchserve --model-store models/ --start --ncs --models all

A sample can then be requested:

import json
import requests

headers = {"Content-Type": "application/json"}
sample = {
    "instances": [json.load(open("./data/kaveri_sample_2020_10_01.json", "r"))]
}
url = "http://0.0.0.0:7080/predictions/kaveri"

r = requests.post(url, headers=headers, data=json.dumps(sample).encode("utf-8"))

print("status code:", r.status_code)

Citation

Our Wave2Web submission can be cited as:

Kruitwagen, L., Arderne, C., Lees, T., Thalheimer, L., Kuzma, S., & Basak, S. (2022): Wave2Web: Near-real-time reservoir availability prediction for water security in India. Preprint submitted to EarthArXiv, doi: 10.31223/X5V06F. Available at https://eartharxiv.org/repository/view/3381/