/cnn_lstm_era

Code used in the study "Evaluation and interpretation of convolutional long-short term memory networks for regional hydrological modelling"

Primary language: Jupyter Notebook. License: MIT.

CNN-LSTMs for regional hydrological modelling

This repository contains the code used in the study:

Anderson, Sam and Valentina Radic. "Evaluation and interpretation of convolutional-recurrent neural networks for regional hydrological modelling" (Submitted 2021).

The code in this repository can reproduce all figures and findings in the study. All data used are publicly accessible, and instructions for downloading them are given below. This repository contains the following files:

  • main_publish.ipynb: Defines functions, loads preprocessed data, builds/trains CNN-LSTM model, evaluates performance, interprets model learning, creates figures
  • preprocessing.ipynb: Loads raw data (temperature, precipitation, streamflow, and basin outlines) and preprocesses it into the format used in main_publish.ipynb
  • figure_study_region.ipynb: Creates Figure 1 (study region).
  • era5_download_P_075grid.py: Connects to ERA5 API and downloads raw precipitation data
  • era5_download_T2m_075grid.py: Connects to ERA5 API and downloads raw temperature data
  • non_contributing_areas.ipynb: Calculates non-contributing areas of basins in the eastern cluster
  • mini.ipynb: A miniature version of main_publish.ipynb, which loads and structures one year of input/target data, clusters stream gauge stations, makes heat maps, and perturbs input temperature

How to run code

main_publish.ipynb trains the models much faster on a GPU and is set up to run in Google Colab. Colab cannot read locally saved files, but it can read files from GitHub and Google Drive. main_publish.ipynb can therefore be opened in Colab from GitHub, with all required data and outputs saved and organized in Google Drive as outlined in the notebook. The other files (preprocessing.ipynb, figure_study_region.ipynb, era5_download_P_075grid.py, era5_download_T2m_075grid.py, non_contributing_areas.ipynb) can be run locally. The steps below replicate the results in Google Colab.

  1. Download ERA5 data:

    • Locally, run era5_download_P_075grid.py and era5_download_T2M_075grid.py; save output files (ERA5_T_1979_2015_6hourly_075_grid_AB_BC.nc and ERA5_P_1979_2015_6hourly_075_grid_AB_BC.nc) locally in cnn_lstm_era/Data/ERA5/
  2. Download streamflow data:

    • See here for instructions to download available data for all active and naturalized stream gauge stations in Alberta and BC. ABActNatFlowAll.csv and BCActNatFlowAll.csv list the stations which should be downloaded. Save streamflow data in ./Data/Flow/
  3. Download basin outline data:

    • From here, download the folder WSC_Basins.gdb. A direct download link and other information can be found here. Save this folder as ./Data/WSC_Basins.gdb/
  4. Download provincial border shapefiles:

    • From Statistics Canada, download the "Provinces/Territories Cartographic Boundary File - 2016 Census" shapefile (SHP). Note: These data are not necessary for the analysis, but they are used for making maps.
  5. Download glacier data:

    • Download the file 02_rgi60_WesternCanadaUS.shp by clicking 'Western Canada and USA' from the Randolph Glacier Inventory V6.0. Save in Google Drive at './data/RGI/'.
  6. Preprocess the raw ERA5, streamflow, and basin outline data using preprocessing.ipynb

  7. Upload preprocessed files from Step 6 to Google Drive in folder './data/'. Upload shapefiles from Step 4 to Google Drive in folder ./data/province_borders/ (e.g. ./data/province_borders/lpr_000b16a_e.shp)

  8. Upload trained models (from './Models/') to Google Drive in folder './models/'.

  9. Run main_publish.ipynb in Colab.
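
For Step 1, the repository's download scripts are the reference. As a rough illustration of what such a request can look like, the sketch below builds a request dictionary for the `cdsapi` client. The dataset name, variable name, bounding box, and the choice to request one year at a time are assumptions made for illustration, not values taken from the download scripts. Only the 0.75° grid, the 6-hourly time step, and the 1979-2015 period come from the file names above. Request key names may also differ between CDS API versions.

```python
# Illustrative sketch only; the repository's own download scripts are authoritative.


def build_era5_request(variable, year, area):
    """Build a single-year request dict for the CDS API.

    variable: CDS variable name, e.g. "2m_temperature" (assumed name)
    year:     integer year
    area:     [north, west, south, east] in degrees (hypothetical box)
    """
    return {
        "product_type": "reanalysis",
        "variable": variable,
        "year": str(year),
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(0, 24, 6)],  # 6-hourly steps
        "grid": [0.75, 0.75],                               # 0.75 degree grid
        "area": area,
        "format": "netcdf",
    }


def download_year(variable, year, area, target):
    """Submit one request; needs the cdsapi package and CDS credentials."""
    import cdsapi  # imported here so the request builder works without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",  # assumed dataset name
        build_era5_request(variable, year, area),
        target,
    )


# Hypothetical bounding box roughly covering Alberta and British Columbia.
AB_BC_AREA = [60.0, -140.0, 48.0, -109.0]

request = build_era5_request("2m_temperature", 1979, AB_BC_AREA)
print(request["time"])  # ['00:00', '06:00', '12:00', '18:00']
```

Nothing is downloaded until `download_year` is called, so the request can be inspected first.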

If interested in non-contributing areas in the eastern cluster:

  1. Download non-contributing area data:

    • From here, download the folder "HYD_AAFC_TOTAL_NON_CTRB_DRAIN.gdb" by clicking 'Pre-packaged FGDB files (Bilingual)' --> 'Access'. Save this folder as './Data/HYD_AAFC_TOTAL_NON_CTRB_DRAIN.gdb/'.
  2. Run non_contributing_areas.ipynb

If interested in the Reference Hydrometric Basin Network (RHBN) and how stations in the RHBN overlap with those in this study:

  1. Download 'RHBN_Metadata.xlsx' from Environment and Climate Change Canada. Save in Google Drive in './data/'. This file is used in main_publish.ipynb.

Miniature code

To reproduce some of the key results without downloading and structuring the full datasets in Steps 1-9 above, you can use mini.ipynb. This notebook loads enough preprocessed data to structure 1 year of climate reanalysis and streamflow data, loads trained models, makes sensitivity heat maps, and perturbs input temperature data. It uses data saved in './Data/mini/', which can be uploaded to Google Drive (for access in Colab) in the folder './data_mini/'. While mini.ipynb can be run locally, predicting streamflow under temperature perturbations (to identify freshet response) or spatial perturbations (to make heat maps) is much faster when predictions can be made in batches on a GPU (e.g. on Colab).
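
As a minimal sketch of how the miniature dataset might be read, the function below loads the pickle files listed under Mini/ in the file organization section into a dictionary. The function name and the dictionary keys are hypothetical, and mini.ipynb remains the reference for how the data are actually loaded. Some of the pickled objects may need libraries such as pandas or geopandas installed before they can be unpickled.

```python
import pickle
from pathlib import Path

# File names as listed under Mini/ in the file organization section.
MINI_FILES = [
    "x_intermediate_mini.pickle",
    "y_mini.pickle",
    "flowseason_norm.pickle",
    "station_info.pickle",
    "stationBasins.pickle",
]


def load_mini(data_dir):
    """Load each miniature-dataset pickle into a dict keyed by file stem."""
    data_dir = Path(data_dir)
    loaded = {}
    for name in MINI_FILES:
        with open(data_dir / name, "rb") as f:
            loaded[Path(name).stem] = pickle.load(f)
    return loaded


# Example use (path is an assumption about where the files were saved):
#   mini = load_mini("./Data/mini")
#   print(sorted(mini))
```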


File organization

Local organization:

  • cnn_lstm_era/
    • main_publish.ipynb
    • preprocessing.ipynb
    • figure_study_region.ipynb
    • era5_download_P_075grid.py
    • era5_download_T2M_075grid.py
    • Models/
      • All trained bulk and fine-tuned models (.h5)
    • Data/
      • ERA5/
        • ERA5_T_1979_2015_6hourly_075_grid_AB_BC.nc
        • ERA5_P_1979_2015_6hourly_075_grid_AB_BC.nc
      • Flow/
        • AB/
          • ABActNatFlowAll.csv
          • 05AA004_Daily_Flow_ts.csv
          • ...
          • 11AA026_Daily_Flow_ts.csv
        • BC/
          • BCActNatFlowAll.csv
          • 07EA004_Daily_Flow_ts.csv
          • ...
          • 10DA001_Daily_Flow_ts.csv
      • WSC_Basins.gdb/
        • ...
      • Mini/
        • x_intermediate_mini.pickle
        • y_mini.pickle
        • flowseason_norm.pickle
        • station_info.pickle
        • stationBasins.pickle
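
Before running preprocessing.ipynb, it can help to confirm that the raw inputs are where the layout above expects them. The sketch below is a hypothetical helper (not part of the repository) that lists which of the expected paths are missing. It checks only the fixed file and folder names shown above, not the individual station flow files.

```python
from pathlib import Path

# Paths relative to cnn_lstm_era/, taken from the local organization above.
EXPECTED = [
    "Data/ERA5/ERA5_T_1979_2015_6hourly_075_grid_AB_BC.nc",
    "Data/ERA5/ERA5_P_1979_2015_6hourly_075_grid_AB_BC.nc",
    "Data/Flow/AB/ABActNatFlowAll.csv",
    "Data/Flow/BC/BCActNatFlowAll.csv",
    "Data/WSC_Basins.gdb",
]


def missing_inputs(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED if not (root / p).exists()]


# Run from inside cnn_lstm_era/ to report anything still to be downloaded.
for path in missing_inputs("."):
    print("missing:", path)
```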

Google Drive organization (for Colab access)

  • My Drive/
    • Colab Notebooks/
      • cnn_lstm_era/
        • models/
        • output/
        • heat_maps/
        • data/
          • province_borders/
          • RGI/
        • heat_maps_mini/
        • data_mini/
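
The Google Drive layout above can be created by hand in the Drive web interface. As an alternative, the sketch below creates the same folders under a given project root. The helper name is hypothetical, and the Colab mount path shown in the comments is the commonly used default rather than something specified by this repository.

```python
import os

# Subfolders of cnn_lstm_era/ in the Google Drive organization above.
SUBFOLDERS = [
    "models",
    "output",
    "heat_maps",
    "data/province_borders",
    "data/RGI",
    "heat_maps_mini",
    "data_mini",
]


def make_drive_tree(project_root):
    """Create the expected folders under project_root; safe to re-run."""
    created = []
    for sub in SUBFOLDERS:
        path = os.path.join(project_root, *sub.split("/"))
        os.makedirs(path, exist_ok=True)  # also creates parents such as data/
        created.append(path)
    return created


# In Colab, Drive would typically be mounted first (assumed default paths):
#   from google.colab import drive
#   drive.mount("/content/drive")
#   make_drive_tree("/content/drive/My Drive/Colab Notebooks/cnn_lstm_era")
```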