This code is used to produce data associated with the manuscript:
"Harmonized, gap-filled dataset from 20 urban flux tower sites"
The code takes data provided by observing groups in various formats and creates a standardised, harmonised, gap filled dataset for each of the 20 sites at 30 and 60 minute resolutions (depending on data provided).
This is the code base. Input data (i.e. from observing groups, ERA5, WFDE5 and other global datasets) is held seperately.
Timeseries plots are available at https://urban-plumber.github.io/sites (archive).
Code for associated manuscript figures are available at https://doi.org/10.5281/zenodo.5508694
Processed output data are available at: https://doi.org/10.5281/zenodo.5517550
All times are in UTC with period ending timestamps. To convert a netcdf to local time use convert_UTC_to_local_time.py
. To convert a netcdf to a text file use convert_nc_to_text.py
.
run_all.py
: wrapper to run pipeline at all sitespipeline_functions.py
: main pipeline and shared functionsqc_observations
: quality control processingplot_sitemaps.py
: for plotting regional and local map imagesconvert_nc_to_text.py
: for converting previously created netcdf output to text format (for convenience)convert_UTC_to_local_time.py
: takes a netcdf file (all in UTC) and converts to local time using the UTC offset parameter included in netcdf metadata- in
sites/SITENAME/
:create_dataset_SITENAME
: site specific wrapper for setting site information, importing site observations and calling pipeline functions to process and create output.SITENAME_sitedata_vX.csv
: site specific numerical metadata, e.g. latitude, longitude, surface cover fractions, morphology, population density etc. Includes sources.
up_env.yaml
: the conda environment in which processing took placearchive.py
: script for packaging archives
Code is written in Python 3.8 with dependencies including numpy, pandas, xarray, statsmodels, matplotlib, ephem and cartopy. See the file up_env.yaml
for the full conda environment used.
The folder input_data is required for processing (not included in this repository).
- use
/run_all.py
- edit variables
projpath
to point to this repository, anddatapath
to point to input data - edit the sitelist as required
- type
python run_all.py
- use
/sites/SITENAME/create_dataset_SITENAME.py
- edit variables
projpath
to point to this repository, anddatapath
to point to input data (or use arguments at the command line, see below) - update the output version with
out_suffix
- type
python create_dataset_SITENAME.py
Optional arguments for running create_dataset_SITENAME.py
are:
-- log
: logs print statements in processing to file-- global
: provides site characteristics from global datasets (large files)-- existing
: loads previously processed nc files (if running after processing)-- projpath XXX
: changes the root project folder-- datapath XXX
: changes the folder containing raw site data
When run the following outputs are produced for each of the 20 sites within the site/SITENAME
folder:
-
index.md
: markdown file for displaying key site metadata and output plots in a single webpage -
timeseries/
: following timeseries are produced as netCDF4 (nc) and as plain text (txt)SITENAME_raw_observations_vX
: site observed timeseries before project-wide quality control.SITENAME_clean_observations_vX
: site observed timeseries after project-wide quality control. Includes site characteristic metadata.SITENAME_metforcing_vX
: site observed timeseries after project-wide quality control and gap filling.SITENAME_era5_corrected_vX
: site ERA5 surface data (1990-2020) with bias corrections as applied in the final dataset.SITENAME_era5_linear_vX
: site ERA5 surface data (1990-2020) with bias corrections using a linear regression on observations.SITENAME_ghcnd_precip.csv
: continuous daily precipitation timeseries using the nearest available GHCND station observations.
-
obs_plots/
:all_obs_qc.png
: a summary plot of all included observations as provided (i.e. with nearby tower gap filling but before project gap filling, with project quality controlled periods flagged)VARIABLE_gapfilled_forcing.png
: timeseries after gap filling and prepending with corrected ERA5 dataVARIABLE_obs_qc_diurnal.png
: diurnal plot of all observations, with periods flagged by qc marked in red
-
era_correction/
:- mean hourly diurnal and timeseries images for each corrected variable comparing bias correction methods. 'all' includes all observed data (after qc) for for training and testing. 'out-of-sample' uses an 80/20 approach to train/test data.
SITENAME_erabias_VARIABLE
: the corrections applied to ERA5 data
-
processing/
:- timeseries (csv) for qc flagged observations, with seperate timeseries for each qc step ('range','constant','night','sigma') and all 'dirty'.
- error metric results when comparing ERA5 and bias corrected timeseries with available observations at 60 minute resolution. 'all' includes all observed data (after qc) for for training and testing. 'out-of-sample' uses an 80/20 approach to train/test data.
-
precip_plots/
:- a series of plots comparing cumulative precipitation timeseries. The 'snow_correction' plot shows precipitation after including snowfall from ERA5 and after partitioning to maintain mass balance (see
partition_precip_to_snowf_rainf
subroutine inpipeline_functions.py
).
- a series of plots comparing cumulative precipitation timeseries. The 'snow_correction' plot shows precipitation after including snowfall from ERA5 and after partitioning to maintain mass balance (see
-
images/
:- various maps at regional and local scales.
SITENAME_site_photo.jpg
provided seperately by observing groups.
- various maps at regional and local scales.
-
log_processing_AU-Preston_vX.txt
: a log of the print statements through running the create_dataset_SITENAME scripts.
This code base was developed by Mathew Lipson: https://orcid.org/0000-0001-5322-1796
Co-authors for the associated manuscript are: Sue Grimmond, Martin Best, Andreas Christen, Andrew Coutts, Ben Crawford, Bert Heusinkveld, Erik Velasco, Helen Claire Ward, Hirofumi Sugawara, Je-Woo Hong, Jinkyu Hong, Jonathan Evans, Joseph McFadden, Keunmin Lee, Krzysztof Fortuniak, Leena Järvi, Matthias Roth, Nektarios Chrysoulakis, Nigel Tapper, Oliver Michels, Simone Kotthaus, Stevan Earl, Sungsoo Jo, Valéry Masson, Winston Chow, Wlodzimierz Pawlak, Yeon-Hee Kim.
Mathew Lipson acknowledges the input from all co-authors, plus the guidance and suggestions from Andy Pitman, Gab Abramowitz, Anna Ukkola, Arden Burrell and Paola Petrelli.
Mathew is supported by supported by UNSW Sydney, the Australian Research Council (ARC) Centre of Excellence for Climate System Science (grant CE110001028) and ARC Centre of Excellence for Climate Extremes (grant CE170100023).