The package `ingestr` provides functions to extract (ingest) environmental point data (given longitude, latitude, and required dates) from large global files or remote data servers and to create time series at a user-specified temporal resolution (currently, only daily resolution is implemented). The main functionalities are:
- Temporal downscaling from monthly to daily resolution
- Quality filtering, temporal interpolation and smoothing of remote sensing data
- Handling of different APIs and file formats, returning ingested data in tidy format.
The aim is to make your life simpler when downloading and reading site-scale data: a common interface offers one function each for single-site and multi-site ingest, and ingested data comes in a common, tidy format across a variety of data sources and original file formats. Sources refers both to data sets hosted remotely and accessed through an API and to local data sets. `ingestr` is particularly suited for preparing model forcing and offers a set of functionalities to transform original data into common standardized formats and units. This includes interpolation methods for converting monthly climate data (currently CRU TS) to daily time steps.
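
To give a feel for what temporal downscaling from monthly to daily resolution involves, here is a minimal sketch in base R. It is not the method implemented in ingestr. It simply assumes that each monthly mean applies to the 15th of its month and interpolates linearly between these points with `stats::approx()`. The temperature values and the year are made up for illustration.

```r
# Hypothetical monthly mean air temperatures (deg C) for one non-leap year
monthly_temp <- c(-2.1, 0.3, 4.8, 9.2, 13.9, 17.4, 19.6, 19.0, 15.1, 9.8, 4.0, -0.5)

# Assumption: each monthly mean is representative of the 15th of its month
dates_mid <- as.Date(sprintf("2001-%02d-15", 1:12))

# Target daily time axis covering all days of the year
dates_daily <- seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")

# Linear interpolation between mid-month values; rule = 2 holds the first and
# last value constant before 15 January and after 15 December
daily_temp <- approx(
  x = as.numeric(dates_mid),
  y = monthly_temp,
  xout = as.numeric(dates_daily),
  rule = 2
)$y

df_daily <- data.frame(date = dates_daily, temp = daily_temp)
head(df_daily)
```

Note that plain linear interpolation like this does not reproduce the original monthly means exactly when the daily values are averaged again, so a mean-preserving scheme may be preferable for real applications.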
The key functions are `ingest_bysite()` and `ingest()` for a single-site data ingest and a multi-site data ingest, respectively. For the multi-site data ingest, site meta information is provided through the argument `siteinfo`, which takes a data frame with columns `lon` for longitude, `lat` for latitude, and (for time series downloads) `year_start` and `year_end`, specifying required dates (including all days of respective years). Sites are organised along rows. An example site meta info data frame is provided as part of this package for sites included in the FLUXNET2015 Tier 1 data set (`siteinfo_fluxnet2015`; its additional columns are not required by `ingest_bysite()` and `ingest()`).
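
As an illustration of the layout described above, the following sketch builds a small site meta information data frame in base R. The site names and coordinates are hypothetical, and the `sitename` column is an assumption added here for identifying rows; only `lon`, `lat`, `year_start`, and `year_end` are named in the description above.

```r
# Hypothetical site meta information: one row per site
siteinfo <- data.frame(
  sitename   = c("site_a", "site_b"),  # assumed identifier column
  lon        = c(8.365, 3.596),        # longitude, decimal degrees
  lat        = c(47.478, 43.741),      # latitude, decimal degrees
  year_start = c(2007, 2005),
  year_end   = c(2010, 2008)
)

# Number of days covered per site, given that all days of the
# respective years are included
n_days <- as.integer(
  as.Date(paste0(siteinfo$year_end, "-12-31")) -
    as.Date(paste0(siteinfo$year_start, "-01-01"))
) + 1L

print(siteinfo)
print(n_days)
```

Each of the two example sites spans four calendar years including one leap year, so both request the same number of days.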
The following sources can be handled currently:

Data source | Data type | Coverage | Source ID | Reading from | Remark |
---|---|---|---|---|---|
FLUXNET | time series by site | site | `fluxnet` | local files | Extraction by site name |
WATCH-WFDEI | time series raster map | global | `watch_wfdei` | local files | |
WFDE5 | time series raster map | global | `wfde5` | local files | Cucchi et al. (2020) |
CRU | time series raster map | global | `cru` | local files | |
MODIS LP DAAC | time series raster map | global | `modis` | remote server | using MODISTools |
Google Earth Engine | time series raster map | global | `gee` | remote server | using Koen Hufkens' gee_subset library |
ETOPO1 | raster map | global | `etopo1` | local files | |
Mauna Loa CO2 | time series | site | `co2_mlo` | remote server | using the climate R package |
HWSD | raster map, database | global | `hwsd` | local files | using an adaptation of David LeBauer's rhwsd R package |
WWF Ecoregions | shapefile map | global | `wwf` | local files | Olson et al. (2001) |
N deposition | time series raster map | global | `ndep` | local files | Lamarque et al. (2011) |
SoilGrids | raster map | global | `soilgrids` | remote server | Hengl et al. (2017) |
ISRIC WISE30sec | raster map | global | `wise` | local files | Batjes (2016) |
GSDE Soil | raster map | global | `gsde` | local files | Shangguan et al. (2014) |
WorldClim | raster map | global | `worldclim` | local files | Fick & Hijmans (2017) |

Examples to read data for a single site for each data type are given in Section 'Examples for a single site'. Handling ingestion for multiple sites is described in Section 'Example for a set of sites'. Unless remarked otherwise, extraction goes by longitude/latitude values. Note that this package does not provide the original data. Please follow links to data sources above where data is read from local files, and always cite original references.
All ingested data follows standardized variable naming and SI units. For example:

Variable | Variable name | Units |
---|---|---|
Gross primary production | `gpp` | g CO$_2$ m$^{-2}$ |
Air temperature | `temp` | $^\circ$C |
Daily minimum air temperature | `tmin` | $^\circ$C |
Daily maximum air temperature | `tmax` | $^\circ$C |
Precipitation | `prec` | mm s$^{-1}$ |
Vapour pressure deficit | `vpd` | Pa |
Atmospheric pressure | `patm` | Pa |
Net radiation | `netrad` | J m$^{-2}$ s$^{-1}$ = W m$^{-2}$ |
Photosynthetic photon flux density | `ppfd` | mol m$^{-2}$ s$^{-1}$ |
Elevation (altitude) | `elv` | m a.s.l. |

Use these variable names to specify which variables in the original data source they correspond to (see argument `getvars` to functions `ingest()` and `ingest_bysite()`). `gpp` is cumulative, corresponding to the time scale of the data. For example, if daily data is read, `gpp` is the total gross primary production per day (g CO$_2$ m$^{-2}$ d$^{-1}$).
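
The idea of mapping standardized variable names to the names used in an original data source can be sketched in base R as follows. This is only an illustration of the concept, not ingestr's internal code: the data frame is made up, the column names follow FLUXNET2015 conventions, and the form of `getvars` as a named list (standard name = original name) is an assumption made here.

```r
# Hypothetical excerpt of an original file with FLUXNET2015-style column names
df_orig <- data.frame(
  TIMESTAMP = as.Date("2007-01-01") + 0:2,
  TA_F = c(3.1, 4.5, 2.8),            # air temperature, deg C
  P_F = c(0.0, 8.64, 1.2),            # precipitation, mm per day
  GPP_NT_VUT_REF = c(0.9, 1.4, 1.1)   # gross primary production, daily total
)

# Assumed form: standardized name on the left, original name on the right
getvars <- list(temp = "TA_F", prec = "P_F", gpp = "GPP_NT_VUT_REF")

# Select the requested columns and rename them to the standardized names
df_std <- df_orig[, c("TIMESTAMP", unlist(getvars, use.names = FALSE))]
names(df_std) <- c("date", names(getvars))

# Convert precipitation from a daily total (mm per day) to a rate (mm per second)
df_std$prec <- df_std$prec / 86400

print(df_std)
```

The precipitation conversion shows the difference between a cumulative quantity and a rate: `gpp` stays a daily total, while `prec` is expressed per second, in line with the units listed above.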
To install and load the ingestr package using the latest version on GitHub, run the following commands in your R terminal:

    if (!require(devtools)) install.packages("devtools")
    devtools::install_github("stineb/ingestr")
    library(ingestr)

The `ingestr` package relies heavily on the tidyverse. Dependencies are dplyr, purrr, lubridate, tidyr, raster, stringi, stringr, sp, ncdf4, signal, climate, and rgdal. To install all required packages, do:

    # Packages required by ingestr
    list_pkgs <- c("dplyr", "purrr", "lubridate", "tidyr", "raster", "stringi", "stringr", "sp", "ncdf4", "signal", "climate", "rgdal")
    # Install only those that are not installed yet
    new_pkgs <- list_pkgs[!(list_pkgs %in% installed.packages()[, "Package"])]
    if (length(new_pkgs) > 0) install.packages(new_pkgs)

Examples are described in the vignette `example`, available here.
This package is designed to be extensible to ingesting other data types (sources). The developer (Beni Stocker) would appreciate it if you made sure that your developments can be fed back to this repository. To do so, please use git. See here for a brief introduction to git.
I recommend the following steps if you would just like to use this package (no development):
- Install the package directly from the most up-to-date code on GitHub with `devtools::install_github("stineb/ingestr")`.
I recommend the following steps if you would like to use and further develop the package (even if just for your own application; keep in mind that others may benefit from your efforts too):
- Make sure you have a GitHub account.
- Log on to GitHub, go to https://github.com/stineb/ingestr, and click on 'Fork' in the upper right corner. This makes a copy of the repository that belongs to you, meaning that you can modify, commit, and push changes back to your forked repository as you please.
- Clone your fork to your local computer by entering the following in your terminal (here, it is cloned to a subdirectory `ingestr` placed in your home directory):

      cd ~
      git clone https://github.com/<your_github_username>/ingestr.git

- In RStudio, create a new project in your local directory `~/ingestr/`. This opens the repository in RStudio and gives you access to the code where all ingestr functions are implemented (see subdirectory `./R/`).
- In RStudio, after having edited code, select the 'Build' tab and click on 'Install and Restart' to build the package again. For quick edits and checks, you may simply source the edited files instead of re-building the whole package. If you would like to add new functions, create a new source file in subdirectory `./R/`, write a nice roxygen header (see other source files as an example), then click on 'Build' -> 'More' -> 'Document', and then again on 'Install and Restart'.
- If you're happy with your new edits and additions to the package, you may want to have them fed back to the original repository. To do so, please create a new pull request on GitHub: click on 'New pull request' on the repository page and follow the intuitive steps. Thanks!
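
For the step of adding a new function with a roxygen header, a new source file in `./R/` could look like the sketch below. The function `kelvin_to_celsius()` is hypothetical and not part of ingestr; it only serves to show the header layout with commonly used roxygen2 tags.

```r
#' Convert air temperature from Kelvin to degrees Celsius
#'
#' Hypothetical helper, shown only to illustrate the layout of a roxygen header.
#'
#' @param temp_k Numeric vector of air temperatures in Kelvin.
#'
#' @return Numeric vector of air temperatures in degrees Celsius.
#' @export
#'
#' @examples
#' kelvin_to_celsius(c(273.15, 293.15))
kelvin_to_celsius <- function(temp_k) {
  # Fail early on non-numeric input
  stopifnot(is.numeric(temp_k))
  temp_k - 273.15
}
```

After clicking 'Document', roxygen2 turns the header into a help page and, because of the `@export` tag, adds the function to the package namespace.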
This package is still in its maturing phase. To stay up-to-date with the latest version, regularly re-install from GitHub (`devtools::install_github("stineb/ingestr")`) or, if you're building from a locally (git) cloned repository, regularly do a `git pull` and re-install the package.