allodb: An R package for biomass estimation at extratropical forest plots
gonzalezeb opened this issue · 54 comments
Date accepted: 2021-09-27
Submitting Author: Erika Gonzalez-Akre (@gonzalezeb)
Other Authors: Camille Piponiot (@cpiponiot), Mauro Lepore (@maurolepore), Kristina Teixeira (@teixeirak)
Repository: https://github.com/forestgeo/allodb
Version submitted: 0.0.0.9000
Editor: @adamhsparks
Reviewers: @jeffreyhanson, @jstillh
Due date for @jstillh: 2021-06-24
Archive: TBD
Version accepted: TBD
- Paste the full DESCRIPTION file inside a code block below:
Package: allodb
Title: Tree Biomass Estimation at Extratropical Forest Plots
Version: 0.0.0.9000
Authors@R:
c(person(given = "Erika",
family = "Gonzalez-Akre",
role = c("aut", "cre", "cph"),
email = "GonzalezEB@si.edu"),
person(given = "Camille",
family = "Piponiot",
role = "aut",
email = "camille.piponiot@gmail.com"),
person(given = "Mauro",
family = "Lepore",
role = "aut",
email = "maurolepore@gmail.com",
comment = c(ORCID = "0000-0002-1986-7988")),
person(given = "Kristina",
family = "Anderson-Teixeira",
role = "aut",
email = "TeixeiraK@si.edu"))
Description: Tool to standardize and simplify the tree biomass
estimation process across globally distributed extratropical forests.
License: GPL-3
URL: https://github.com/forestgeo/allodb
BugReports: https://github.com/forestgeo/allodb/issues
Depends:
R (>= 3.5.0)
Imports:
data.table (>= 1.12.8),
ggplot2 (>= 3.3.2),
kgc (>= 1.0.0.2),
utils
Suggests:
covr (>= 3.2.1),
knitr (>= 1.21),
purrr,
rmarkdown,
spelling,
testthat (>= 3.0.0)
VignetteBuilder:
knitr
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Scope
-
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- data retrieval
- data extraction
- data munging
- data deposition
- workflow automation
- version control
- citation management and bibliometrics
- scientific software wrappers
- field and lab reproducibility tools
- database software bindings
- geospatial data
- text analysis
-
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
allodb
integrates compiled data and built-in functions to calculate tree above-ground biomass (AGB), a critical value used by forest ecologists to estimate the amount of carbon sequestered and released by forests. allodb falls in the "field and laboratory reproducibility tools" category, as our package intends to improve and standardize the calculation of a widely used forest metric. -
Who is the target audience and what are scientific applications of this package?
allodb
is designed to be used by forest ecologists with basic or advanced knowledge of R. It is intended for individuals interested in calculating forest metrics such as AGB in extratropical forests (subtropical, temperate, and boreal forests). -
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
The BIOMASS package is designed to estimate biomass in tropical forests. The main difference between allodb and BIOMASS is the set of equations used to estimate AGB. There is an existing calibrated equation used globally for tropical environments, but none for areas outside the tropics. Our goal is to provide a set of curated equations, and the ability to add new equations, in order to produce a more reliable AGB value outside the tropics. We do not duplicate any functionality of the BIOMASS package and overlap only in ecological scope. -
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
N/A -
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
@melvidoni, see issue 427
Technical checks
Confirm each of the following by checking the box.
- I have read the guide for authors and rOpenSci packaging guide.
This package:
- does not violate the Terms of Service of any service it interacts with.
- has a CRAN and OSI accepted license.
- contains a README with instructions for installing the development version.
- includes documentation with examples for all functions, created with roxygen2.
- contains a vignette with examples of its essential functions and uses.
- has a test suite.
- has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.
Publication options
-
Do you intend for this package to go on CRAN?
-
Do you intend for this package to go on Bioconductor?
-
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Code of conduct
- I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
@ropensci-review-bot assign @adamhsparks as editor
Assigned! @adamhsparks is now the editor
Editor checks:
- Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package.
- Fit: The package meets criteria for fit and overlap
- Automated tests: Package has a testing suite and is tested via a CI service.
- License: The package has a CRAN or OSI accepted license
- Repository: The repository link resolves correctly
Editor comments
G'day, Erika,
I'll be your guest editor for allodb.
The package passes all standard local checks and tests with flying colours. But checking the code syntax using `lintr::lint_package()` (I for some reason can't use `goodpractice::gp()` locally, so I'm falling back to this), I get several suggestions for improving the code. I suggest going through them and addressing them where appropriate and possible, and perhaps noting in your submission those flags that are not appropriate to address for allodb. This will make the review process easier for the reviewers.
Also, I note that your package's website is incomplete, but once accepted to rOpenSci a website will be generated for you and I note that your current README contains the necessary information on how to install and use the package, so we're good there.
Once you've addressed the issues flagged with `goodpractice::gp()` or `lintr::lint_package()`, I'll move on to finding reviewers for you.
@ropensci-review-bot run goodpractice
Hi, @gonzalezeb, just checking in to see if there’s anything I can do to help you with your submission process?
Hi Adam, thanks for checking in.
I am getting some errors after addressing some issues brought up by `lintr::lint_package()` and `goodpractice::gp()`, but my collaborator is not, so I am not sure if everything is good (at least not from my laptop).
When I run `devtools::test()` to test functionality, some tests do not pass. When I ran `goodpractice::goodpractice()` on 3/26 I got "Supreme package..." but now I get a few errors. Maybe if you run some checks again you can let me know if you are still seeing issues.
Yes, it passes both the CRAN checks and `devtools::test()` when I run them, but the code style, etc., as suggested by the rOpenSci guidelines, still draws some suggestions.
This is what I see when I run `lint_package()`.
Some of these can easily be addressed; others, like the first hit with names, probably have good reasons in your field to have the object named like that. That's fine, but it should be noted for the reviewers.
> lintr::lint_package()
................
R/est_params.R:46:24: style: Variable and function name style should be snake_case.
Nres = 1e4
^~~~
R/est_params.R:64:51: style: Commas should always have a space after.
coords = dfobs[i, c("long","lat")],
^
R/est_params.R:72:66: style: Put spaces around all infix operators.
if (length(unique(df$equation_id)) == 1) df[1, 3] <- df[1, 3]*1.01
~^~
R/get_biomass.R:56:25: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/illustrate_allodb.R:48:31: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/imports.R:6:1: style: Variable and function name style should be snake_case.
data.frame <- function(...) {
^~~~~~~~~~
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:71:27: style: Variable and function name style should be snake_case.
new_minDBH = NULL,
^~~~~~~~~~
R/new_equations.R:72:27: style: Variable and function name style should be snake_case.
new_maxDBH = NULL,
^~~~~~~~~~
R/new_equations.R:73:27: style: Variable and function name style should be snake_case.
new_sampleSize = NULL,
^~~~~~~~~~~~~~
R/new_equations.R:74:27: style: Variable and function name style should be snake_case.
new_unitDBH = "cm",
^~~~~~~~~~~
R/new_equations.R:75:27: style: Variable and function name style should be snake_case.
new_unitOutput = "kg",
^~~~~~~~~~~~~~
R/new_equations.R:76:27: style: Variable and function name style should be snake_case.
new_inputVar = "DBH",
^~~~~~~~~~~~
R/new_equations.R:77:27: style: Variable and function name style should be snake_case.
new_outputVar = "Total aboveground biomass",
^~~~~~~~~~~~~
R/new_equations.R:88:34: style: Variable and function name style should be snake_case.
suppressWarnings(new_equations$dbh_unit_CF <-
^~~~~~~~~~~
R/new_equations.R:90:34: style: Variable and function name style should be snake_case.
suppressWarnings(new_equations$output_units_CF <-
^~~~~~~~~~~~~~~
R/new_equations.R:112:5: style: Variable and function name style should be snake_case.
toMerge <- eq_jansen[, c("hsub", "equation_allometry")]
^~~~~~~
R/new_equations.R:113:64: style: Variable and function name style should be snake_case.
eq_jansen$equation_allometry <- apply(toMerge, 1, function(X) {
^
R/new_equations.R:181:30: style: There should be a space between right parenthesis and an opening curly brace.
if (is.matrix(new_coords)){
^~
R/new_equations.R:265:5: style: Variable and function name style should be snake_case.
equationID <- paste0("new", seq_len(length(new_taxa)))
^~~~~~~~~~
R/new_equations.R:266:5: style: Variable and function name style should be snake_case.
coordsEq <- cbind(long = new_coords[, 1],
^~~~~~~~
R/new_equations.R:268:5: style: Variable and function name style should be snake_case.
rcoordsEq <- round(coordsEq * 2 - 0.5) / 2 + 0.25
^~~~~~~~~
R/new_equations.R:270:5: style: Variable and function name style should be snake_case.
koppenZones <- apply(rcoordsEq, 1, function(X) {
^~~~~~~~~~~
R/new_equations.R:270:49: style: Variable and function name style should be snake_case.
koppenZones <- apply(rcoordsEq, 1, function(X) {
^
R/new_equations.R:273:5: style: Variable and function name style should be snake_case.
koppenZones <- as.character(unlist(koppenZones))
^~~~~~~~~~~
R/new_equations.R:301:5: style: Variable and function name style should be snake_case.
dbhCF <-
^~~~~
R/new_equations.R:303:5: style: Variable and function name style should be snake_case.
outputCF <-
^~~~~~~~
R/new_equations.R:305:28: style: Variable and function name style should be snake_case.
suppressWarnings(dbhCF$dbh_unit_CF <-
^~~~~~~~~~~
R/new_equations.R:307:31: style: Variable and function name style should be snake_case.
suppressWarnings(outputCF$output_units_CF <-
^~~~~~~~~~~~~~~
R/resample_agb.R:41:26: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh
^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh[[j]]
^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
suppressWarnings(dfequation$wN <-
^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
coordsSite <- t(as.numeric(coords))
^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
dfequation$wT <- 1e-6
^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
vecW <- dfequation$w
^~~~
tests/testthat/test-est_params.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-est_params.R:10:39: style: Commas should always have a space after.
long = c(-78,-85),
^
tests/testthat/test-get_biomass.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:25:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:92:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:106:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:122:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-illustrate_allodb.R:10:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-illustrate_allodb.R:31:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-new_equations.R:4:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-new_equations.R:127:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-resample_agb.R:19:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-weight_allom.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
There is also a note about UTF-8 strings that you'll need to address.
* checking data for non-ASCII characters ... NOTE
Note: found 156 marked UTF-8 strings
Thanks @adamhsparks. Most of the suggestions after running `lint_package()` have already been applied, see for example here and here. I am not sure why they don't show up for you. I still need to work on some, though. Let me know if you still see those after installing the package again.
OK, this is what I get now.
> lintr::lint_package()
................
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:115:65: style: Variable and function name style should be snake_case.
eq_jansen$equation_allometry <- apply(to_merge, 1, function(X) {
^
R/new_equations.R:273:51: style: Variable and function name style should be snake_case.
koppen_zones <- apply(rcoords_eq, 1, function(X) {
^
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh
^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh[[j]]
^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
suppressWarnings(dfequation$wN <-
^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
coordsSite <- t(as.numeric(coords))
^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
dfequation$wT <- 1e-6
^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
vecW <- dfequation$w
^~~~
tests/testthat/test-est_params.R:9:39: style: Commas should always have a space after.
long = c(-78,-85),
Are there any that I can provide assistance with?
Hi @adamhsparks. Most issues brought up by `lint` have been addressed; only two remain, but I am a little stuck on them. Maybe you can give me some guidance.
In `R/new_equations`, the function contains a lot of 'if' statements because it does a lot of sanity checks on the user-provided data. We could remove these, at the risk of having undetected problems in the user-provided allometries, and it doesn't seem that moving the 'ifs' to another function would make much sense.
In `R/weight_allom`, the problem is with "wE", which is the original column name of a matrix (`koppenMatrix`) being used.
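As an aside on the cyclomatic-complexity flag: one common pattern is to move the sanity checks into a small validator that `new_equations()` calls once, which lowers the reported complexity without dropping any checks. A minimal sketch, where the specific checks shown are illustrative and not allodb's actual code:

```r
## Hypothetical validator: each check lives here instead of inline in
## new_equations(), so the main function's cyclomatic complexity drops
## while every sanity check is preserved.
validate_new_equation <- function(new_taxa, new_allometry, new_coords) {
  if (!is.character(new_taxa)) {
    stop("`new_taxa` must be a character vector.", call. = FALSE)
  }
  if (!is.character(new_allometry)) {
    stop("`new_allometry` must be a character vector.", call. = FALSE)
  }
  if (!(is.matrix(new_coords) || is.numeric(new_coords))) {
    stop("`new_coords` must be a numeric vector or matrix.", call. = FALSE)
  }
  invisible(TRUE)
}
```

`new_equations()` would then open with a single `validate_new_equation(...)` call.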
Thanks, @gonzalezeb, for addressing these changes. I think that both of these are acceptable given the reasons stated. I'll begin looking for reviewers now.
@gonzalezeb, can you please update your README badges for allodb using `rodev::use_review_badge(436)` and add a NEWS.md file with `usethis::use_news_md()`?
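Both housekeeping steps can be run from the package root (assuming rodev is installed, e.g. from its GitHub repository):

```r
## Assumes the rodev and usethis packages are installed.
rodev::use_review_badge(436)  # add the rOpenSci "under review" badge to README
usethis::use_news_md()        # create a NEWS.md skeleton
```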
@gonzalezeb, can you also please address the errors that are encountered when running checks?
==> devtools::check()
ℹ Updating allodb documentation
ℹ Loading allodb
Writing NAMESPACE
Writing NAMESPACE
── Building ────────────────────────────────────────────────────────── allodb ──
Setting env vars:
● CFLAGS : -Wall -pedantic -fdiagnostics-color=always
● CXXFLAGS : -Wall -pedantic -fdiagnostics-color=always
● CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
────────────────────────────────────────────────────────────────────────────────
✓ checking for file ‘/Users/adamsparks/Development/GitHub/Reviews/allodb/DESCRIPTION’ ...
─ preparing ‘allodb’:
✓ checking DESCRIPTION meta-information ...
─ installing the package to build vignettes
E creating vignettes (5.3s)
--- re-building ‘allodb-vignette.Rmd’ using rmarkdown
Quitting from lines 81-87 (allodb-vignette.Rmd)
Error: processing vignette 'allodb-vignette.Rmd' failed with diagnostics:
arguments imply differing number of rows: 9991, 19
--- failed re-building ‘allodb-vignette.Rmd’
SUMMARY: processing the following file failed:
‘allodb-vignette.Rmd’
Error: Vignette re-building failed.
Execution halted
Error: 'col_ed' is not an exported object from 'namespace:cli'
Execution halted
Exited with status 1.
Once that's dealt with, I'll look for reviewers. 🙏
Hi @adamhsparks, allodb is now passing all checks; please let me know if you see the same (I hope!)
devtools::check(remote = TRUE, manual = TRUE)
i Updating allodb documentation
i Loading allodb
Writing NAMESPACE
Writing NAMESPACE
-- Building --------------------------------------------------- allodb --
Setting env vars:
* CFLAGS : -Wall -pedantic -fdiagnostics-color=always
* CXXFLAGS : -Wall -pedantic -fdiagnostics-color=always
* CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
-------------------------------------------------------------------------
√ checking for file 'C:\Users\erikab\Dropbox (Smithsonian)\GitHub\allodb/DESCRIPTION' (349ms)
- preparing 'allodb': (11.3s)
√ checking DESCRIPTION meta-information ...
- installing the package to build vignettes
√ creating vignettes (39.3s)
- checking for LF line-endings in source and make files and shell scripts (11.1s)
- checking for empty or unneeded directories
- looking to see if a 'data/datalist' file should be added
- building 'allodb_0.0.0.9000.tar.gz'
-- Checking --------------------------------------------------- allodb --
Setting env vars:
* _R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
* _R_CHECK_CRAN_INCOMING_REMOTE_ : TRUE
* _R_CHECK_CRAN_INCOMING_ : TRUE
* _R_CHECK_FORCE_SUGGESTS_ : FALSE
* NOT_CRAN : true
-- R CMD check ----------------------------------------------------------
- using log directory 'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck' (1s)
- using R version 3.6.3 (2020-02-29)
- using platform: x86_64-w64-mingw32 (64-bit)
- using session charset: ISO8859-1
- using option '--as-cran' (402ms)
√ checking for file 'allodb/DESCRIPTION'
- this is package 'allodb' version '0.0.0.9000'
- package encoding: UTF-8
N checking CRAN incoming feasibility (35.1s)
Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
New submission
Version contains large components (0.0.0.9000)
√ checking package namespace information ...
√ checking package dependencies (2.3s)
√ checking if this is a source package ...
√ checking if there is a namespace
√ checking for executable files (1.2s)
√ checking for hidden files and directories ...
√ checking for portable file names ...
√ checking whether package 'allodb' can be installed (11.2s)
√ checking installed package size ...
√ checking package directory (894ms)
√ checking for future file timestamps (624ms)
√ checking 'build' directory ...
√ checking DESCRIPTION meta-information (718ms)
√ checking top-level files ...
√ checking for left-over files ...
√ checking index information (353ms)
√ checking package subdirectories (392ms)
√ checking R files for non-ASCII characters ...
√ checking R files for syntax errors ...
√ checking whether the package can be loaded (355ms)
√ checking whether the package can be loaded with stated dependencies ...
√ checking whether the package can be unloaded cleanly (336ms)
√ checking whether the namespace can be loaded with stated dependencies (360ms)
√ checking whether the namespace can be unloaded cleanly (459ms)
√ checking loading without being on the library search path (780ms)
√ checking use of S3 registration (2.5s)
√ checking dependencies in R code (2s)
√ checking S3 generic/method consistency (783ms)
√ checking replacement functions (335ms)
√ checking foreign function calls (802ms)
√ checking R code for possible problems (5.4s)
√ checking Rd files (681ms)
√ checking Rd metadata ...
√ checking Rd line widths ...
√ checking Rd cross-references (343ms)
√ checking for missing documentation entries (473ms)
√ checking for code/documentation mismatches (1.5s)
√ checking Rd \usage sections (1.1s)
√ checking Rd contents ...
√ checking for unstated dependencies in examples ...
√ checking contents of 'data' directory (696ms)
√ checking data for non-ASCII characters (1s)
√ checking data for ASCII and uncompressed saves (588ms)
√ checking R/sysdata.rda (478ms)
√ checking installed files from 'inst/doc' ...
√ checking files in 'vignettes' ...
√ checking examples (4.7s)
√ checking for unstated dependencies in 'tests' ...
- checking tests (429ms)
√ Running 'testthat.R' (8.7s)
√ checking for unstated dependencies in vignettes (9.1s)
√ checking package vignettes in 'inst/doc' ...
√ checking re-building of vignette outputs (23.6s)
√ checking PDF version of manual (3.6s)
√ checking for detritus in the temp directory
See
'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck/00check.log'
for details.
-- R CMD check results --------------------------- allodb 0.0.0.9000 ----
Duration: 2m 0.2s
> checking CRAN incoming feasibility ... NOTE
Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
New submission
Version contains large components (0.0.0.9000)
0 errors √ | 0 warnings √ | 1 note x
Hi @gonzalezeb, I'm getting a NOTE when I run `R CMD check`.
Note: found 156 marked UTF-8 strings
Can you check into this?
As it's just a note, not an error in the checks, I'll start looking for reviewers now.
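One way to track down where those marked strings live is to scan the character columns of each bundled data object; a sketch, where the helper name is illustrative:

```r
## Hypothetical helper: report which character columns of a data frame
## contain strings marked as UTF-8.
utf8_cols <- function(df) {
  names(df)[vapply(
    df,
    function(x) is.character(x) && any(Encoding(x) == "UTF-8"),
    logical(1)
  )]
}

## e.g. applied to one of the package's exported tables after loading allodb:
## utf8_cols(allodb::equations)
```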
Hi @gonzalezeb, I just had a quick look through this package it looks really awesome! Great work!
Before I do a thorough review, I have some broad suggestions for the code/documentation that are relevant to multiple functions. I also noticed a few specific things that might be good to fix before I do the review too. I've listed them below - what do you think? I think most of them could be addressed by polishing the code/documentation a bit more? I'm also happy to conduct a thorough review of the current version of the package if you prefer - it will just be a lot of small issues (e.g. this term in this file should have code formatting) that might be annoying for you to have to respond to?
-
The package website isn't working (https://forestgeo.github.io/allodb/). I think there might be a bug in the GitHub Actions code for generating the package website (e.g. see errors here).
-
I feel like the function documentation could be improved by using code formatting (e.g. use "`character` vector" instead of "character vector"). This would help users understand that you are referring to a specific data type. This suggestion applies to multiple functions - what do you think?
-
I feel like the function documentation could be improved by using more precise terminology. For example, in the documentation for function parameters, I think it would be helpful to specify the exact class for arguments (e.g. using "`numeric` vector" instead of "numerical vector", because "numerical" isn't a class in R). Additionally, I think it would be helpful to include classes in the documentation for function outputs (i.e. in `@return` sections). For example, this sentence currently says "A data frame of resampled DBHs and associated ..." and could be updated to say "A data frame (`tibble::tibble()` object) of resampled DBHs and associated ...". This suggestion applies to multiple functions - how does that sound? Also, I feel like the documentation for function parameters could be more consistent. I really like how the documentation for the `dbh` parameter in the `get_biomass` function has the format "A [class-name] containing [...]". I wonder if you could update the parameter documentation for the other parameters -- e.g. the `wna` parameter -- so that they all follow this format (e.g. `wna` currently reads like "This parameter is used [...]" and doesn't contain class information).
-
It's not really best practice to include a non-R script file in the /R folder. Perhaps the `sysdata.rda` file could be moved to a more appropriate location?
-
I feel like it might be helpful to include some code to validate function arguments. This could help users understand why their code isn't working if they supply incorrect arguments (e.g. if they supply a `numeric` vector instead of a `character` vector). For example, I think this argument validation code is fantastic and will be really useful for users, and I wonder if you could add similar code to verify the arguments to all parameters in each function? If you're interested in R packages to help with this, I personally use the assertthat R package (e.g. see this function) and I have heard good things about the checkmate R package too. What do you think?
-
Could you remove the unused Travis badge in the README (because it seems that GitHub Actions is used for CI)?
-
The .buildignore folder seems to contain random R scripts and meeting notes. I don't see the benefit of keeping these in the code repository? Perhaps you could move them to a wiki, issues, or a separate repo? Sorry if I'm missing something?
-
This function (and maybe others?) overrides the global state. This is not ideal because it could cause issues for users when they do not expect such changes. For example, imagine a user is trying to use this package within an analysis, and they manually use `set.seed()` to handle random number generation for simulations in various parts of their analysis. If this function uses `set.seed()` and does not revert it afterwards, then the user's subsequent simulations could all be based on the same seed even though they used `set.seed()` to change the seed in their code. If you're interested in suggestions on how to address this, I personally like to use the withr R package for stuff like this. For example, instead of using the `set.seed()` function, you could use the `withr::with_seed()` function. Does that help?
-
`eval(parse(...))` is rarely a good idea unless you have no alternatives. This is because it can introduce bugs that are hard to track down, and it can also cause security problems. In this function, I wonder if you could use some combination of `expression` and `substitute` to achieve the same result?
-
I feel like it would be easier for users to use this function (and maybe others too?) if the parameters were re-ordered so that all parameters without defaults came before those with defaults. For example, to run the `get_biomass()` function with default parameters and without using named arguments, I would have to do something like `get_biomass(x, y, , z)` instead of `get_biomass(x, y, z)` (to specify arguments for `dbh`, `genus`, and `coords`). But maybe there's a reason for this particular order of parameters? Sorry if I'm missing something obvious.
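The argument-validation and seed suggestions above can be sketched together; the function below is hypothetical (not allodb's actual code) and assumes the checkmate and withr packages:

```r
library(checkmate)
library(withr)

## Hypothetical function illustrating two of the suggestions: validate
## arguments up front with informative errors, and confine the RNG seed so
## the user's global random-number state is restored afterwards.
resample_sketch <- function(dbh, genus, n = 10, seed = 1) {
  assert_numeric(dbh, lower = 0, any.missing = FALSE)
  assert_character(genus, len = length(dbh))
  assert_count(n)
  with_seed(seed, {
    ## resampling happens inside with_seed(); the global seed is untouched
    sample(dbh, size = n, replace = TRUE)
  })
}
```

Calling `resample_sketch("a", "b")` would then fail immediately with a message naming `dbh`, rather than erroring deep inside the resampling code.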
Please let me know if anything I've written isn't clear or doesn't make sense, and I can follow up with more details? I could also be entirely wrong about some things too, so please do correct me if I'm missing something obvious :)
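On the `eval(parse(...))` point: since the allometries are stored as strings, one possible middle ground (a sketch, assuming an equation written in terms of `dbh` as in allodb's equation table) is to parse once and evaluate against a restricted data environment, rather than evaluating in the calling frame:

```r
## Hypothetical example: evaluate an allometry string while supplying only
## the variables it is allowed to see, with base R as the enclosure.
agb_from_string <- function(equation, dbh) {
  expr <- parse(text = equation)[[1]]  # or str2lang() on R >= 3.6
  eval(expr, envir = list(dbh = dbh), enclos = baseenv())
}

agb_from_string("0.1 * dbh^2.5", dbh = c(10, 20))
```

This does not eliminate parsing of strings, but it limits what the evaluated expression can reach.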
I'll try again with the bot. I'm new here, please bear with me.
@ropensci-review-bot add @jeffreyhanson to reviewers.
@ropensci-review-bot add @jeffreyhanson to reviewers.
@ropensci-review-bot add @jeffreyhanson to reviewers
@jeffreyhanson added to the reviewers list. Review due date is 2021-06-14. Thanks @jeffreyhanson for accepting to review! Please refer to our reviewer guide.
Hello @jeffreyhanson, your suggestions are manageable, thanks for catching all that. We will work on those and get back to you if we have questions. We will try to have things resolved by 5/31 so you can proceed with a thorough review.
Hi @gonzalezeb, ok, that sounds great - thank you!
Just an update @gonzalezeb, I’m still looking for a second reviewer, sorry about any delays with this.
@ropensci-review-bot add @jstillh to reviewers
Thank you for accepting the review for allodb, Jonas. Please let me know if I can assist you with anything along the way.
Hi @jeffreyhanson, we solved many of the issues you pointed out and may justify a few later. Hope you can go ahead with reviewing the functionality of the package. Thanks!
Hi @gonzalezeb
Here is my review. I really appreciate the package and the work you & the co-authors have put into developing it.
I think the package will be useful for many (if not all) forest ecologists... I reviewed the package looking at it as a regular user - sorry if I misunderstood some of the components of your package.
Package Review
- Besides Kristina Anderson-Teixeira and myself being listed as co-authors on a paper with 100+ authors, I don't have any professional relationship with the authors of the package
- As the reviewer I confirm that there are no conflicts of interest for me to
review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- A statement of need clearly stating problems the software is designed to solve and its target audience in README
- Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions
- Examples (that run successfully locally) for all exported functions
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software have been confirmed.
- Performance: Any performance claims of the software have been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Estimated hours spent reviewing: 9
- Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (“rev” role) in the package DESCRIPTION file.
Review Comments
The authors provide a package for calculating biomass of individual trees of extratropical tree and shrub species. A package including biomass equations for extratropical species is a useful and timely complement to the `BIOMASS` package by Chave et al. for tropical species. The authors apparently developed the package focusing on the plots of the ForestGEO network. Given that the authors aim at providing a package complementing the `BIOMASS` package, I reviewed the functionality of the package not only for use on single/multiple sites but also for use along latitudinal gradients, and focused on European tree species that I am familiar with.
Although I appreciate the general idea of the package and the large effort that the authors put into its development, I have some concerns regarding the concept applied when calculating biomass and some general comments on code and documentation that I feel should be addressed:
Conceptual
The core function `get_biomass` provides the user with one estimate of aboveground biomass based on the diameter at breast height (dbh) of a tree. This function then applies one or several of the biomass equations based on climatic similarity and similarity between species. The estimate is based on a large set of biomass equations compiled from a wide range of sources - I very much appreciate the work the authors have put into the compilation of this dataset.
I like the general framework of this package and the option to add additional equations via the function `new_equations`, but I have concerns regarding the workflow implemented in the package. These mainly focus on the following points:
- Documentation of selection and weighting
  The function `get_biomass` returns a `vector` with an estimate of the biomass for each tree. The calculation of this value is somewhat opaque as long as the user does not actively change some of the arguments (which by default do not need to be provided) of the `get_biomass` function. Even if species/genus-specific equations are available, this function includes additional, less specific equations. These are given a certain weight based on similarities using the `weight_allom` function and might contribute little to the final estimate. However, this is not obvious to the user, as the package only provides one value for each tree. I would appreciate if the function `get_biomass` would not only return a vector but also more metadata on the steps that resulted in the calculation of the final result. This could be achieved by returning a list containing the final values, all values with the associated weights, the output of `new_eqtable` documenting the equations used, and the output of `weight_allom`.
  I understand that the function `illustrate_allodb` should give the user an impression of the results of the resampling - but it does not provide the user with all information (e.g., no information on weights) and it requires the user to run an additional function.
- Inclusion of equations based on climatic similarity
  To check for climatic similarity, the package makes use of the Koeppen classification. This is a feasible approach - but it has some drawbacks in the implementation: Based on the Koeppen classification of the site (provided through the `coords` argument), a certain set of equations is used. This can result in the exclusion/down-weighting even of species-specific biomass equations. The illustrative example below provides biomass estimates for Fagus sylvatica along a latitudinal gradient from Northern Italy to Northern Germany, and the output of `illustrate_allodb` for two sites used in this latitudinal gradient. Obviously a completely different set of equations is used - in the second site, no species-specific equations are used. (Btw: there seems to be a typo, I guess it should be Populus instead of Pupulus.)
  The differences between the sites observed here are mostly driven by a change in the Koeppen zones and a different set of equations applied due to the change in Koeppen zones. I am not convinced that this makes ecological sense. Additionally, the Koeppen classification for specific equations seems to be based on the longitude and latitude provided in the original publication of each biomass equation. However, for equations covering a wider range (e.g., the equations published in Forrester et al. 2017), the authors seem to have extracted the Koeppen value for the mean of both the lat and long range provided. This may result in a very specific classification not representative of the species and a subsequent down-weighting of this equation when applied in the midst of the species' distribution range: For the ForestGEO plot in Zofin, which is mainly covered with beech and lies in the center of the distribution range of this species, this results in a down-weighting of the equation provided in Forrester et al. 2017, as the value `wE` provided in the dataset for this specific combination in `koppenMatrix` is 0.7. I did not check if other equations have a higher weight, but I think this issue should be addressed by the authors.
  If the Koeppen classification is not included in the dataset `koppenMatrix`, or the specific combination of Koeppen zones is included with a weight of 0, this results in the exclusion of a specific equation and a subsequent error message (if only one equation is provided) or the inclusion of other, not species-specific equations - see my comments further down.
  In general, the weighting concept of the different biomass equations should be more transparent for the package user and probably requires conceptual changes.
```r
data <- allodb::get_biomass(rep(50, 41), rep("Fagus", 41), rep("sylvatica", 41),
  coords = as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))))
rcoords_site <- round(as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))) * 2 - 0.5) / 2 + 0.25
koeppen <- apply(rcoords_site, 1, function(xk) {
  subset(kgc::climatezones, Lon == xk[1] & Lat == xk[2])$Cls
})
plot(seq(44, 54, 0.25), data, xlab = "lat", ylab = "biomass [kg]",
  ylim = c(0, 3000), pch = 21, cex = 1.5, bg = koeppen)
```
![unnamed-chunk-1-1](https://user-images.githubusercontent.com/31648011/121906533-7da22c00-cd2b-11eb-97fe-fe8d6a432e5f.png)
```r
allodb::illustrate_allodb("Fagus", coords = c(10, 44), "sylvatica")
allodb::illustrate_allodb("Fagus", coords = c(10, 46), "sylvatica")

data("koppenMatrix", package = "allodb")
data("sites_info", package = "allodb")
koppenMatrix$wE[koppenMatrix$zone1 == sites_info$koppen[sites_info$site == "zofin"] &
  koppenMatrix$zone2 == "Csb"]
## [1] 0.7
```
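To illustrate the kind of richer return value suggested above, here is a purely hypothetical structure (all names and numbers are invented, not allodb output):

```r
# A list bundling the point estimates with the metadata that produced
# them, instead of a bare numeric vector.
result <- list(
  agb = c(217.0, 305.2),                  # final biomass estimates (kg)
  weights = data.frame(                   # per-equation weights applied
    equation_id = c("eq_a", "eq_b"),      # invented equation ids
    weight      = c(0.81, 0.19)
  ),
  equations_used = c("eq_a", "eq_b")
)
result$agb
```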
Code and documentation
Use of `eval(parse())`
As @jeffreyhanson already pointed out here, the use of `eval(parse())` is not good practice, for the reasons he laid out nicely in his comment. You seem to have replaced the `eval(parse())` in this function but then changed it back according to this issue, because the `str2lang` that you used to replace `eval(parse())` is not available for users with R < 3.6.0.
I was not aware of the `str2lang` function and I am not convinced that using this function will solve the issue raised by @jeffreyhanson: The issue with tracing bugs will still exist even when changing to the `str2lang` function, and the general security concerns associated with the use of `eval(parse())` can probably not be solved by replacing the `parse` function with `str2lang`, as `str2lang` is only a special case of `parse`. The concerns raised are associated with the combination of `eval(parse())` and not with the use of `parse()` alone. (Btw: when you removed `str2lang` in this function, you only removed it on line 93 but not within the `else` statement on line 101.)
I think working along the lines laid out by @jeffreyhanson (i.e. using `expression` and `substitute`) would be a good option. Another option would be to create internal functions for all forms of the equations in `dfequation` and then apply these.
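The second option could be sketched as a lookup table of internal functions keyed by equation form, avoiding `eval(parse())` entirely (the form names and coefficients here are invented for illustration):

```r
# One R function per allometric form; coefficients are data, not code.
forms <- list(
  power     = function(dbh, a, b)    a * dbh^b,
  log_power = function(dbh, a, b)    exp(a + b * log(dbh)),
  poly2     = function(dbh, a, b, c) a + b * dbh + c * dbh^2
)

apply_equation <- function(form, dbh, ...) forms[[form]](dbh, ...)

apply_equation("power", dbh = 30, a = 0.1, b = 2.4)
```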
Documentation
I think the vignette should be improved. It would be great if you could provide some more information to the user, especially on the estimation of the parameters in `est_params` and what effect the weighting in `weight_allom` has on the result.
As @jeffreyhanson already pointed out here, consistently using code formatting throughout the documentation would be great (e.g., "numeric" in the description of the `resample_agb` function could be changed to `numeric`).
Naming conventions
The usability of the package could be improved if you consistently order the arguments and stick to one naming convention throughout the package. `sitespecies` does not follow the naming convention you used for the other datasets in the package - all other datasets are named following the convention name1_name2. This applies to the dataset `koppenMatrix` as well.
Performance
The calculation of the biomass is relatively slow if a unique coordinate is provided for every tree - I guess this is caused by the subset operations performed in the `weight_allom` function. Given that a user might provide individual coordinates for each tree on a site, and given the coarse resolution of the Koeppen classification, it might be an option to first check whether the coordinates vary in such a way that they need to be subset individually, or whether a single coordinate can be used.
Such performance considerations are necessary, as the calculations are rather slow at the moment. For example, the biomass calculation for 100 trees with tree-level coordinates requires around 9 s.
```r
system.time(allodb::get_biomass(dbh = c(1:100), genus = rep("Fagus", 100),
  species = rep("sylvatica", 100),
  coords = as.matrix(cbind(rep(10, 100), sample(seq(48.001, 48.1, 0.0001), 100)))))
##    user  system elapsed
##    8.81    0.03    8.86
```
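The deduplication idea could look roughly like this: classify each unique 0.5-degree cell once and map the result back to all trees (`classify_koppen()` is a placeholder standing in for the package's internal lookup):

```r
# 100 trees with individual coordinates, as in the timing example above.
coords <- cbind(lon = rep(10, 100),
                lat = seq(48.001, 48.1, length.out = 100))

# Round to the 0.5-degree grid of kgc::climatezones.
cells <- round(coords * 2 - 0.5) / 2 + 0.25
keys  <- paste(cells[, "lon"], cells[, "lat"])

classify_koppen <- function(lon, lat) "Cfb"  # placeholder lookup

# Classify each unique cell once, then recycle the result per tree.
zone_by_key <- vapply(unique(keys), function(k) {
  xy <- as.numeric(strsplit(k, " ")[[1]])
  classify_koppen(xy[1], xy[2])
}, character(1))
zones <- zone_by_key[keys]
table(zones)
```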
Specific comments
`get_biomass`
The `get_biomass` function creates a biomass estimate based on `dbh`, `genus`, `coords` and `species`. It allows the user to provide a selection of the biomass equations via the argument `new_eqtable`. This seems to be an important feature of this function. However, if the values provided in the column `koppen` do not appear in `kgc::climatezones`, the function throws an error, and the error message does not point the user to the reason for the error - see my specific comment on the `weight_allom` function below. Given the coarse resolution of the `kgc::climatezones` dataset of 0.5 degrees, this might be problematic especially in mountainous regions with a large altitudinal gradient. Additionally, this is an issue which might prevent users from applying equations that could be applied for the respective region - see my general comment above.
```r
equations_to_use <- allodb::new_equations(subset_taxa = "Fagus")
allodb::get_biomass(dbh = 20,
  genus = "Fagus",
  species = "sylvatica",
  coords = c(10, 46),
  new_eqtable = equations_to_use)
## Warning: Unknown or uninitialised column: `equation_id`.
## Error in if (any(nEQ <- vNms != make.names(vNms))) vNms[nEQ] <- paste0("`", : missing value where TRUE/FALSE needed

allodb::get_biomass(dbh = 20,
  genus = "Fagus",
  species = "sylvatica",
  coords = c(10, 46))
## [1] 217.0154
```
`weight_allom`
The `weight_allom` function creates weights for all biomass functions which will later be used in the calculation of the biomass.
Beyond the concerns I already mentioned above, I only have a minor comment on this function: The `kgc::climatezones` dataset does not cover large bodies of water. The values in `dfequation$wE` will therefore be `-Inf` for all `coords` that are not covered by `kgc::climatezones`. The `weight_allom` function therefore returns `-Inf`, causing the function to stop without an easily understandable error message. I would appreciate if you either added an error message like `if (dfequation$wE == -Inf) { stop('Error message') }`, or just flagged the respective values so they are omitted in later calculations.
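A guard along these lines could warn and flag the affected rows instead of failing opaquely (a sketch using the `dfequation`/`wE` names from this review, not the package's actual implementation):

```r
# Replace -Inf climate weights (coordinates outside kgc::climatezones
# coverage, e.g. open water) with NA and warn, instead of letting the
# -Inf propagate into later calculations.
check_climate_weights <- function(dfequation) {
  bad <- !is.finite(dfequation$wE)
  if (any(bad)) {
    warning(sum(bad), " coordinate(s) are not covered by kgc::climatezones; ",
            "their climate weights were set to NA.")
    dfequation$wE[bad] <- NA_real_
  }
  dfequation
}

df <- data.frame(equation_id = c("a", "b"), wE = c(0.7, -Inf))
check_climate_weights(df)$wE
```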
`new_equations`
The function `new_equations` allows the user to a) subset to a specific set of equations for one or several taxa and b) add additional equations. I appreciate the latter option, as this adds a lot of flexibility to the package.
Thanks, @jstillh, for your quick and comprehensive review!
@jeffreyhanson, are you able to provide your review soon? It was due on Monday. Please let me know if you need more time.
Oh - I'm so sorry! I mistakenly thought my review would be due later since @gonzalezeb was working on updating the package - I'll submit my review by the end of the week. I hope that doesn't cause too many problems for you.
@jeffreyhanson, all good, I understand the confusion as I obviously interpreted it to be due on the original date. Take all the time that you need.
Package Review
- Briefly describe any working relationship you have (had) with the package authors.
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- A statement of need clearly stating problems the software is designed to solve and its target audience in README
- Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions
- Examples (that run successfully locally) for all exported functions
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software have been confirmed.
- Performance: Any performance claims of the software have been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Estimated hours spent reviewing: 4
- Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
The allodb R package provides a framework for estimating tree biomass based on allometric equations. Overall, my impression is that the package is very well designed. I think the authors have done a fantastic job of providing a highly flexible interface -- allowing users to customize calculations as needed -- that is easy to use. Since I have zero expertise in forestry science, I cannot comment on the novelty, scope, or methodological correctness of the package. As such, I have limited my review to focus on code and documentation.
I previously posted several suggestions for improving the package. Although I am glad that some of these suggestions were used to help improve the package, I would appreciate hearing why some of these suggestions were not incorporated? It's entirely possible that these suggestions were irrelevant, infeasible, or undesirable. For example, is there some particular benefit for not describing classes precisely, or not validating user inputs, of which I am unaware? Below I've listed some specific comments. Please let me know if anything I have written is unclear, irrelevant, or incorrect?
Code
- I mentioned earlier that I thought it would be helpful to include some code to validate function arguments. From what I can tell, the code hasn't been updated to include such validation - is that correct? I'm sorry, was my previous comment unclear, or did I misunderstand something? To help show what I mean, here's some example code that could be used to validate arguments for the `est_params()` function. Please note that I'm not saying you have to use this specific code, I'm just trying to provide an example of what the code could look like (e.g. you might prefer to use the checkmate R package instead):

```r
est_params <- function(genus, coords, species = NULL, new_eqtable = NULL,
                       wna = 0.1, w95 = 500, nres = 1e4) {
  # assert arguments are valid
  assertthat::assert_that(
    ## genus
    is.character(genus),
    assertthat::noNA(genus),
    ## coords
    is.numeric(coords),
    assertthat::noNA(coords),
    length(coords) == 2,
    ### ensure valid longitude
    isTRUE(coords[1] >= -180),
    isTRUE(coords[1] <= 180),
    ### ensure valid latitude
    isTRUE(coords[2] >= -90),
    isTRUE(coords[2] <= 90),
    ## species
    inherits(species, c("character", "NULL")),
    ## new_eqtable
    inherits(new_eqtable, c("data.frame", "NULL")),
    ## wna
    assertthat::is.number(wna),
    assertthat::noNA(wna),
    ## w95
    assertthat::is.number(w95),
    assertthat::noNA(w95),
    ## nres
    assertthat::is.count(nres),
    assertthat::noNA(nres)
  )
  # ...
}
```

- I previously suggested that it would be good to avoid using `eval(parse(...))` if possible. Looking through the commit history, it seems like an alternative function was attempted (i.e. `str2lang(...)`) but this was later removed to avoid compatibility issues with older R versions. Thanks for looking into this! I appreciate these efforts of trying to get it working. I was trying to think of alternative strategies to avoid `eval(parse(...))`, but I think the over-arching issue is that some equivalent is needed because the equations are (ultimately) stored as `character` objects (e.g. something like `"log(dbh) * 5"`). I suppose some alternatives would be to update the `equations` data frame to replace the `character` objects with `function` objects, or to replace the `equations` data frame with a `list` of purpose-built equation objects that can be used to perform the calculations. However, these seem needlessly complex compared to the current implementation? Maybe this could be one instance where `eval(parse(...))` is not a bad idea?
  To help avoid potential issues with `eval(parse(...))`, maybe it would be worth (1) writing a function that can validate whether a given `character` is a valid equation or not (it could give a useful error message if the equation is not valid), (2) adding this validation function to the other functions that accept equation tables as inputs, and (3) adding unit tests to validate all built-in equations (i.e. specified in the `equations` table) using this new function? I'm sorry that I can't be of more help.
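Such a validation function could be sketched as follows (a hypothetical `validate_equation()`; allodb's actual internal checks may differ):

```r
# Check that a character string parses and evaluates as a numeric
# function of dbh (and optionally h), with informative errors.
validate_equation <- function(txt) {
  expr <- tryCatch(
    parse(text = txt),
    error = function(e) stop("Not parseable R code: ", txt, call. = FALSE)
  )
  out <- tryCatch(
    eval(expr, envir = list(dbh = 10, h = 5)),
    error = function(e) stop("Does not evaluate with dbh/h: ", txt,
                             call. = FALSE)
  )
  if (!is.numeric(out)) {
    stop("Equation does not return a number: ", txt, call. = FALSE)
  }
  invisible(TRUE)
}

validate_equation("0.0673 * (dbh^2)^0.976")  # passes silently
try(validate_equation("1 <- dbh"))           # informative error
```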
- The package contains functions that output different data frame classes. For example, the `resample_agb()` function outputs a `tibble` (well, `tbl_df` object), and the `est_params()` function outputs a `data.table` object. Since these functions output different classes, the user needs to remember exactly what data frame class is output from each function -- increasing the cognitive load required to use the package. In other words, this behavior breaks the "law of least surprise". Is there a particular reason for this? I'm sorry if I'm missing something obvious?
  To help make the package easier to use, may I suggest picking one data frame class (e.g. `data.frame`, `tibble`, or `data.table`) and using that as the basis for all user-facing functions? This way the user can simply remember something like "allodb works with `data.table` objects, so I need to make sure my input data are in that format, and convert any outputs from that format as needed". Indeed, if there are certain reasons for using certain data frame classes in certain functions (such as for performance), perhaps different classes could be used internally (e.g. a function gets a `data.frame` as input, converts it to a `data.table` for performance, does some processing, and then outputs a `data.frame`)? What do you think? I'm just trying to help suggest ways to make the package easier to use.
- The `nls()` function used in `est_params()` is exceptionally useful for fitting nonlinear models. Since it can experience convergence issues (e.g. see here, here, here, and here), it is sometimes necessary to customize the default settings to obtain reliable outputs (e.g. change optimizer, increase number of iterations). Given this, would it be possible to update the `est_params()` function so that users can customize the optimization parameters (e.g. specify the `algorithm` and `control` parameters for `nls()`, see here for details)? It might also be useful if the function could somehow output information on the performance of the models too (e.g. model goodness-of-fit statistics)? But maybe I'm missing something and these concerns are not relevant here? What do you think?
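Forwarding the `nls()` settings could be as simple as extra arguments with sensible defaults (a sketch with simulated data, not allodb's actual interface):

```r
# A power-law allometry fit whose optimizer settings the caller controls.
fit_allometry <- function(dbh, agb,
                          algorithm = "port",
                          control = stats::nls.control(maxiter = 200)) {
  stats::nls(agb ~ a * dbh^b,
             start = list(a = 0.1, b = 2.4),
             algorithm = algorithm,
             control = control)
}

set.seed(1)
dbh <- runif(50, 5, 80)
agb <- 0.12 * dbh^2.3 * exp(rnorm(50, sd = 0.1))
fit <- fit_allometry(dbh, agb)
coef(fit)        # estimated a and b
summary(fit)     # goodness-of-fit information
```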
Documentation
- I mentioned earlier that I thought the documentation could be improved by using more precise terminology to describe function parameters and outputs. For example, using "`numeric` vector" instead of "numerical vector", because "numerical" isn't a class in R. Also, specifying exactly what data frame class (e.g. `data.frame` or `data.table`) when referring to R objects (e.g. here). From what I can tell, the documentation hasn't been updated based on this suggestion - is that right? I'm sorry if my previous comment wasn't clear? Or maybe I'm misunderstanding something? If so, could you please explain why the current terminology is preferable?
- It would be difficult for me to understand exactly how the `get_biomass`, `est_params`, and `resample_agb` functions work based only on reading the documentation. I wonder if it is possible to provide more details on the underlying calculations in a Details section? This suggestion may not be relevant though, because I might lack the required experience in forestry science to understand the documentation in its current form?
- I mentioned earlier that I thought the documentation could be improved by using code tags to specify that certain words correspond to object classes. From what I can tell, it seems these tags are still missing from a few places (e.g. like "NULL" here). Could you please double-check the documentation and add in any missing tags? Also, when documenting character objects, I would suggest including quotation marks (e.g. using `"all"` instead of `all` here) so it's clear the documentation is referring to a character object.
- Have you thought about including more links to help the user navigate the documentation (e.g. here, where the documentation refers the reader to another manual entry, or here, where the documentation refers to another function)? Perhaps the `@seealso` section could also be used to help users navigate between related functions?
- Have you considered using equation commands (e.g. `\eqn{}`, `\deqn{}`) in the documentation to format equations (e.g. such as this equation; see here for an example)?
- It might be helpful to use the `\pkg{}` command when referring to the names of packages in the documentation (e.g. such as `\pkg{allodb}` here)?
- Is it possible to explicitly state the type of object in the documentation for each dataset? I appreciate that the documentation says that the objects are a "data frame", but it would be useful to specify which implementation (e.g. `data.frame`, `data.table`, `table`, and `tibble` objects are all "data frame" objects).
- It might be helpful to add example sections to the documentation for the built-in datasets so that the reader can see a preview of each dataset. What do you think? For example, in data.R, you could insert some code like this:

```r
#' @examples
#' # preview the dataset
#' print(head(equations))
#'
```

- I also noticed some potential typos in the documentation: here, here.
Built-in datasets
- I noticed that some of these objects are `tibble` objects (e.g. `equations`) and others are `data.frame` objects (e.g. `genus_family`). I was wondering if it would be useful to have all built-in datasets represented using the same class (e.g. all as `tibble` objects)? This is because `tibble` and `data.frame` objects have markedly different behaviors -- even though a `tibble` inherits from a `data.frame` -- and these differences can cause errors when the user/code expects a different behavior. This is related to the "law of least surprise" that I mentioned earlier.
Minor suggestions
@jstillh and @jeffreyhanson, thank you so much for your thoughtful and kind reviews. Your suggestions will improve the package 100%. I will contact you if we have questions, we truly appreciate your time on this.
Hi @gonzalezeb, just following up, it's been four weeks since the reviews came in. If you are experiencing any difficulties or issues, feel free to flag them with the reviewers or me and we'll do our best to assist you through the process.
Hi @adamhsparks. Sorry, I am a little behind with this and have solved only minor issues. There are some comments on the conceptual aspects of the package (i.e. here) where I get the point, but changing our approach will require some (lots!) of work, so I am giving it careful thought. How should I tackle issues like those?
Hi @gonzalezeb, which specific aspect were you linking to and asking for guidance on? Your link only points to the whole issue here, not to one of the reviewers' comments or an issue that they may have opened in your repository.
Hi @gonzalezeb, do you need more time to work on this? We can put a holding tag on it and revisit it after three months to see where things are for up to a year. Would that be useful to you?
Hi @adamhsparks, thanks for checking. We have made some substantial progress but I still need to work on some minor things. The plan is to finish this before Sep 15, as we want to submit our revised manuscript to MEE also by then. Will that be ok with you all?
Sounds good, Erika! Thanks for updating me.
Hi @adamhsparks, I wasn't sure how to respond to the multiple reviews, so I hope the following is sufficient. We are very grateful for the thoughtful suggestions by the reviewers, and have implemented most of them.
Initial suggestions by @jeffreyhanson (5/19/21) have been addressed:
- The package website is working (https://forestgeo.github.io/allodb/)
- Documentation for multiple functions has been improved by using code formatting (e.g. using "`numeric` vector" instead of "numerical vector", adding "`tibble::tibble()` object", etc.).
- Typos have been corrected
- Unnecessary R and non-R script files have been removed (eg. R/sysdata.rda)
- We agree with the reviewer. If a user passes a bad argument, it is best to fail fast with an informative message so that the user can quickly correct their mistake and move on. But the number of possible mistakes may be infinite, so we have particularly focused on adding an internal function `validate_equations()` to test that all equations in `equations$equation_allometry` are a valid R expression of the dbh or h variables (since this could be one of the main mistakes in calculating biomass). This function can be called from inside any function within `allodb` and allows users to pass a character vector of equations. We also added consistency checks of the dbh and coordinates provided by the user in the `get_biomass` function.
- Unused Travis badge in README has been removed
- The .buildignore folder has been removed
- We replaced `set.seed` with `withr::with_seed` in the `resample_agb` function.
- About `eval(parse(…))`, see the next comment.
- Parameters for many functions have been reordered.
Comprehensive review by @jeffreyhanson (6/17/21) has also been addressed:
About code:
- See note number 5 above. If the suggested change is truly necessary (validation of function arguments), then we may take the time to implement it.
- All data frames now have the `tibble` subclass.

About documentation:
- Documentation has been improved for all functions (especially the `get_biomass` and `weight_allom` functions): we added `@seealso` to help users navigate between documentation pages, used quotation marks for character vectors, added example sections to the documentation for the built-in datasets, etc.
- We are now using `\pkg{allodb}` in documentation when referring to this package, and equation commands (e.g. `\deqn{}`).
- Removed `library(allodb)` from all test scripts.
Suggestions by @jstillh (6/14/21) have been addressed:
- Documentation for both the `get_biomass` and `weight_allom` core functions has been improved so the weighting concept is more transparent; specifically, we give more details in the Details section. Regarding adding additional outputs to the `get_biomass` function, we think that would make the function too slow.
- We improved the error message within the `weight_allom` function (i.e. it now deals with coordinates that are not associated with any Koppen climate zone) to address the specific comment that the `get_biomass` function throws an error when the equation table does not contain equations corresponding to the Koppen climate at the provided coordinates (argument `coords`). The implemented solution is to change `koppenMatrix`, replacing climate combinations with weight we = 0 (two completely different climates) by we = 1e-6, similar to what is done with the taxonomic weight. This way, equations with we = 1e-6 will only be significantly used in the recalibrated equations when no better climate combination is found, thereby avoiding the error. In regards to the Forrester et al. 2017 equations, we revised them, and now many coordinates correspond to the max or min values given in the original publication (not the mean, except for one site).
- Vignette has been improved.
- About `eval(parse(…))`, also see the next comment.
The reviewers advise avoiding the call `eval(parse(text = ...))` (item 9 here, and "Code and documentation" here), mainly because (a) it can be unsafe, and (b) it is error prone. Instead they suggest storing the equations as objects such as expressions or functions, and manipulating them with `substitute()`.
We appreciate the advice and, after considering it carefully, we would like to share our updated opinion. We first discuss the problems of `eval(parse())` and then the alternative solutions. To make our discussion concrete we offer some code examples.
library(dplyr, warn.conflicts = FALSE)
library(allodb)
tl;dr
- We keep the use of `eval(parse())`, as its problems seem to be balanced by the benefits we see in this particular use case, and the alternative solutions seem too complex. This is in line with @jeffreyhanson’s comment on Jun 17:
I previously suggested that it would be good to avoid using `eval(parse(...))` if possible. (…) However, [the alternatives] seem needlessly complex compared to the current implementation? Maybe this could be one instance where `eval(parse(...))` is not a bad idea?
- We now test that the equations are correct or trigger an informative error message.
Unsafe
Online we found that parsing text would be a risk if we hosted an application on a server we own and allowed users to input arbitrary text: a malicious user could damage our server.
But this is not our case. The allodb package is meant to be used as an analysis tool, directly on users' computers, and the input text comes mainly not from users but from a dataset we built.
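To make the distinction concrete, here is a minimal sketch; the malicious string is hypothetical and deliberately left unevaluated:

```r
# Trusted text, as shipped in our own equations table: just arithmetic.
eval(parse(text = "0.5 * (30^2)"), envir = new.env())
#> [1] 450

# Untrusted text is the dangerous case: eval() runs whatever it is given.
# malicious <- 'system("some shell command")'  # hypothetical, NOT run
# eval(parse(text = malicious))  # on a server this would execute the command
```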
Error prone
We acknowledge that bugs can hide in the unevaluated text, as invalid code is still a valid text object.
valid <- "dbh + 1"
eval(parse(text = valid), envir = list(dbh = 1))
#> [1] 2
invalid <- "1 <- dbh"
try(eval(parse(text = invalid), envir = list(dbh = 1)))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment
But bugs can also hide in unevaluated expressions. Using `substitute()` to manipulate expressions does not ensure an expression is valid.
invalid <- quote(1 <- dbh)
substitute(x, list(x = invalid))
#> 1 <- dbh
try(eval(substitute(x, list(x = invalid))))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment
We are comfortable storing and manipulating text. We can inspect equations stored as text directly from the dataframe, and we can manipulate them with familiar tools.
# Storing text
equation <- "1.2927 * (dbh^2)^1.36723"
dbh <- "(sampled_dbh * 0.393701)"
# + We can inspect the equations directly from the dataframe
tibble(equation, dbh)
#> # A tibble: 1 x 2
#> equation dbh
#> <chr> <chr>
#> 1 1.2927 * (dbh^2)^1.36723 (sampled_dbh * 0.393701)
# - To manipulate the equations we use familiar tools
modified <- gsub("dbh", dbh, equation)
modified
#> [1] "1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723"
eval(parse(text = modified), list(sampled_dbh = 1))
#> [1] 0.1010408
In contrast, we are not comfortable storing or manipulating expressions with `substitute()`. We found great help in the first edition of Advanced R, but we still think the approach is difficult to understand, which makes us more likely to introduce bugs than with `eval(parse())`.
# We need to create an escape hatch because substitute() doesn't work if we
# already have an expression saved in a variable
# http://adv-r.had.co.nz/Computing-on-the-language.html#substitute
substitute_q <- function(x, env) {
call <- substitute(substitute(y, env), list(y = x))
eval(call)
}
equation <- quote(1.2927 * (dbh^2)^1.36723)
dbh <- quote((sampled_dbh * 0.393701))
# - We can't inspect the equations directly from the dataframe :-(
tibble(equation = list(equation), dbh = list(dbh))
#> # A tibble: 1 x 2
#> equation dbh
#> <list> <list>
#> 1 <language> <language>
# - To manipulate the equations we need unfamiliar tools
modified <- substitute_q(equation, list(dbh = dbh))
modified
#> 1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723
eval(modified, list(sampled_dbh = 1))
#> [1] 0.1010408
As @jeffreyhanson advised, instead of using expressions and `substitute()`, we prefer to ensure all text-equations are valid or throw an informative error message.
validate_equation <- function(text, envir = parent.frame()) {
tryCatch(
eval(parse(text = text), envir = envir),
error = function(e) {
stop("`text` must be valid R code.\nx Invalid: ", text, call. = FALSE)
}
)
invisible(text)
}
dbh <- 99
invalid <- "1 <- dbh" # redefine as text; it was last assigned a quoted call
validate_equation(valid)
try(validate_equation(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: 1 <- dbh
f <- function(text) {
validate_equation(text, envir = list(dbh = 1))
# Some useful code
text
}
f(valid)
#> [1] "dbh + 1"
try(f(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: 1 <- dbh
@gonzalezeb, thank you for your comprehensive overview of the changes that you have made to the allodb package.
@jeffreyhanson and @jstillh could you please let me know if you are satisfied with the changes to the package or have any further suggestions that may improve it?
Awesome - thank you very much @gonzalezeb. @adamhsparks, I'll try to get back to you next week.
@gonzalezeb thanks a lot for the overview of the changes - I'll try to go through them in the next week as well and get back to you, @adamhsparks.
Thank you @gonzalezeb for providing such detailed responses to the points I raised earlier. I especially found your discussion on the costs and benefits of `eval(parse(...))` really useful. I think the R package is looking absolutely brilliant! Most of the points I raised earlier have been addressed. I've made a couple of suggestions below that I think could help improve the package.1 All things considered, these are pretty minor - just some documentation tweaks. If those could be addressed, I'd say that all my suggestions have been satisfied. I've also listed several additional minor comments too. Based on my understanding of rOpenSci guidelines, they won't/shouldn't influence whether the package should be accepted or not (but please correct me if I am wrong). I've listed them just to be thorough, and in case the package authors think they are worth addressing.
Suggestions
-
Package checks throw the following warning on my computer (i.e., running `devtools::check()`). Could you please fix it? Let me know if you're unable to reproduce this error and I can try to provide more information.
Undocumented code objects:
‘equations’
Undocumented data sets:
‘equations’
All user-level objects in a package should have documentation entries.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
-
Thank you very much for looking into potential issues with using `eval(parse(...))` and potential alternatives. I agree that the main issue would be if the code were run inside a web application that allowed individuals to enter arbitrary text. Although additional issues could arise on standard desktop/laptop computers too (e.g., a malicious actor could potentially update the library files), I guess at this point malicious actors would have easier ways to accomplish their goals than modifying R library files. So, given the costs and benefits outlined by @gonzalezeb, I think it's perfectly reasonable for the package to use `eval(parse(...))`. Maybe it would be worth adding some warning text to the documentation for the function(s) that use `eval(parse(...))` so that users are aware of potential issues? This way it is clearly the responsibility of the user to ensure that they take appropriate steps to avoid security issues.2
Also, I noticed one of the links provided by @gonzalezeb mentioned that `RAppArmor::eval.secure()` can prevent system calls with administrator privileges. If it's not too much hassle, would it be worth using that where possible? It seems that this R package is only available on Linux though - so perhaps it's not worth the hassle (because then some additional code would be needed to conditionally use `RAppArmor::eval.secure()` if it's available, and otherwise use `eval(parse(...))`). I just thought I'd mention it just in case.
-
The README.md file might need re-knitting, because the output displayed for `est_params()` is a `data.table` object (instead of a `tibble` object per the current version).
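For what it's worth, the "undocumented data sets" warning above is usually resolved by adding a roxygen2 block for the dataset. A minimal sketch (the descriptions and column names are illustrative, not allodb's actual documentation):

```r
#' Allometric equations table
#'
#' Published allometric equations used by `get_biomass()` to estimate
#' aboveground tree biomass.
#'
#' @format A data frame with one row per equation; columns include, for
#'   example, an equation identifier, the equation as text, and taxonomic
#'   and geographic metadata.
#' @source Compiled from the published literature.
"equations"
```

Running `devtools::document()` after adding such a block generates the missing `equations.Rd` entry.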
Minor comments
-
The documentation doesn't seem to format code correctly (i.e., using backticks). For example, NULL is referred to as "NULL" instead of `NULL` (e.g., see here, or here). Personally, I find syntax formatting helpful when reading documentation (e.g., https://readr.tidyverse.org/reference/read_delim.html). I don't think rOpenSci guidelines have style mandates for code in package documentation, so I don't think this is critically important.
-
The main functions (e.g., `est_params()`, `get_biomass()`, `resample_agb()`) don't contain code to validate all arguments. Although @gonzalezeb mentions that "the number of possible mistakes may be infinite", I would suggest that the number of expectations (e.g., properties or characteristics) for a valid input is much lower. For example, in order to verify that an argument is a single integer (e.g., `5`), one does not need to verify that the parameter is (i) not a character, (ii) not a `NULL`, (iii) not a double, (iv) not `Inf`, etc. Instead, one could verify that it conforms to a set of expectations (e.g., `inherits(x, "integer") && identical(length(x), 1L) && all(is.finite(x))`). This is made much easier using existing R packages developed for this purpose. However, I don't think rOpenSci or CRAN require argument validation for published packages, so I don't think this issue is critically important. I'm just making this comment to be thorough :)
-
This is a super minor suggestion. The equation in the documentation for the
est_params()
function has the text " with " formatted as a maths term (rather than text), which makes it slightly difficult to read on the package website. To address this, I think you could change this line to define the equation as:
\deqn{AGB = a * dbh ^ b + e, \space \mathit{with} \space e ~ N(0, sigma^2)}{AGB = a * dbh ^ b + e, with e ~ N(0, sigma^2)}
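As an illustration of the expectation-based checks mentioned in the second minor comment, here is a base-R sketch; `assert_count()` is a made-up helper name, not an allodb function:

```r
# Check that `x` is a single finite whole number, per a small set of
# expectations, instead of enumerating every possible wrong input.
assert_count <- function(x, name = "x") {
  ok <- is.numeric(x) && length(x) == 1L && is.finite(x) && x == as.integer(x)
  if (!ok) {
    stop("`", name, "` must be a single finite integer.", call. = FALSE)
  }
  invisible(x)
}

assert_count(5)         # passes silently, returns 5 invisibly
try(assert_count("a"))  # informative error
try(assert_count(1:3))  # informative error
```

Packages such as checkmate wrap this pattern in ready-made assertions, but even hand-rolled checks like this keep the expectation list short.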
1 Please note that I still (unsurprisingly) lack the forestry experience needed to evaluate the novelty and methodology of this package, and so continue to limit my reviews to code quality and documentation.
2 Just to be clear, I do think that -- where possible, feasible, and reasonable -- R package developers should ideally -- if only in principle -- minimize security risks, even if software licenses absolve developers of responsibility. It's just that I think `eval(parse(...))` is not so terrible here, given the noted trade-offs that are specific to this package. Perhaps others will disagree, and that's fine. Please note that I do not have any experience with cybersecurity. If you want an informed viewpoint on such matters, I suggest you talk to someone else :)
Thanks @jeffreyhanson, I have adopted most of your comments, they are very helpful!
Thank you @gonzalezeb for the very detailed response to the reviews. Like @jeffreyhanson, I appreciate your detailed response regarding the use of `eval(parse(...))`. In general, I am very pleased with the revisions and the changes you implemented in the functions. I do not have any more suggestions beyond the ones by @jeffreyhanson.
- I appreciate the changes/improvements in the documentation of both the `weight_allom` and the `get_biomass` functions, and that you implemented an informative error message if coordinates are not covered by the Köppen data.
- Your proposition regarding the weighting of the climatic combinations sounds like a good idea.
Thanks for this contribution.
@ropensci-review-bot approve
Could not approve. Please, specify the name of the package.
@ropensci-review-bot approve allodb
Approved! Thanks @gonzalezeb for submitting and @jeffreyhanson, @jstillh for your reviews! 😁
To-dos:
- Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
- Fix all links to the GitHub repo to point to the repo under the ropensci organization.
- Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
If you already had a `pkgdown` website and are ok relying only on rOpenSci central docs building and branding:
  - deactivate the automatic deployment you might have set up
  - remove styling tweaks from your pkgdown config but keep that config file
  - replace the whole current `pkgdown` website with a redirecting page
  - replace your package docs URL with https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the `URL` field alongside the link to the GitHub repository, e.g.: `URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar`
- Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to the ropensci Appveyor account, so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be `[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname)`. If Appveyor does not pick up new commits after transfer, you might need to delete and re-create the Appveyor project. (Repo transfers are smoother with GitHub Actions.)
- Please check you have updated the package version to a post-review version and that you documented all changes in NEWS.md
- We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative; see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package. After installing the package, it should be as easy as running `codemetar::write_codemeta()` in the root of your package.
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them `"rev"`-type contributors in the `Authors@R` field (with their consent). More info on this here.
Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.
We've put together an online book with our best practices and tips; this chapter starts the third section, which is about guidance for after onboarding. Please tell us what could be improved - the corresponding repo is here.
Last but not least, you can volunteer as a reviewer by filling out a short form.