Every quadrat in the dataset must have at lease one tree
maurolepore opened this issue · 15 comments
@srusso2 and @DanielZuleta,
Small tree tables used to err because
Torspstcnthab / Tortotstcnthab = NaN
(edited, thanks @srusso2 for noticing the mistake via #40 (comment))
Now they no longer err because I'm now converting NaN
to 0
with a warning. Is this OK?
(I know that users "should only try to determine the habitat association for sufficiently abundant species" but we lack a formal definition of sufficiently abundant that I can use to programatically reject bad inputs.)
The warning is the one right below, and the code that implements this warning is here and here (see details in the example below):
#> Warning: Using zero (`0`) where the relative stem density of focal species
#> per habitat of the focal torus-based map can't be calculated
#> because `Tortotstcnthab` and `Torspstcnthab` are zero:
#> * `Tortotstcnthab` determines total number of stems per habitat of the
#> focal torus-based map.
#> * `Torspstcnthab` determines tot. no. stems for focal sp. per habitat of
#> the focal torus-based map.
With large tree table throws everything is OK
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(fgeo))
tree <- fgeo.x::tree6_3species
autoplot(sp(tree))
tt_test(tree, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> [[1]]
#> N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> CASARB 29 1242 356 2 0 0.77625
#> N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> CASARB 20 390 1206 4 0 0.24375
#> N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> CASARB 12 778 817 5 0 0.48625
#> N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> CASARB 5 932 658 10 0 0.5825
#> attr(,"fixed_nan")
#> [1] FALSE
#>
#> [[2]]
#> N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> PREMON 91 1093 504 3 0 0.683125
#> N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> PREMON 89 1254 344 2 0 0.78375
#> N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> PREMON 40 305 1292 3 0 0.190625
#> N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> PREMON 14 270 1322 8 0 0.16875
#> attr(,"fixed_nan")
#> [1] FALSE
#>
#> [[3]]
#> N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> SLOBER 18 273 1324 3 0 0.170625
#> N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> SLOBER 24 810 788 2 0 0.50625
#> N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> SLOBER 17 1155 440 5 0 0.721875
#> N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> SLOBER 7 1292 303 5 0 0.8075
#> attr(,"fixed_nan")
#> [1] FALSE
With small tree table (throws warning)
# Used to err because `Tortotstcnthab / Torspstcnthab = NaN`
smaller_tree <- sample_n(tree, 50)
autoplot(sp(smaller_tree))
# Now no longer errs because I'm now converting `NaN` to `0` with a warning.
# Is this OK?
tt_test(smaller_tree, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Warning: Using zero (`0`) where the relative stem density of focal species
#> per habitat of the focal torus-based map can't be calculated
#> because `Tortotstcnthab` and `Torspstcnthab` are zero:
#> * `Tortotstcnthab` determines total number of stems per habitat of the
#> focal torus-based map.
#> * `Torspstcnthab` determines tot. no. stems for focal sp. per habitat of
#> the focal torus-based map.
#> [[1]]
#> N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> SLOBER 4 1021 544 35 0 0.638125
#> N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> SLOBER 4 1120 408 72 0 0.7
#> N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> SLOBER 0 0 1054 546 -1 0
#> N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> SLOBER 0 0 676 924 -1 0
#> attr(,"fixed_nan")
#> [1] TRUE
#>
#> [[2]]
#> N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> PREMON 9 388 1179 33 0 0.2425
#> N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> PREMON 10 487 1072 41 0 0.304375
#> N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> PREMON 6 1318 236 46 0 0.82375
#> N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> PREMON 4 1117 0 483 0 0.698125
#> attr(,"fixed_nan")
#> [1] TRUE
#>
#> [[3]]
#> N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> CASARB 3 1154 394 52 0 0.72125
#> N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> CASARB 2 654 878 68 0 0.40875
#> N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> CASARB 1 861 664 75 0 0.538125
#> N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> CASARB 0 0 683 917 -1 0
#> attr(,"fixed_nan")
#> [1] TRUE
Created on 2018-12-31 by the reprex package (v0.2.1)
Hi @maurolepore ,
(cc @srusso2)
Sorry for my late response.
I don't think that's the best solution to this issue because when you convert these NaN
to 0
, the values for the GrLsEq
object are affected. The GrLsEq
object contains the "greater than", "less than", and "equal to" counts employed to test if the species is repelled, aggregated or neutrally distributed on each habitat. In other words, despite the warning, the main output of the function is affected, which has serious implications for the inferences on the species' habitat association. (Mainly in cases in which users do not pay attention to warnings.)
Evidently, in your example, the main conclusions about the habitat associations (Rep.Agg.Neut.n columns) changed when you used the small tree table for the two species that significantly reduced their abundance after using the sample_n(tree, 50)
code (i.e., SLOBER and CASARB).
The root of the problem is, as you stated, the definition of sufficiently abundant species and that's why we recommend employing different sample size thresholds (i.e., species’ abundance) when inferring about the community-wide responses to different types of environmental variation (environmental habitats) (see Zuleta et al. 2018).
I think we can to get rid of these err by converting NaN
to 0
and changing the main results in the Rep.Agg.Neut.n
columns to NA
plus a clear warning message saying that the abundance of that species is too low to calculate its relative tree density per habitat of the torus-based map.
Thanks @DanielZuleta! Great suggestion!
Thanks @srusso2 for noticing that my message at the top was incorrect. I clarify that my mistake is in my message, not in the code -- see below:
I get that a quadrat shouldn't be empty but the example bleow shows that it can. (e.g. What if there is a fire and all trees die?) Handling small, artificial datasets is also useful for examples and tests, given that the funciton takes quite a bit of time to run.
Likely, whatever can happen will eventually happen so I need to figure out a way to handle that possiblity -- even if "to handle it" means to stop and throw an informative error.
I need something actionable. What do you think it's best to do in this situation?
- Stop, return nothing but an error message (Which?).
- Proceed (@DanielZuleta's suggestion),
- return
NA
wherever the result is meaningless - return as many reliable results as possible.
- return a warning.
- return
- Something else?
# Used to err because `Tortotstcnthab / Torspstcnthab = NaN`
library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(fgeo)
#> -- Attaching packages ---------------------------------------------------------------- fgeo 1.1.2.9000 --
#> v fgeo.analyze 1.1.4 v fgeo.tool 1.2.2
#> v fgeo.plot 1.1.3.9000 v fgeo.x 1.1.2
#> -- Conflicts ------------------------------------------------------------------------ fgeo_conflicts() --
#> x fgeo.tool::filter() masks dplyr::filter(), stats::filter()
#> x dplyr::intersect() masks base::intersect()
#> x dplyr::lag() masks stats::lag()
#> x dplyr::setdiff() masks base::setdiff()
#> x dplyr::setequal() masks base::setequal()
#> x dplyr::union() masks base::union()
set.seed(1)
small_dataset <- fgeo.x::tree6_3species %>%
sample_n(50)
autoplot(sp(small_dataset))
Created on 2019-03-22 by the reprex package (v0.2.1)
... I guess there’s a wrapper function in which the user can pick what minimum tree abundance they wish to use and that then subsets the data so that only the species meeting the minimum abundance criterion are run for the TT? If not, then there should be one.
Thanks @srusso2, this is a great idea and I would like to pursue it. But if I understand correctly, the problem of potentially empty quadrats remains. The abundance of a species may be above some threshold, and yet some quadrat may have no tree at all. Maybe we need another check_for_empty_quadrats()
wrapper?
I got it!
The issue is not a given empty quadrat, but empty habitats (set of quadrats) that perfectly match with a habitat of one of the translated maps:
The root of the problem is when both the Tortotstcnthab[i]=0
and the Torspstcnthab[i]=0
, which means that for the given translation (Tor) there are no trees in the focal habitat **[i]**
. Therefore, the problem arises because there is a part of the plot with no trees that coincidentally, perfectly matches with the habitat **[i]**
of the given translated map. That´s why the smaller the tree table i.e., the total number of trees in your database, the higher the probability to get Torspprophab= 0/0 =Nan
.
As @srusso2 said, we can get rid of these NaN
by making sure (from the beginning of the code) that every quadrat of the plot has at least one tree in the dataset. Otherwise, we would have to modify the shape of the plot and, thus, the way of doing the translations.
Awesome @DanielZuleta, good work!
I did some experiments truncating the census and habitat data and suspected that the problem was some kind of missmatch between the habitat and census data. I was unable to articulate it as eloquently as you did. Thanks!
So this is the state of the art:
tt_test()
no longer replaces NaN
with 0
and instead it now aborts. But because in the past few hours I wasn't convinced that the problem was only an empty quadrat, the error message I throw is more complex (see below). With this new insight I should be able to write a much more succint message. If you have a suggestion that I could use verbatim let me know.
This is what we have for now:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(fgeo)
#> -- Attaching packages --------------------------------------------- fgeo 1.1.2.9000 --
#> v fgeo.analyze 1.1.4.9000 v fgeo.tool 1.2.2
#> v fgeo.plot 1.1.3.9000 v fgeo.x 1.1.2
#> -- Conflicts ----------------------------------------------------- fgeo_conflicts() --
#> x fgeo.tool::filter() masks dplyr::filter(), stats::filter()
#> x dplyr::intersect() masks base::intersect()
#> x dplyr::lag() masks stats::lag()
#> x dplyr::setdiff() masks base::setdiff()
#> x dplyr::setequal() masks base::setequal()
#> x dplyr::union() masks base::union()
# Fails
insufficient_density <- fgeo.x::tree5
autoplot(sp(insufficient_density))
habitat <- fgeo.x::habitat
tt_test(insufficient_density, habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Error: Can't calculate the relative stem density of focal species per habitat of
#> the focal torus-based map. Is your data sufficiently abundant?
#> `Tortotstcnthab / Torspstcnthab` is not a number (`NaN`).
#> * `Tortotstcnthab` determines total number of stems per habitat of the
#> focal torus-based map.
#> * `Torspstcnthab` determines tot. no. stems for focal sp. per habitat of
#> the focal torus-based map.
#> For more details see https://github.com/forestgeo/fgeo.analyze/issues/40.
Created on 2019-03-22 by the reprex package (v0.2.1)
Actually non-zero number/0 = Inf
never is going to happen because the numerator is the number of trees of the focal species in that translated habitat. So, if the numerator is x=non-zero number
, there has to be (mandatory) at least the same x=non-zero number
in the denominator (the total number of trees of all the species in that habitat)
With what both of you just added, I'll tweak the code to err not when the division results in NaN
but simply when the denominator is zero.
- I refined the error message.
- I changed the title of this issue to match the error message if anyone googles it.
suppressPackageStartupMessages(library(fgeo))
# Fails
tt_test(fgeo.x::tree5, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Error: Can't deal with a zero value of `Tortotstcnthab` (total number of stems for
#> focal species per habitat of the focal torus-based map).
#> For more information see https://github.com/forestgeo/fgeo.analyze/issues/40.
I don't think users will know what to do about it but at least they can follow the link and find your discussion here. Let me konw if you think of a better message I could use verbatim.
--
@DanielZuleta, your previous last comment relates to the issue #90 (i.e. the same dataset passes when tt_test()
is fed with a habitat dataset with 8 habitats but fails with a habitats dataset with 16 habitats.).
Sorry, I haven´t seen the previous message from Sabrina.
@maurolepore I think the error message can be difficult to understand for common users, even if they go to the issue. So, I'd add something like this:
"One or more quadrats in the dataset with no trees"
Can't deal with a zero value of `Torspstcnthab` (total number of stems for focal species per habitat of the focal torus-based map). For more information see https://github.com/forestgeo/fgeo.analyze/issues/40
Great, I simplified it based on your suggestion. Both of you agree in that this problem shouldn't happen if every quadrat has at least one tree, and that's an easy to understand and actionable piece of information. I removed the low-level details which not actionable and confusing.
The change is now included in the latest version on GitHub, and will be included in the next release on or CRAN-like repository.
Thanks!
suppressPackageStartupMessages(library(fgeo))
# Fails
tt_test(fgeo.x::tree5, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Error: Every quadrat in the dataset must have at lease one tree.
#> For more information see https://github.com/forestgeo/fgeo.analyze/issues/40.
Created on 2019-03-25 by the reprex package (v0.2.1)