forestgeo/fgeo.analyze

Every quadrat in the dataset must have at lease one tree

maurolepore opened this issue · 15 comments

@srusso2 and @DanielZuleta,

Small tree tables used to err because

  Torspstcnthab / Tortotstcnthab = NaN

(edited, thanks @srusso2 for noticing the mistake via #40 (comment))

Now they no longer err because I'm now converting NaN to 0 with a warning. Is this OK?

(I know that users "should only try to determine the habitat association for sufficiently abundant species" but we lack a formal definition of sufficiently abundant that I can use to programatically reject bad inputs.)

The warning is the one right below, and the code that implements this warning is here and here (see details in the example below):

#> Warning:   Using zero (`0`) where the relative stem density of focal species
#>   per habitat of the focal torus-based map can't be calculated
#>   because `Tortotstcnthab` and `Torspstcnthab` are zero:
#>   * `Tortotstcnthab` determines total number of stems per habitat of the
#>     focal torus-based map.
#>   * `Torspstcnthab` determines tot. no. stems for focal sp. per habitat of
#>     the focal torus-based map.

With large tree table throws everything is OK

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(fgeo))
tree <- fgeo.x::tree6_3species
autoplot(sp(tree))

tt_test(tree, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> [[1]]
#>        N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> CASARB      29     1242      356        2              0        0.77625
#>        N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> CASARB      20      390     1206        4              0        0.24375
#>        N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> CASARB      12      778      817        5              0        0.48625
#>        N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> CASARB       5      932      658       10              0         0.5825
#> attr(,"fixed_nan")
#> [1] FALSE
#> 
#> [[2]]
#>        N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> PREMON      91     1093      504        3              0       0.683125
#>        N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> PREMON      89     1254      344        2              0        0.78375
#>        N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> PREMON      40      305     1292        3              0       0.190625
#>        N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> PREMON      14      270     1322        8              0        0.16875
#> attr(,"fixed_nan")
#> [1] FALSE
#> 
#> [[3]]
#>        N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> SLOBER      18      273     1324        3              0       0.170625
#>        N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> SLOBER      24      810      788        2              0        0.50625
#>        N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> SLOBER      17     1155      440        5              0       0.721875
#>        N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> SLOBER       7     1292      303        5              0         0.8075
#> attr(,"fixed_nan")
#> [1] FALSE

With small tree table (throws warning)

# Used to err because `Tortotstcnthab / Torspstcnthab = NaN` 
smaller_tree <- sample_n(tree, 50)
autoplot(sp(smaller_tree))

# Now no longer errs because I'm now converting `NaN` to `0` with a warning. 
# Is this OK?
tt_test(smaller_tree, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Warning:   Using zero (`0`) where the relative stem density of focal species
#>   per habitat of the focal torus-based map can't be calculated
#>   because `Tortotstcnthab` and `Torspstcnthab` are zero:
#>   * `Tortotstcnthab` determines total number of stems per habitat of the
#>     focal torus-based map.
#>   * `Torspstcnthab` determines tot. no. stems for focal sp. per habitat of
#>     the focal torus-based map.
#> [[1]]
#>        N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> SLOBER       4     1021      544       35              0       0.638125
#>        N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> SLOBER       4     1120      408       72              0            0.7
#>        N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> SLOBER       0        0     1054      546             -1              0
#>        N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> SLOBER       0        0      676      924             -1              0
#> attr(,"fixed_nan")
#> [1] TRUE
#> 
#> [[2]]
#>        N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> PREMON       9      388     1179       33              0         0.2425
#>        N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> PREMON      10      487     1072       41              0       0.304375
#>        N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> PREMON       6     1318      236       46              0        0.82375
#>        N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> PREMON       4     1117        0      483              0       0.698125
#> attr(,"fixed_nan")
#> [1] TRUE
#> 
#> [[3]]
#>        N.Hab.1 Gr.Hab.1 Ls.Hab.1 Eq.Hab.1 Rep.Agg.Neut.1 Obs.Quantile.1
#> CASARB       3     1154      394       52              0        0.72125
#>        N.Hab.2 Gr.Hab.2 Ls.Hab.2 Eq.Hab.2 Rep.Agg.Neut.2 Obs.Quantile.2
#> CASARB       2      654      878       68              0        0.40875
#>        N.Hab.3 Gr.Hab.3 Ls.Hab.3 Eq.Hab.3 Rep.Agg.Neut.3 Obs.Quantile.3
#> CASARB       1      861      664       75              0       0.538125
#>        N.Hab.4 Gr.Hab.4 Ls.Hab.4 Eq.Hab.4 Rep.Agg.Neut.4 Obs.Quantile.4
#> CASARB       0        0      683      917             -1              0
#> attr(,"fixed_nan")
#> [1] TRUE

Created on 2018-12-31 by the reprex package (v0.2.1)

Hi @maurolepore ,
(cc @srusso2)

Sorry for my late response.
I don't think that's the best solution to this issue because when you convert these NaN to 0, the values for the GrLsEq object are affected. The GrLsEq object contains the "greater than", "less than", and "equal to" counts employed to test if the species is repelled, aggregated or neutrally distributed on each habitat. In other words, despite the warning, the main output of the function is affected, which has serious implications for the inferences on the species' habitat association. (Mainly in cases in which users do not pay attention to warnings.)

Evidently, in your example, the main conclusions about the habitat associations (Rep.Agg.Neut.n columns) changed when you used the small tree table for the two species that significantly reduced their abundance after using the sample_n(tree, 50) code (i.e., SLOBER and CASARB).

The root of the problem is, as you stated, the definition of sufficiently abundant species and that's why we recommend employing different sample size thresholds (i.e., species’ abundance) when inferring about the community-wide responses to different types of environmental variation (environmental habitats) (see Zuleta et al. 2018).

I think we can to get rid of these err by converting NaN to 0 and changing the main results in the Rep.Agg.Neut.n columns to NA plus a clear warning message saying that the abundance of that species is too low to calculate its relative tree density per habitat of the torus-based map.

Thanks @DanielZuleta! Great suggestion!

Thanks @srusso2 for noticing that my message at the top was incorrect. I clarify that my mistake is in my message, not in the code -- see below:

image

I get that a quadrat shouldn't be empty but the example bleow shows that it can. (e.g. What if there is a fire and all trees die?) Handling small, artificial datasets is also useful for examples and tests, given that the funciton takes quite a bit of time to run.

Likely, whatever can happen will eventually happen so I need to figure out a way to handle that possiblity -- even if "to handle it" means to stop and throw an informative error.

I need something actionable. What do you think it's best to do in this situation?

  1. Stop, return nothing but an error message (Which?).
  2. Proceed (@DanielZuleta's suggestion),
    • return NA wherever the result is meaningless
    • return as many reliable results as possible.
    • return a warning.
  3. Something else?
# Used to err because `Tortotstcnthab / Torspstcnthab = NaN` 
library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(fgeo)
#> -- Attaching packages ---------------------------------------------------------------- fgeo 1.1.2.9000 --
#> v fgeo.analyze 1.1.4          v fgeo.tool    1.2.2     
#> v fgeo.plot    1.1.3.9000     v fgeo.x       1.1.2
#> -- Conflicts ------------------------------------------------------------------------ fgeo_conflicts() --
#> x fgeo.tool::filter() masks dplyr::filter(), stats::filter()
#> x dplyr::intersect()  masks base::intersect()
#> x dplyr::lag()        masks stats::lag()
#> x dplyr::setdiff()    masks base::setdiff()
#> x dplyr::setequal()   masks base::setequal()
#> x dplyr::union()      masks base::union()
set.seed(1)

small_dataset <- fgeo.x::tree6_3species %>% 
  sample_n(50)

autoplot(sp(small_dataset))

Created on 2019-03-22 by the reprex package (v0.2.1)

... I guess there’s a wrapper function in which the user can pick what minimum tree abundance they wish to use and that then subsets the data so that only the species meeting the minimum abundance criterion are run for the TT? If not, then there should be one.

Thanks @srusso2, this is a great idea and I would like to pursue it. But if I understand correctly, the problem of potentially empty quadrats remains. The abundance of a species may be above some threshold, and yet some quadrat may have no tree at all. Maybe we need another check_for_empty_quadrats() wrapper?

I got it!

The issue is not a given empty quadrat, but empty habitats (set of quadrats) that perfectly match with a habitat of one of the translated maps:

The root of the problem is when both the Tortotstcnthab[i]=0 and the Torspstcnthab[i]=0, which means that for the given translation (Tor) there are no trees in the focal habitat **[i]**. Therefore, the problem arises because there is a part of the plot with no trees that coincidentally, perfectly matches with the habitat **[i]** of the given translated map. That´s why the smaller the tree table i.e., the total number of trees in your database, the higher the probability to get Torspprophab= 0/0 =Nan.

As @srusso2 said, we can get rid of these NaN by making sure (from the beginning of the code) that every quadrat of the plot has at least one tree in the dataset. Otherwise, we would have to modify the shape of the plot and, thus, the way of doing the translations.

Awesome @DanielZuleta, good work!

I did some experiments truncating the census and habitat data and suspected that the problem was some kind of missmatch between the habitat and census data. I was unable to articulate it as eloquently as you did. Thanks!

So this is the state of the art:

tt_test() no longer replaces NaN with 0 and instead it now aborts. But because in the past few hours I wasn't convinced that the problem was only an empty quadrat, the error message I throw is more complex (see below). With this new insight I should be able to write a much more succint message. If you have a suggestion that I could use verbatim let me know.

This is what we have for now:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(fgeo)
#> -- Attaching packages --------------------------------------------- fgeo 1.1.2.9000 --
#> v fgeo.analyze 1.1.4.9000     v fgeo.tool    1.2.2     
#> v fgeo.plot    1.1.3.9000     v fgeo.x       1.1.2
#> -- Conflicts ----------------------------------------------------- fgeo_conflicts() --
#> x fgeo.tool::filter() masks dplyr::filter(), stats::filter()
#> x dplyr::intersect()  masks base::intersect()
#> x dplyr::lag()        masks stats::lag()
#> x dplyr::setdiff()    masks base::setdiff()
#> x dplyr::setequal()   masks base::setequal()
#> x dplyr::union()      masks base::union()

# Fails
insufficient_density <- fgeo.x::tree5
autoplot(sp(insufficient_density))

habitat <- fgeo.x::habitat

tt_test(insufficient_density, habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Error:   Can't calculate the relative stem density of focal species per habitat of
#>   the focal torus-based map. Is your data sufficiently abundant?
#>   `Tortotstcnthab / Torspstcnthab` is not a number (`NaN`).
#>   * `Tortotstcnthab` determines total number of stems per habitat of the
#>   focal torus-based map.
#>   * `Torspstcnthab` determines tot. no. stems for focal sp. per habitat of
#>   the focal torus-based map.
#>   For more details see https://github.com/forestgeo/fgeo.analyze/issues/40.

Created on 2019-03-22 by the reprex package (v0.2.1)

Actually non-zero number/0 = Inf never is going to happen because the numerator is the number of trees of the focal species in that translated habitat. So, if the numerator is x=non-zero number, there has to be (mandatory) at least the same x=non-zero number in the denominator (the total number of trees of all the species in that habitat)

With what both of you just added, I'll tweak the code to err not when the division results in NaN but simply when the denominator is zero.

  • I refined the error message.
  • I changed the title of this issue to match the error message if anyone googles it.
suppressPackageStartupMessages(library(fgeo))

# Fails
tt_test(fgeo.x::tree5, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Error:   Can't deal with a zero value of `Tortotstcnthab` (total number of stems for
#>   focal species per habitat of the focal torus-based map).
#>   For more information see https://github.com/forestgeo/fgeo.analyze/issues/40.

I don't think users will know what to do about it but at least they can follow the link and find your discussion here. Let me konw if you think of a better message I could use verbatim.

--

@DanielZuleta, your previous last comment relates to the issue #90 (i.e. the same dataset passes when tt_test() is fed with a habitat dataset with 8 habitats but fails with a habitats dataset with 16 habitats.).

Sorry, I haven´t seen the previous message from Sabrina.


@maurolepore I think the error message can be difficult to understand for common users, even if they go to the issue. So, I'd add something like this:

"One or more quadrats in the dataset with no trees"
Can't deal with a zero value of `Torspstcnthab` (total number of stems for focal species per habitat of the focal torus-based map). For more information see https://github.com/forestgeo/fgeo.analyze/issues/40

Great, I simplified it based on your suggestion. Both of you agree in that this problem shouldn't happen if every quadrat has at least one tree, and that's an easy to understand and actionable piece of information. I removed the low-level details which not actionable and confusing.

The change is now included in the latest version on GitHub, and will be included in the next release on or CRAN-like repository.

Thanks!

suppressPackageStartupMessages(library(fgeo))

# Fails
tt_test(fgeo.x::tree5, fgeo.x::habitat)
#> Using `plotdim = c(320, 500)`. To change this value see `?tt_test()`.
#> Using `gridsize = 20`. To change this value see `?tt_test()`.
#> Error:   Every quadrat in the dataset must have at lease one tree.
#>   For more information see https://github.com/forestgeo/fgeo.analyze/issues/40.

Created on 2019-03-25 by the reprex package (v0.2.1)