langcog/tidyboot

upper_ci of README example

Closed · 1 comment

It seems strange that ci_upper and mean are identical in your README example:

library(dplyr)
library(tidyboot)

gauss1 <- data_frame(value = rnorm(500, mean = 0, sd = 1), condition = 1)
gauss2 <- data_frame(value = rnorm(500, mean = 2, sd = 3), condition = 2)
df <- bind_rows(gauss1, gauss2)

df %>%
  group_by(condition) %>%
  tidyboot_mean(column = value)
#> # A tibble: 2 x 6
#>   condition     n empirical_mean    ci_lower       mean   ci_upper
#>       <dbl> <int>          <dbl>       <dbl>      <dbl>      <dbl>
#> 1         1   500      0.0415903 -0.04795741 0.04006775 0.04006775
#> 2         2   500      2.0494461  1.80454457 2.05265278 2.05265278

I get the same "bug" when I run it on my own dataset. Is something wrong with the statistics_functions? When I changed the order from ci_lower, mean, ci_upper to ci_upper, mean, ci_lower, it was ci_lower that came out equal to mean instead.

Thanks.

Yes, I can reproduce this. It happens because @langcog has hard-coded "mean" as a column name in tidyboot_mean(). The same issue can be reproduced from the documentation example for tidyboot.data.frame if you run tidyboot the long way:

gauss1 <- data_frame(value = rnorm(30, mean = 0, sd = 1), site = 1, spp = 1)
gauss2 <- data_frame(value = rnorm(20, mean = 2, sd = 1), site = 1, spp = 2)
gauss3 <- data_frame(value = rnorm(50, mean = 1, sd = 1), site = 2, spp = 1)
gauss4 <- data_frame(value = rnorm(7, mean = 3, sd = 1), site = 2, spp = 2)
df     <- bind_rows(gauss1, gauss2, gauss3, gauss4)

# As provided in the documentation for tidyboot.data.frame, but with one added group.

df %>% group_by(site, spp) %>%
    tidyboot(summary_function = function(x) x %>% summarise(mean = mean(value)),
             statistics_functions = function(x) x %>%
                 summarise_at(vars(mean), funs(ci_upper, mean, ci_lower)))

## site   spp     n empirical_mean ci_upper   mean ci_lower
## <dbl> <dbl> <int>          <dbl>    <dbl>  <dbl>    <dbl>
##     1     1    30         -0.288   0.0931 -0.282   -0.282
##     1     2    20          2.33    2.79    2.32     2.32 
##     2     1    50          0.886   1.13    0.888    0.888
##     2     2     7          3.06    4.27    3.08     3.08 

And it gets worse if you compute mean first, because its result then replaces the mean column, and every function that runs after it works on that single value:

df %>% group_by(site, spp) %>%
    tidyboot(summary_function = function(x) x %>% summarise(mean = mean(value)),
             statistics_functions = function(x) x %>%
                 summarise_at(vars(mean), funs(mean, ci_upper, ci_lower)))

## site   spp     n empirical_mean   mean ci_upper ci_lower
## <dbl> <dbl> <int>          <dbl>  <dbl>    <dbl>    <dbl>
##     1     1    30         -0.288 -0.296   -0.296   -0.296
##     1     2    20          2.33   2.34     2.34     2.34 
##     2     1    50          0.886  0.879    0.879    0.879
##     2     2     7          3.06   3.10     3.10     3.10 
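
The two runs above are consistent with how dplyr's summarise() evaluates its arguments: in order, with each expression able to see columns created by the ones before it. Here is a minimal sketch of that collision in plain dplyr, without tidyboot. The replicate values are made up, and quantile() is only a stand-in for ci_upper (an assumption about what ci_upper computes):

```r
library(dplyr)

# Hypothetical stand-in for a column of bootstrap replicate means.
boot <- tibble(mean = c(1, 2, 3, 4, 5))

# summarise() evaluates its arguments in order, and each expression can see
# columns created by the ones before it. Once `mean = mean(mean)` has run,
# `mean` is a single number, so the quantile is taken over one value.
collided <- boot %>%
  summarise(mean     = mean(mean),
            ci_upper = quantile(mean, 0.975, names = FALSE))

# With a non-colliding input name, every statistic sees all five replicates.
renamed <- boot %>%
  rename(my_mean = mean) %>%
  summarise(mean     = mean(my_mean),
            ci_upper = quantile(my_mean, 0.975, names = FALSE))

collided$ci_upper == collided$mean  # TRUE: the interval has collapsed
renamed$ci_upper > renamed$mean     # TRUE: the interval is intact
```

This only illustrates the naming collision; it does not show how tidyboot builds its columns internally.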

If you rename the summary column (here to my_mean) so that it no longer collides with mean, it works out fine:

df %>% group_by(site, spp) %>%
    tidyboot(summary_function = function(x) x %>% summarise(my_mean = mean(value)),
             statistics_functions = function(x) x %>%
                 summarise_at(vars(my_mean), funs(ci_upper, mean, ci_lower)))

## site   spp     n empirical_my_mean   mean ci_upper ci_lower
## <dbl> <dbl> <int>             <dbl>  <dbl>    <dbl>    <dbl>
##     1     1    30            -0.288 -0.288   0.0589   -0.645
##     1     2    20             2.33   2.34    2.83      1.85 
##     2     1    50             0.886  0.882   1.12      0.626
##     2     2     7             3.06   3.09    4.27      1.82
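
Until this is fixed, one way to catch the problem in your own results is to check that each interval actually brackets its mean. The sketch below uses base R only; ci_is_ordered() is a hypothetical helper, not part of tidyboot, and the numbers are copied from the outputs quoted in this thread:

```r
# Hypothetical helper: TRUE for each row where ci_lower < mean < ci_upper.
ci_is_ordered <- function(res) {
  res$ci_lower < res$mean & res$mean < res$ci_upper
}

# Values copied from the README output quoted at the top of this issue.
readme <- data.frame(
  ci_lower = c(-0.04795741, 1.80454457),
  mean     = c(0.04006775, 2.05265278),
  ci_upper = c(0.04006775, 2.05265278)
)

# Values copied from the renamed-column run above.
fixed <- data.frame(
  ci_upper = c(0.0589, 2.83, 1.12, 4.27),
  mean     = c(-0.288, 2.34, 0.882, 3.09),
  ci_lower = c(-0.645, 1.85, 0.626, 1.82)
)

ci_is_ordered(readme)  # FALSE FALSE
ci_is_ordered(fixed)   # TRUE TRUE TRUE TRUE
```

A strict inequality is used deliberately, so an interval bound that exactly equals the mean is flagged.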