ropensci/stats19

Test `lazy = FALSE`

Opened by agila5.

See #205 and the discussion in tidyverse/readr#1266

Benchmarking the current approach (the `master` branch), I get the following:

```r
# current approach
remotes::install_github("ropensci/stats19", "master", upgrade = "never", quiet = TRUE)
options(width = 120)

microbenchmark::microbenchmark(
  one_year = suppressMessages(stats19::get_stats19(2019, silent = TRUE)), 
  one_year_filter = {
    suppressMessages(crashes <- stats19::get_stats19(2019, silent = TRUE))
    crashes_london <- crashes[crashes$police_force == "City of London", ]
  },
  multiple_years = suppressMessages(stats19::get_stats19(2015:2019, silent = TRUE)),
  multiple_years_filter = {
    suppressMessages(crashes <- stats19::get_stats19(2015:2019, silent = TRUE))
    crashes_london <- crashes[crashes$police_force == "City of London", ]
  }, 
  times = 5L
)
#> Unit: milliseconds
#>                   expr      min        lq      mean    median        uq       max neval cld
#>               one_year  728.484  758.3625  857.1061  886.6820  931.4195  980.5827     5  a 
#>        one_year_filter  691.783  710.3164  742.4217  728.0939  779.0534  802.8620     5  a 
#>         multiple_years 5721.792 5732.5708 5912.5875 5925.7921 6090.2348 6092.5478     5   b
#>  multiple_years_filter 5806.420 6018.1828 6389.2668 6188.9528 6598.0124 7334.7656     5   b
```

Created on 2021-08-24 by the reprex package (v2.0.0)

With `lazy = FALSE` set in all scenarios (the `test-lazy` branch), I get the following:

```r
# lazy = FALSE approach
remotes::install_github("ropensci/stats19", "test-lazy", upgrade = "never", quiet = TRUE)
options(width = 120)

microbenchmark::microbenchmark(
  one_year = suppressMessages(stats19::get_stats19(2019, silent = TRUE)), 
  one_year_filter = {
    suppressMessages(crashes <- stats19::get_stats19(2019, silent = TRUE))
    crashes_london <- crashes[crashes$police_force == "City of London", ]
  },
  multiple_years = suppressMessages(stats19::get_stats19(2015:2019, silent = TRUE)),
  multiple_years_filter = {
    suppressMessages(crashes <- stats19::get_stats19(2015:2019, silent = TRUE))
    crashes_london <- crashes[crashes$police_force == "City of London", ]
  }, 
  times = 5L
)
#> Unit: milliseconds
#>                   expr       min        lq      mean    median        uq       max neval cld
#>               one_year  702.4651  726.7535  751.8321  730.6509  732.2352  867.0559     5  a 
#>        one_year_filter  673.2410  684.2855  778.9064  733.4660  855.8780  947.6614     5  a 
#>         multiple_years 4476.7977 4489.1523 4614.8126 4517.1048 4519.3307 5071.6776     5   b
#>  multiple_years_filter 4746.5225 4991.9151 5301.0332 5073.7521 5099.0314 6593.9451     5   b
```

Created on 2021-08-24 by the reprex package (v2.0.0)
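To compare the two runs side by side, the relative change in median run time can be computed directly from the medians reported in the two tables above. This is only a rough summary: with `times = 5L` the spread between `lq` and `uq` is wide, so small differences (e.g. for the single-year cases) may well be noise. The data frame `median_ms` and its column names are just illustrative.

```r
# Median run times (ms) copied from the two microbenchmark tables above
median_ms <- data.frame(
  expr       = c("one_year", "one_year_filter", "multiple_years", "multiple_years_filter"),
  current    = c(886.6820, 728.0939, 5925.7921, 6188.9528),
  lazy_false = c(730.6509, 733.4660, 4517.1048, 5073.7521)
)

# Relative change in percent: negative values mean lazy = FALSE was faster
median_ms$change_pct <- round(
  100 * (median_ms$lazy_false - median_ms$current) / median_ms$current, 1
)
print(median_ms)
```

On these numbers, the multi-year cases are roughly 18 to 24% faster with `lazy = FALSE`.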

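I think we should test it a little more (the benchmarks above were run on an Ubuntu 18.04 VM) and then maybe just adopt `lazy = FALSE`: it reduces the median time for 2015:2019 from about 5.9 s to 4.5 s, while the single-year timings are similar.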

The results on my Windows laptop are more or less identical.
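For reference, `lazy` is an argument of readr's reading functions (readr >= 2.0.0). With lazy reading, values are parsed on demand when they are first accessed; with `lazy = FALSE`, the whole file is parsed up front. A minimal sketch at the readr level is below. It uses a small temporary CSV rather than the real STATS19 files, and it is an assumption that adopting the change in stats19 amounts to passing `lazy = FALSE` to each `read_csv()` call the package makes.

```r
library(readr)  # the lazy argument requires readr >= 2.0.0

# Small illustrative file, not real STATS19 data
tmp <- tempfile(fileext = ".csv")
write_csv(
  data.frame(police_force = c("City of London", "Kent", "City of London"), n = 1:3),
  tmp
)

# Eager read: all values are parsed immediately instead of on first access
crashes <- read_csv(tmp, lazy = FALSE, show_col_types = FALSE)
crashes_london <- crashes[crashes$police_force == "City of London", ]

# The data are already fully in memory, so the temporary file can be removed
unlink(tmp)
```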