ropensci/stats19

`get_stats19()` sometimes fails when reading the same data twice

agila5 opened this issue · 8 comments

Not a big deal, but maybe worth exploring what's going on sooner or later:

library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

a17 <- get_stats19(2017, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a17 <- get_stats19(2017, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/dftRoadSafetyData_Accidents_2017/Acc.csv': Invalid argument

a18 <- get_stats19(2018, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a18 <- get_stats19(2018, silent = TRUE)
#> date and time columns present, creating formatted datetime column

a19 <- get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a19 <- get_stats19(2019, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

a1719 <- get_stats19(2017:2019, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/dftRoadSafetyData_Accidents_2017/Acc.csv': Invalid argument

Created on 2021-07-20 by the reprex package (v2.0.0)

Good catch! I think this may be responsible for some of the errors I'm seeing on CRAN submissions. Idea: do something like

if(file.exists(file_that_is_unzipped)) {
  # don't try to unzip the file
...
} else {
  # ...
}
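A minimal sketch of that idea, using only base R. The helper name `unzip_if_missing()` and its arguments are hypothetical, not part of stats19; it just shows the shape of the guard, assuming we know which file the archive is expected to produce:

```r
# Hypothetical helper (not part of stats19): extract the archive only when
# the file we expect from it is not already on disk.
unzip_if_missing <- function(zip_path, exdir, expected_file) {
  target <- file.path(exdir, expected_file)
  if (file.exists(target)) {
    # Already extracted: skip utils::unzip() so we never try to overwrite
    # a file that another process or connection may still hold open.
    return(invisible(target))
  }
  utils::unzip(zip_path, exdir = exdir)
  invisible(target)
}
```

With a guard like this, a second call for the same year would return the existing path without touching the file, which might avoid the "Invalid argument" error above. It does not explain why the overwrite fails in the first place.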

Error I see on CRAN checks:

 Quitting from lines 291-295 (stats19.Rmd)
  Error: processing vignette 'stats19.Rmd' failed with diagnostics:
  cannot open file 'D:/temp/RtmpoBoYdg/working_dir/RtmpUNi5SU/dftRoadSafetyData_Casualties_2017/Cas.csv': Invalid argument
  --- failed re-building 'stats19.Rmd'

Might be, I will check right now.

I really don't understand what's going on 😅 Any help is greatly appreciated. The only finding I can add:

# packages
library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

# works
dl_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Data saved at C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv
dl_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Data saved at C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv

# fails
a2019 <- get_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Data saved at C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv
#> Reading in:
#> C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv
#> date and time columns present, creating formatted datetime column
a2019 <- get_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

# fails
dl_stats19(2019)
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Data already exists in data_dir, not downloading
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

Created on 2021-07-20 by the reprex package (v2.0.0)

I think it might be a bug in R + unzip + Windows since (for whatever reason) the problem goes away when I restart R, but I cannot reproduce it outside of get_stats19. Will check again in a few days.

Never mind, it looks like a problematic interaction with the recent readr upgrade (i.e. the current CRAN version):

# current packages
remotes::install_github("ropensci/stats19")
#> Skipping install of 'stats19' from a github remote, the SHA1 (c1c8fde2) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_cran("readr", quiet = TRUE)

# test
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> Error in utils::unzip(destfile, exdir = file.path(data_dir, exdir)): cannot open file 'C:/Users/Utente/Documents/stats19-data/DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv': Invalid argument

Created on 2021-07-20 by the reprex package (v2.0.0)

while with readr 1.4.0 it works:

# current packages
remotes::install_github("ropensci/stats19", quiet = TRUE)
remotes::install_version("readr", "1.4.0", quiet = TRUE)

# test
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column
a19 <- stats19::get_stats19(2019, silent = TRUE)
#> date and time columns present, creating formatted datetime column

Created on 2021-07-20 by the reprex package (v2.0.0)

Thanks for testing it Andrea, that would explain why it has only just appeared as an issue. Could you try running this line of code before the tests?

readr::local_edition(1) 

If that fixes it, we can, I guess, do

readr::with_edition(1, readr::read_csv("my_file.csv")) 

to solve the problem. Source: https://www.tidyverse.org/blog/2021/07/readr-2-0-0/#readr-2nd-edition

Or just add

readr::local_edition(1)

at the beginning of each function that calls readr functions: https://readr.tidyverse.org/reference/with_edition.html
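For illustration, the second option could look roughly like this. The function name `read_csv_first_edition()` is made up for the example, and the sketch assumes readr >= 2.0.0 is installed (that is where `local_edition()` was introduced):

```r
# Hypothetical wrapper (not part of stats19): read a CSV with readr's
# first-edition parser, without changing the edition for the caller.
read_csv_first_edition <- function(path) {
  # local_edition() switches the edition only for this function's frame;
  # the previous edition is restored automatically when the function exits.
  readr::local_edition(1)
  readr::read_csv(path)
}
```

Since the change is scoped to the function call, user code that relies on the second edition elsewhere in the session should be unaffected. Whether this actually fixes the unzip error still needs to be tested on Windows.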