ropensci/stats19

get_stats19 is inconsistent

layik opened this issue · 2 comments

layik commented
library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

year = 2019
check_dd <- function(year) {
  message("checking... ", year)
  dd = "."
  d = get_stats19(year, data_dir = dd)
  return(nrow(d) > 0)
}

check_dd(2019)
#> checking... 2019
#> Files identified: DfTRoadSafety_Accidents_2019.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Accidents_2019.zip
#> Attempt downloading from:
#> Data saved at ./DfTRoadSafety_Accidents_2019/Road Safety Data - Accidents 2019.csv
#> DfTRoadSafety_Accidents_2019
#> Error: Change data_dir, filename, year or run dl_stats19() first.
check_dd(2018)
#> checking... 2018
#> Files identified: dftRoadSafetyData_Accidents_2018.csv
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Accidents_2018.csv
#> Attempt downloading from:
#> Data saved at ./dftRoadSafetyData_Accidents_2018.csv
#> Reading in:
#> ./dftRoadSafetyData_Accidents_2018.csv
#> date and time columns present, creating formatted datetime column
#> [1] TRUE
check_dd(2017)
#> checking... 2017
#> Files identified: dftRoadSafetyData_Accidents_2017.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Accidents_2017.zip
#> Attempt downloading from:
#> Data saved at ./dftRoadSafetyData_Accidents_2017/Acc.csv
#> dftRoadSafetyData_Accidents_2017
#> Error: Change data_dir, filename, year or run dl_stats19() first.

packageVersion("stats19")
#> [1] '1.4.0'

Created on 2021-03-18 by the reprex package (v0.3.0)

agila5 commented

Hi @layik! I think the problem is related to the fact that, when the stats19 files are not saved in tempdir(), the following if-clause fails:

stats19/R/utils.R, lines 184 to 186 (commit 0abb7d7):

if(length(path) == 1 && file.exists(path)) {
  return(path)
}

For example:

library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

# works
get_stats19(2017, data_dir = tempdir(), silent = TRUE)
#> date and time columns present, creating formatted datetime column
#> # A tibble: 129,982 x 33
#>    accident_index location_easting_osgr location_northing_os~ longitude latitude
#>    <chr>                          <int>                 <int>     <dbl>    <dbl>
#>  1 2017010001708                 532920                196330   -0.0801     51.7
#>  2 2017010009342                 526790                181970   -0.174      51.5
#>  3 2017010009344                 535200                181260   -0.0530     51.5
#>  4 2017010009348                 534340                193560   -0.0607     51.6
#>  5 2017010009350                 533680                187820   -0.0724     51.6
#>  6 2017010009351                 514510                172370   -0.354      51.4
#>  7 2017010009353                 508640                181870   -0.435      51.5
#>  8 2017010009354                 527880                181950   -0.158      51.5
#>  9 2017010009357                 520940                192820   -0.254      51.6
#> 10 2017010009358                 531430                178450   -0.108      51.5
#> # ... with 129,972 more rows, and 28 more variables: police_force <chr>,
#> #   accident_severity <chr>, number_of_vehicles <int>,
#> #   number_of_casualties <int>, date <date>, day_of_week <chr>, time <chr>,
#> #   local_authority_district <chr>, local_authority_highway <chr>,
#> #   first_road_class <chr>, first_road_number <int>, road_type <chr>,
#> #   speed_limit <int>, junction_detail <chr>, junction_control <chr>,
#> #   second_road_class <chr>, second_road_number <int>,
#> #   pedestrian_crossing_human_control <chr>,
#> #   pedestrian_crossing_physical_facilities <chr>, light_conditions <chr>,
#> #   weather_conditions <chr>, road_surface_conditions <chr>,
#> #   special_conditions_at_site <chr>, carriageway_hazards <chr>,
#> #   urban_or_rural_area <chr>,
#> #   did_police_officer_attend_scene_of_accident <int>,
#> #   lsoa_of_accident_location <chr>, datetime <dttm>

# fails
get_stats19(2017, data_dir = ".", silent = TRUE)
#> dftRoadSafetyData_Accidents_2017
#> Error: Change data_dir, filename, year or run dl_stats19() first.

# fails, although it is the same call that worked above
get_stats19(2017, data_dir = tempdir(), silent = TRUE)
#> dftRoadSafetyData_Accidents_2017
#> Error: Change data_dir, filename, year or run dl_stats19() first.

Created on 2021-03-23 by the reprex package (v1.0.0)
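One base-R detail may explain why the if-clause quoted above lets the wrong path through: file.exists() returns TRUE for directories as well as for regular files. In the failing runs, the value printed just before the error is dftRoadSafetyData_Accidents_2017, which looks like the name of the extracted folder rather than a .csv file. That is a plausible reading of the output, not a confirmed diagnosis. The snippet below only demonstrates the base-R behaviour; the demo folder name "file-exists-demo" is made up, while the other names mirror the 2017 reprex output.

```r
# file.exists() is TRUE for directories as well as regular files, so a check
# like `length(path) == 1 && file.exists(path)` cannot tell an extracted
# folder apart from the CSV file inside it.
# "file-exists-demo" is an illustrative folder name, not used by stats19.
data_dir <- file.path(tempdir(), "file-exists-demo")
unzipped <- file.path(data_dir, "dftRoadSafetyData_Accidents_2017")
dir.create(unzipped, recursive = TRUE, showWarnings = FALSE)

# Write a tiny stand-in for the extracted CSV file
csv <- file.path(unzipped, "Acc.csv")
writeLines("accident_index", csv)

file.exists(unzipped)  # TRUE, even though it is a directory
dir.exists(unzipped)   # TRUE
file.exists(csv)       # TRUE
dir.exists(csv)        # FALSE
```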

If you don't have a better solution, I think we should add a further test to the if clause that checks whether path points to a .csv file.
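A minimal sketch of the extra test proposed above, written as a hypothetical helper is_single_csv() (the name is illustrative, not part of stats19). It keeps the original length and existence checks and additionally requires that the path is not a directory and ends in .csv. Either of the two new conditions alone would reject a folder named like dftRoadSafetyData_Accidents_2017; keeping both is a defensive choice, not something the thread settled on.

```r
# Hypothetical helper (not part of stats19): accept a path only if it is a
# single existing regular file whose name ends in .csv.
is_single_csv <- function(path) {
  length(path) == 1 &&
    file.exists(path) &&
    !dir.exists(path) &&
    grepl("\\.csv$", path, ignore.case = TRUE)
}

# The early return in utils.R could then read (sketch only):
# if(is_single_csv(path)) {
#   return(path)
# }

# Quick check with a throwaway folder and file ("csv-check-demo" is made up)
tmp <- file.path(tempdir(), "csv-check-demo")
dir.create(tmp, showWarnings = FALSE)
csv <- file.path(tmp, "Acc.csv")
writeLines("accident_index", csv)

is_single_csv(tmp)           # FALSE: a directory
is_single_csv(csv)           # TRUE
is_single_csv(c(csv, csv))   # FALSE: more than one path
is_single_csv(character(0))  # FALSE: no path at all
```

Because && short-circuits, the length check runs first, so vectors of length 0 or greater than 1 never reach file.exists() and the helper always returns a single TRUE or FALSE.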

layik commented

Well done @agila5. No, I have had no time to offer a solution, and I see that you have identified one. So if all tests pass, then great! Thanks, Andrea.