ropensci/stats19

Minor issue with get_stats19 and multiple years

agila5 opened this issue · 3 comments

Hi! I don't know all the internals of stats19 or the details of STATS19 data, but I wanted to point out a minor issue with get_stats19 and multiple years.

It seems that if the selected year is between 2005 and 2014, then the R function downloads all car crash data from 2005 to 2014:

> library(stats19)
Data provided under OGL v3.0. Cite the source and link to:
www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
> get_stats19(2005)
No files of that type found for that year.
Files identified: Stats19_Data_2005-2014.zip

http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19_Data_2005-2014.zip
Attempt downloading from: 
trying URL 'http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19_Data_2005-2014.zip'
Content type 'application/x-zip-compressed' length 108358586 bytes (103.3 MB)

The problem is that if I ask for all car crashes between 2005 and 2010 (i.e. get_stats19(2005:2010)), then the same data are read 6 times. Obviously this is not a real problem and the workaround is extremely easy (just ask for 2005), but maybe it's worth raising a warning in these cases and reducing the input years to only 2005. What do you think?

That is an issue. It's because there is a single file for all those years. Could a solution be to pre-check the years and if there are multiple years within that range remove all but one of them (the most recent)?
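A minimal sketch of that pre-check, to make the idea concrete. The helper name `dedupe_bundled_years()` and its `bundle` argument are hypothetical and not part of the stats19 API. It assumes, based on the output above, that every year from 2005 to 2014 is served by the same zip file:

```r
# Hypothetical helper, not part of stats19: collapse years that share one file.
# Assumption: all years in `bundle` (2005-2014) come from a single download.
dedupe_bundled_years <- function(years, bundle = 2005:2014) {
  years <- unique(years)
  in_bundle <- years[years %in% bundle]
  if (length(in_bundle) > 1) {
    # Keep only the most recent bundled year, as suggested above
    keep <- max(in_bundle)
    warning(
      "Years ", min(bundle), "-", max(bundle), " come from a single file; ",
      "using only ", keep, " to avoid repeated downloads.",
      call. = FALSE
    )
    years <- c(years[!years %in% bundle], keep)
  }
  sort(years)
}
```

With this sketch, `dedupe_bundled_years(2005:2010)` warns and returns `2010`, while years outside the bundled range pass through unchanged. Whether the result should then be filtered back down to the years originally requested is a separate question that this sketch does not address.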

Could a solution be to pre-check the years and if there are multiple years within that range remove all but one of them (the most recent)?

IMO yes. I can work on a PR in the next few days.

If you're willing and able that would be amazing. Thanks for reporting and (if you find time and motivation) potentially fixing this pesky edge case issue!