ropensci/stats19

Issues with historic data

Robinlovelace opened this issue · 1 comment

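The reprex below loads the historic collision data with `get_stats19("collision", year = "1979")`. The records end in 1999, although the file name suggests data up to the latest published year. 1999 also has noticeably fewer collisions (172,415) than the preceding years, and 8 records have a missing `date`.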

remotes::install_cran("stats19")
#> Skipping install of 'stats19' from a cran remote, the SHA1 (3.0.0) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
collisions = get_stats19("collision", year = "1979")
#> Files identified: dft-road-casualty-statistics-collision-1979-latest-published-year.csv
#>    https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-1979-latest-published-year.csv
#> Data already exists in data_dir, not downloading
#> Data saved at ~/data/stats19/dft-road-casualty-statistics-collision-1979-latest-published-year.csv
#> Reading in:
#> ~/data/stats19/dft-road-casualty-statistics-collision-1979-latest-published-year.csv
#> Rows: 5055631 Columns: 36
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (10): accident_index, accident_reference, location_easting_osgr, locati...
#> dbl  (25): accident_year, police_force, accident_severity, number_of_vehicle...
#> time  (1): time
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> date and time columns present, creating formatted datetime column
dim(collisions)
#> [1] 5055631      37
table(collisions$accident_year)
#> 
#>   1979   1980   1981   1982   1983   1984   1985   1986   1987   1988   1989 
#> 254967 250958 248276 256007 242876 253183 245645 247878 239063 246994 260759 
#>   1990   1991   1992   1993   1994   1995   1996   1997   1998   1999 
#> 258441 235889 233104 228975 234254 230544 236193 240287 238923 172415
summary(collisions$date)
#>         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
#> "1979-01-01" "1984-01-18" "1989-03-03" "1989-03-28" "1994-05-26" "1999-12-31" 
#>         NA's 
#>          "8"
collisions |> 
  select(date) |> 
  sample_n(12)
#> # A tibble: 12 × 1
#>    date      
#>    <date>    
#>  1 1992-06-27
#>  2 1994-07-07
#>  3 1999-06-20
#>  4 1979-12-05
#>  5 1991-04-05
#>  6 1979-10-17
#>  7 1987-12-04
#>  8 1999-04-03
#>  9 1980-04-10
#> 10 1997-01-23
#> 11 1989-02-12
#> 12 1980-06-02

Created on 2023-10-18 with reprex v2.0.2

This is an issue with the data, not the package, but I will leave it open in case it is of use or interest to others.
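A minimal sketch of how gaps like these could be flagged automatically, in base R. `flag_year_gaps()` is a hypothetical helper, not part of stats19. It takes per-year counts (for example `table(collisions$accident_year)`), reports expected years with no records, and flags years with fewer than `threshold` times the median yearly count. The expected range of 1979 to 2022 and the 0.8 threshold are assumptions chosen for illustration. The counts are copied from the table in the reprex.

```r
# Hypothetical helper, not part of the stats19 package: flag expected years
# that have no records, and years whose count is below `threshold` times the
# median yearly count.
flag_year_gaps = function(counts, expected_years, threshold = 0.8) {
  # Drop table attributes but keep the years as character names
  counts = setNames(as.numeric(counts), names(counts))
  # Reorder to the expected years; years absent from the data become NA
  counts = counts[as.character(expected_years)]
  med = median(counts, na.rm = TRUE)
  list(
    missing_years = expected_years[is.na(counts)],
    low_years = expected_years[!is.na(counts) & counts < threshold * med]
  )
}

# Yearly counts copied from table(collisions$accident_year) in the reprex
yearly = setNames(
  c(254967, 250958, 248276, 256007, 242876, 253183, 245645, 247878, 239063,
    246994, 260759, 258441, 235889, 233104, 228975, 234254, 230544, 236193,
    240287, 238923, 172415),
  1979:1999
)

# Assumption: the file should run to the latest published year, taken here as 2022
res = flag_year_gaps(yearly, expected_years = 1979:2022)
print(res)
```

With these inputs, 1999 is flagged as low (172,415 against a median of 242,876) and 2000 to 2022 are reported as missing. A median-relative threshold is used rather than a fixed count so the check still works as yearly totals drift over the decades.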