ropensci/stats19

Use casualty adjustment factors as standard

joeytalbot opened this issue · 16 comments

It's important to use the casualty adjustment factors, not the accident adjustment factors.

Look at the difference between these results for an accident with 11 casualties of differing severity. The first output uses the accident adjustment and the second uses the casualty adjustment, both downloaded using the new stats19::get_stats19_adjustments() function:

> adjust %>% filter(accident_index == "200901BS70002")
# A tibble: 1 x 4
  accident_index adjusted_serious adjusted_slight injury_based
  <chr>                     <dbl>           <dbl>        <dbl>
1 200901BS70002                 1               0            0

> cas_func %>% filter(accident_index == "200901BS70002")
# A tibble: 11 x 6
   accident_index Vehicle_Reference Casualty_Reference Adjusted_Serious Adjusted_Slight Injury_Based
   <chr>                      <dbl>              <dbl>            <dbl>           <dbl>        <dbl>
 1 200901BS70002                  2                  2           0.0445           0.956            0
 2 200901BS70002                  2                  7           0.0590           0.941            0
 3 200901BS70002                  2                  8           0.0291           0.971            0
 4 200901BS70002                  2                 10           0.0310           0.969            0
 5 200901BS70002                  2                  9           0.0625           0.938            0
 6 200901BS70002                  2                  5           0.0289           0.971            0
 7 200901BS70002                  1                  1           0.0499           0.950            0
 8 200901BS70002                  2                  3           1                0                0
 9 200901BS70002                  2                  6           0.0237           0.976            0
10 200901BS70002                  2                 11           0.0534           0.947            0
11 200901BS70002                  2                  4           1                0                0

The accident adjustment simply records an accident as serious if there is at least one serious casualty involved in the accident.
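To make the difference concrete, here is a minimal base R sketch that re-enters the 11 casualty-level factors printed above and compares their sum with an accident-level flag. The "any casualty with Adjusted_Serious equal to 1" rule is only my reading of the description above, not necessarily how the DfT derives the accident adjustment, and the object names (cas, serious_total, and so on) are made up for illustration.

```r
# Casualty-level factors for accident 200901BS70002, copied from the output above
cas <- data.frame(
  accident_index = "200901BS70002",
  Casualty_Reference = c(2, 7, 8, 10, 9, 5, 1, 3, 6, 11, 4),
  Adjusted_Serious = c(0.0445, 0.0590, 0.0291, 0.0310, 0.0625, 0.0289,
                       0.0499, 1, 0.0237, 0.0534, 1),
  Adjusted_Slight = c(0.956, 0.941, 0.971, 0.969, 0.938, 0.971,
                      0.950, 0, 0.976, 0.947, 0),
  stringsAsFactors = FALSE
)

# Casualty-level view: expected number of serious and slight casualties
serious_total <- sum(cas$Adjusted_Serious)
slight_total <- sum(cas$Adjusted_Slight)

# Accident-level view (assumed rule): 1 if any casualty is certainly serious
accident_serious <- as.numeric(any(cas$Adjusted_Serious == 1))

print(c(casualty_serious = serious_total,
        casualty_slight = slight_total,
        accident_serious = accident_serious))
```

With the values shown in the thread, the casualty-level factors sum to roughly 2.38 serious casualties, whereas the accident-level record counts a single serious accident, so the two cannot be used interchangeably when counting casualties.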

Great point @joeytalbot. Are you up for having a go at editing adjustments.R after #175 is merged so it gets the casualty level adjustment factors by default? Thanks for raising this.

Yes I can have a go, when will it be merged?

When the tests pass. Within the hour hopefully!

Also, it looks like the adjustment factors have been altered since I originally downloaded them: the new ones differ slightly from the ones I used to create casualties_adjusted.Rds.

Worth updating casualties_adjusted.Rds and deleting/renaming the old one?

Yes I'd better do that.

Heads-up @mem48: before you get too far into the analysis, an updated casualties_adjusted.Rds file is incoming. Watch this space for when it gets updated. It will be good to use the latest version of that file. Great work @joeytalbot for spotting the update!

Heads-up @joeytalbot the updated code is now on master after fixing a few more issues to ensure the CI passed. See #175 for details, will be great for it to give casualty level adjustments by default. Also related to #176.

@joeytalbot great job! I was going to look at this once I had got my internet working but you beat me to it!

@joeytalbot Can you let me know when you have a chance to create the casualty adjustment code so I can do the vignette? Thanks

@PublicHealthDataGeek I'm sorry for the delay on this! The new version of casualties_adjusted.Rds is already available on saferactive/releases/tag/v0.1. I'm working now on updating stats19/adjustments.R so it uses casualty adjustments as standard.

I've amended adjustments.R and it seems to be getting the casualty adjustments properly now.

I will put these changes in a pull request.

library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
new_adj = get_stats19_adjustments()
#> Unzipped files from DfT can be found in the folder:
#> /tmp/RtmpotDT6A
#> accident-and-casualty-adjustment-2004-to-2019.zip
#> adjustment-data
#> file177c5737c414.so
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   accident_index = col_character(),
#>   Vehicle_Reference = col_double(),
#>   Casualty_Reference = col_double(),
#>   Adjusted_Serious = col_double(),
#>   Adjusted_Slight = col_double(),
#>   Injury_Based = col_double()
#> )
new_adj  %>% filter(accident_index == "200901BS70002")
#> # A tibble: 11 x 6
#>    accident_index Vehicle_Referen… Casualty_Refere… Adjusted_Serious
#>    <chr>                     <dbl>            <dbl>            <dbl>
#>  1 200901BS70002                 2                2           0.0445
#>  2 200901BS70002                 2                7           0.0590
#>  3 200901BS70002                 2                8           0.0291
#>  4 200901BS70002                 2               10           0.0310
#>  5 200901BS70002                 2                9           0.0625
#>  6 200901BS70002                 2                5           0.0289
#>  7 200901BS70002                 1                1           0.0499
#>  8 200901BS70002                 2                3           1     
#>  9 200901BS70002                 2                6           0.0237
#> 10 200901BS70002                 2               11           0.0534
#> 11 200901BS70002                 2                4           1     
#> # … with 2 more variables: Adjusted_Slight <dbl>, Injury_Based <dbl>

Created on 2021-01-07 by the reprex package (v0.3.0)

I don't have permission to push changes to ropensci/stats19, although I'm on a branch.

thanks @joeytalbot.

For the vignette, I thought I would try to reproduce the adjusted figures that the DfT published in its statistical release, which would also validate this work, @Robinlovelace.

Great idea @PublicHealthDataGeek, will be a good exercise in reproducibility.
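A rough sketch of how the validation suggested above could start: sum the casualty-level factors by year and compare the totals with the published figures. This is only an illustration under assumptions. The helper name annual_adjusted_totals() is hypothetical, and taking the year from the first four characters of accident_index fits the example index in this thread but may not hold for every record (joining to the accident date would be more robust). The three example rows are copied from the output earlier in the thread, so the resulting totals are not real annual figures.

```r
# Hypothetical helper: total adjusted serious/slight casualties per year.
# Assumes `adj` has the columns returned by get_stats19_adjustments() above
# and that the first four characters of accident_index are the year.
annual_adjusted_totals <- function(adj) {
  adj$year <- substr(adj$accident_index, 1, 4)
  aggregate(cbind(Adjusted_Serious, Adjusted_Slight) ~ year,
            data = adj, FUN = sum)
}

# Three rows copied from the thread, used only to show the call
example_rows <- data.frame(
  accident_index = rep("200901BS70002", 3),
  Adjusted_Serious = c(0.0445, 1, 0.0237),
  Adjusted_Slight = c(0.956, 0, 0.976),
  stringsAsFactors = FALSE
)

totals <- annual_adjusted_totals(example_rows)
print(totals)
```

On the full table, something like annual_adjusted_totals(get_stats19_adjustments()) would give one row per year to set against the DfT release, assuming the download still returns the same column names.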