ropensci/stats19

Spatial resolution of STATS19 geographic data

agila5 opened this issue · 12 comments

Hi everyone, quick question. The README says

stats19/README.Rmd

Lines 180 to 181 in 535673a

An important feature of STATS19 data is that the "accidents" table contains geographic coordinates.
These are provided at ~10m resolution in the UK's official coordinate reference system (the Ordnance Survey National Grid, EPSG code 27700).

Do you remember where (i.e. a DfT document or something similar) it's written that STATS19 data have ~10m resolution?

Hi @agila5, good question and sorry for the slow response. Yes, this information was provided by @WillSERP. I've never seen a quantitative assessment of the accuracy of STATS19 data, though I know it has improved over time. That could be a worthwhile thing to do at some point. In the meantime, do you @wengraf have any comments on this? Cheers!

If I stated 10m resolution it was a mistake, it should be 1m resolution. I don't know if this is stated in a DfT document (it does not appear in STATS20), but it is stated here;
You can test for yourself using ; Use the polyline tool to draw a horizontal line of a given length, and then right click on each end and it will show you the eastings and northings

????? I inserted links but they seem to have disappeared! Here is the url text:
http://doc.ukdataservice.ac.uk/doc/6340/mrdoc/pdf/6340a_userguide_ngr.pdf
https://gridreferencefinder.com/

Hi @WillSERP thanks for this but I meant the accuracy of the data not its precision: the coordinates could be reported to the nearest cm but that doesn't mean the crash actually happened there. Looking back at the comments from Andrea I'm not 100% sure he means measurement accuracy, which can be defined as the distance from the recorded location within which 95% of actual crashes happened (that's what I meant by the 10 m value). A statistician may have a better definition of accuracy, cc @agila5.
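To make that 95% definition concrete, here is a minimal Python sketch. It assumes we had pairs of recorded and true crash locations in EPSG:27700 (so distances are in metres). The coordinates below are made up purely for illustration and are not real STATS19 records, and the function names `radial_errors` and `accuracy_radius` are hypothetical helpers, not part of the stats19 package.

```python
import math

def radial_errors(recorded, actual):
    """Euclidean distance in metres between paired (easting, northing) points."""
    return [
        math.hypot(rec_e - act_e, rec_n - act_n)
        for (rec_e, rec_n), (act_e, act_n) in zip(recorded, actual)
    ]

def accuracy_radius(errors, coverage=0.95):
    """Smallest observed error that covers at least `coverage` of the crashes."""
    ordered = sorted(errors)
    k = math.ceil(coverage * len(ordered))  # how many points must fall inside
    return ordered[k - 1]

# Made-up coordinates for illustration only (not real STATS19 data).
recorded = [(430000, 433000), (430010, 433020), (430100, 433100), (430200, 433250)]
actual = [(430003, 433004), (430010, 433026), (430092, 433094), (430200, 433238)]

errors = radial_errors(recorded, actual)
print(errors)                        # per-crash error in metres
print(accuracy_radius(errors))       # radius covering 95% of crashes
print(accuracy_radius(errors, 0.5))  # median-style radius, for comparison
```

Note that every coordinate here is reported to the nearest metre (1 m precision), yet the radius that covers 95% of the crashes is much larger, which is the distinction being drawn above.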

I think you're talking about measurement precision @WillSERP. I'm not actually sure how the latest location info is added, is it with a GPS? If so I think 10 m is still about right, accounting for the inherent ~5 m accuracy of the device plus another ~5 m uncertainty: was the person standing exactly where the crash happened?
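As a rough check on that arithmetic, here is a small Python sketch of two ways to combine the two ~5 m figures. The variable names are hypothetical, and treating the two errors as independent is an assumption made here for illustration, not something established in this thread.

```python
import math

device_error_m = 5.0    # approximate GPS device accuracy quoted above
position_error_m = 5.0  # approximate uncertainty in where the person stood

# Conservative bound: both errors point the same way and simply add up.
worst_case_m = device_error_m + position_error_m

# If the two errors are independent (an assumption), they combine in quadrature.
independent_m = math.hypot(device_error_m, position_error_m)

print(worst_case_m)             # 10.0
print(round(independent_m, 1))  # 7.1
```

Either way the result is of the order of 10 m, so the ~10 m figure looks like a reasonable conservative estimate under these assumptions.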

According to this article

For example, GPS-enabled smartphones are typically accurate to within a 4.9 m (16 ft.) radius under open sky (view source at ION.org). However, their accuracy worsens near buildings, bridges, and trees.

They seem to define accuracy as:

For example, the government commits to broadcasting the GPS signal in space with a global average user range error (URE) of ≤7.8 m (25.6 ft.), with 95% probability. Actual performance exceeds the specification. On May 11, 2016, the global average URE was ≤0.715 m (2.3 ft.), 95% of the time.

In any case I'm sure there is room for improvement in how we talk about these things in the package documentation so this is a useful issue, many thanks!

In that case the ~10m accuracy is about right but only for police force areas where secondary data quality checks are carried out. This involves checking the grid refs on a road map to see if the plotted location matches the descriptive text and fields denoting junction and road types. When these checks are performed we can usually be confident the plotted location is within 10m of the true location.

I can't speak for areas where these checks are not performed, except that I would expect more outliers with low accuracy.

The use of handheld devices to input data is improving accuracy, but requires police forces to adopt the technology (not all have done so), and police officers to be trained to use it properly (i.e. to plot the location of the collision, not the McDonalds drive thru where they do the paperwork)

Hi @Robinlovelace , @agila5 and @WillSERP:

This came up a bit in the STATS19 review (out now! https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1001195/stats-19-review-final-report.pdf) and in some of the discussions I've had about all lane running. I take both points about accuracy and precision, but there is another question too, namely what that point (however precise/accurate it might be) really represents. The point of initial impact? Where the vehicle comes to rest? Some approximated midpoint? Especially for high speed collisions, I don't think crashes happen at single points in space. Motorway ones are surely more accurately a polygon. So, at best, the location is written out to 1m (six fig grid ref), taken from either a consumer grade GPS, or some reckoning from a map by hand. It can only be enough to attempt to attribute admin responsibility (what LA, what road authority, what police force etc) and basic circumstances (T-junction, rural/urban etc). So, basically, whatever the accuracy/precision, it is all a bit of false precision in a sense.

Excellent point as always @wengraf .
The current (from 2011) STATS20 guidance is somewhat vague, stating:
"An accident should usually be located where the first impact, at which an injury was sustained, occurred, although there may be circumstances in which the LPA feel it more appropriate to locate the accident at the point where a vehicle lost control. Where a vehicle impacts after having left the carriageway, the accident should, normally, be located at the point at which the vehicle first left the carriageway."

When amending inaccurate locations we go by first point of impact or, where there was a loss of control preceding impact, the point at which the loss of control occurred. This is with a focus on the needs of engineers to identify aspects of the highway that could affect the collision risk. For example, when I was 26 and foolish I put my car on its side on the nearside verge of a country lane about 20m after the exit of a corner. There was nothing wrong with this verge, but on the entrance to the corner there was a lack of kerbing that allowed soil to wash into the road. Combined with my poor driving and poor tyres this led to a loss of control around the apex. By all accounts this was a popular place to lose control and kerbing was installed on the verge to help idiots like me keep rubber side down.

I have no faith whatsoever that this guidance is followed anywhere close to consistently enough for this to be a help when using the data!

> Excellent point as always @wengraf .
> The current (from 2011) STATS20 guidance is somewhat vague, stating:
> "An accident should usually be located where the first impact, at which an injury was sustained, occurred, although there may be circumstances in which the LPA feel it more appropriate to locate the accident at the point where a vehicle lost control. Where a vehicle impacts after having left the carriageway, the accident should, normally, be located at the point at which the vehicle first left the carriageway."

Location is what it is, and it is good enough for the vast majority of the location-based analyses anyone might want to do. That's all you can say, really.

> The use of handheld devices to input data is improving accuracy, but requires police forces to adopt the technology (not all have done so), and police officers to be trained to use it properly (i.e. to plot the location of the collision, not the McDonalds drive thru where they do the paperwork)

😆

I think there is enough great info in this thread to act on: we should mention these points and add the links provided by Ivo and Will. I know it may be a bit late and overkill for your needs (for the academic paper, right @agila5 ) but we should incorporate this info somehow into the package documentation.

Hi everyone and good morning. First of all, thank you very much for your comments.

> I think there is enough great info in this thread to act on: we should mention these points and add the links provided by Ivo and Will.

If you want, I can create a PR to add these points to the README sometime in August

> I know it may be a bit late and overkill for your needs (for the academic paper, right @agila5 )

Yes but I think it doesn't matter 😅

> If you want, I can create a PR to add these points to the README sometime in August

Yes please 🙏 I will assign you but if you get bogged down on other stuff don't worry just let us know and I can give it a go no problem. It's great to see how open source projects can effectively crowd source information. Thanks everyone!