GPU accelerated road network routing between postcodes and health related POIs
Cillian Berragan [@cjberragan]1*, Mark Green [@markalangreen]1, Alex Singleton [@alexsingleton]1
1 Geographic Data Science Lab, University of Liverpool,
Liverpool, United Kingdom
* Correspondence: C.Berragan@liverpool.ac.uk
This project identifies the time-weighted distance required to travel by road between every postcode in Great Britain and a selection of health related points of interest. A ranked combination of these drive-times is used to create the AHAH index.
Access is defined as the average time-weighted road network distance from each postcode within an LSOA to the nearest point of interest of a particular type. For this, the road highways network and road speed estimates provided by Ordnance Survey were used, alongside the ONS Postcode Directory for May 2020, which gives centroids for every postcode in the country.
This is a computationally intensive calculation: the road network used has 3,816,897 edges and 3,215,522 nodes. Access to each nearest health related POI was calculated using the Single Source Shortest Path algorithm, for all 1,659,451 postcodes in Great Britain.
This calculation was made possible through the GPU accelerated Python library `cugraph`, part of the NVIDIA RAPIDS ecosystem, allowing the computation to be highly parallel, taking minutes, rather than days.
ahah
├── aggregate_lsoa.py # aggregate outputs to LSOA level
├── create_index.py # use aggregates to create index
├── get_nhs.py # retrieve NHS data
├── os_highways.py # process OS open roads data
├── process_air.py # process air quality data
├── process_routing.py # process all POI data
├── routing.py # main routing class
└── common
    ├── logger.py # use rich logging
    └── utils.py # utility functions
Accessibility measures were created using the `cugraph` GPU accelerated Python library for parallel processing of graph networks, in conjunction with the OS Open Roads network. Compared with the OpenStreetMap data used by Routino, the OS Open Roads network provides more accurate road speed estimates for UK roads.
In this study, we measured the network distance (travel time) from the centroid of each active postcode in Great Britain to the coordinates of each unique health asset (e.g. GP practice). Measured network distances for each indicator were aggregated from postcodes to the LSOA level, providing the average network distance for each indicator (as a measure of accessibility). All other indicators were also summarised for LSOAs. The indicators within each domain were standardised by ranking and transformed to the standard normal distribution. The direction of each variable was dictated by the literature (e.g. accessibility to fast food outlets was identified as health negating, whereas accessibility to GP practices was health promoting).
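The ranking and normal transformation described above can be sketched in a few lines. This is a minimal illustration only, not the project's code: the function name `rank_to_normal`, the `reverse` flag for flipping a variable's direction, and the `(rank - 0.5) / n` offset are all assumptions made for the example, as are the sample drive times.

```python
from statistics import NormalDist


def rank_to_normal(values, reverse=False):
    """Rank the values, then map each rank to a standard-normal score.

    reverse=True ranks from largest to smallest (a hypothetical way to
    flip the direction of a variable).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=reverse)
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # (rank - 0.5) / n keeps the probability strictly inside (0, 1),
        # which inv_cdf requires. The offset is an assumption.
        scores[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return scores


# Hypothetical mean drive times (minutes) to a GP practice for five LSOAs.
gp_minutes = [4.0, 12.5, 7.2, 30.1, 9.9]
gp_scores = rank_to_normal(gp_minutes)
gp_scores_flipped = rank_to_normal(gp_minutes, reverse=True)
```

The scores depend only on the rank order of the inputs, so extreme drive times (such as 30.1 here) do not dominate the standardised variable.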
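To calculate our overall index (and domain specific values), we followed the methodology of the 2015 IMD. For each domain, we ranked the domain scores and scaled the ranks to the range (0, 1], giving a scaled rank $R$ for each LSOA. The exponential transformation used in the 2015 IMD was then applied to the scaled ranks:

$$X = -23 \ln\left(1 - R\left(1 - \exp(-100/23)\right)\right)$$

where ‘ln’ denotes natural logarithm and ‘exp’ the exponential transformation.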
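As a worked illustration, the 2015 IMD exponential transformation is commonly written as X = -23 ln(1 - R(1 - exp(-100/23))), with R a rank scaled to (0, 1]. The sketch below assumes that form; the function name `exp_transform` and the five-LSOA example are hypothetical.

```python
import math


def exp_transform(r):
    """Exponential transformation of a scaled rank r in (0, 1],
    assuming the form used in the 2015 IMD."""
    return -23.0 * math.log(1.0 - r * (1.0 - math.exp(-100.0 / 23.0)))


# Scaled ranks for a hypothetical set of five LSOAs: 1/N, 2/N, ..., N/N.
n = 5
scaled_ranks = [rank / n for rank in range(1, n + 1)]
transformed = [exp_transform(r) for r in scaled_ranks]
```

The transformed values run from just above 0 up to 100 and stretch the top of the distribution, so a high rank in one domain is not easily cancelled out by low ranks in others.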
The main domains across our indicators (retail services, health services, physical environment and air quality) were then combined to form an overall index of ‘Access to Healthy Assets and Hazards’ (AHAH).
- Speed estimates given to each road, based on `formOfway` and `roadClassification`
- Time-weighted distance calculated using length of edge and speed estimate
- Node ID converted to sequential integers and saved with edges as parquet files
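The steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the contents of `os_highways.py`: the speed values, the mph and metre units, and the field names `startNode`, `endNode` and `length` are hypothetical, and the final parquet write is omitted.

```python
METRES_PER_MILE = 1609.344

# Hypothetical speed estimates (mph) keyed by (formOfway, roadClassification).
# The values used by the project are not stated here.
SPEED_MPH = {
    ("Dual Carriageway", "Motorway"): 60.0,
    ("Single Carriageway", "A Road"): 40.0,
    ("Single Carriageway", "Unclassified"): 20.0,
}


def edge_minutes(length_m, speed_mph):
    """Travel time in minutes for an edge of length_m metres at speed_mph."""
    return length_m / METRES_PER_MILE / speed_mph * 60.0


# Hypothetical raw edges with string node identifiers.
raw_edges = [
    {"startNode": "id-a", "endNode": "id-b", "length": 1609.344,
     "formOfway": "Dual Carriageway", "roadClassification": "Motorway"},
    {"startNode": "id-b", "endNode": "id-c", "length": 804.672,
     "formOfway": "Single Carriageway", "roadClassification": "A Road"},
    {"startNode": "id-c", "endNode": "id-a", "length": 402.336,
     "formOfway": "Single Carriageway", "roadClassification": "Unclassified"},
]

node_ids = {}


def sequential_id(node):
    """Map a raw node identifier to the next unused integer, starting at 0."""
    return node_ids.setdefault(node, len(node_ids))


# Each edge becomes (source id, destination id, time-weighted distance).
edges = []
for e in raw_edges:
    speed = SPEED_MPH[(e["formOfway"], e["roadClassification"])]
    edges.append((
        sequential_id(e["startNode"]),
        sequential_id(e["endNode"]),
        edge_minutes(e["length"], speed),
    ))
```

Sequential integer IDs keep the edge list compact and are a convenient vertex format for graph libraries.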
This stage prepares the `nodes`, `postcodes`, and `poi` data for use in RAPIDS `cugraph`. It makes use of utility functions to assist with data preparation from the raw data sources.
- Clean raw data
- Find the nearest road node to each postcode and point of interest using GPU accelerated K-nearest neighbours (KNN)
- Determine minimum buffer distance to use for each point of interest
  - Distances returned for nearest 10 points of interest to each postcode using KNN
  - For each unique POI, the maximum distance to associated postcodes is taken and saved as a buffer for this POI
  - Each POI is assigned the postcodes that fall within its KNN, used to determine buffer suitability when converted to a graph
- All processed data written to respective files
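The nearest-node and buffer steps above can be illustrated with a brute-force CPU sketch. It is a stand-in for the GPU accelerated nearest-neighbour search, not the project's implementation: the coordinates, postcodes, POI names and function names are hypothetical, and `k=1` is used instead of 10 only because the toy data has two POIs.

```python
import math

# Hypothetical planar coordinates (metres).
road_nodes = {0: (0.0, 0.0), 1: (100.0, 0.0), 2: (200.0, 0.0), 3: (200.0, 150.0)}
postcodes = {"L1 1AA": (10.0, 5.0), "L1 1AB": (190.0, 10.0), "L1 1AC": (205.0, 140.0)}
pois = {"gp_a": (95.0, 0.0), "gp_b": (210.0, 160.0)}


def nearest_node(point):
    """Return the ID of the road node closest to point (straight-line distance)."""
    return min(road_nodes, key=lambda n: math.dist(point, road_nodes[n]))


def poi_buffers(k):
    """For each POI, collect the postcodes that have it among their k nearest
    POIs, and take the largest of those distances as the POI's buffer."""
    buffers = {poi: 0.0 for poi in pois}
    members = {poi: [] for poi in pois}
    for pc, xy in postcodes.items():
        k_nearest = sorted(pois, key=lambda p: math.dist(xy, pois[p]))[:k]
        for poi in k_nearest:
            buffers[poi] = max(buffers[poi], math.dist(xy, pois[poi]))
            members[poi].append(pc)
    return buffers, members


postcode_node = {pc: nearest_node(xy) for pc, xy in postcodes.items()}
poi_node = {poi: nearest_node(xy) for poi, xy in pois.items()}
buffers, members = poi_buffers(k=1)
```

Taking the maximum distance means the buffer around a POI is just large enough to reach every postcode that might be served by it.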
The routing stage of this project primarily makes use of the RAPIDS `cugraph` library. This stage iterates sequentially over each POI of a certain type and finds routes to every postcode within a certain buffer.
- Iterate over POI of a certain type
- Create `cugraph.Graph()` with subset of road nodes using `cuspatial.points_in_spatial_window` with buffer
- Run single-source shortest path from POI to each node in the subgraph
  - `cugraph.sssp` takes into account `weights`, which in this case are the time-weighted distance of each connection between nodes, derived from the OS road speed estimates
- SSSP distances are subset to return only nodes associated with postcodes; these distances are added iteratively to a complete dataframe of postcodes, from which the smallest value for each postcode is taken
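The routing loop above can be sketched on the CPU with a heap-based Dijkstra search standing in for the GPU single-source shortest path call. Everything here is an assumption made for illustration: the toy network, the square window used to subset nodes, and the names `sssp`, `subgraph` and `best`.

```python
import heapq
import math

# Hypothetical road network: node coordinates (metres) and undirected edges
# given as (source, destination, travel time in minutes).
node_xy = {0: (0.0, 0.0), 1: (100.0, 0.0), 2: (200.0, 0.0), 3: (200.0, 150.0)}
edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 4.0), (0, 2, 6.0)]
postcode_node = {"L1 1AA": 0, "L1 1AB": 2, "L1 1AC": 3}
pois = {"gp_a": {"node": 1, "buffer": 120.0}, "gp_b": {"node": 3, "buffer": 160.0}}


def subgraph(centre, buffer):
    """Adjacency list restricted to nodes inside a square window around centre."""
    cx, cy = node_xy[centre]
    keep = {n for n, (x, y) in node_xy.items()
            if abs(x - cx) <= buffer and abs(y - cy) <= buffer}
    adjacency = {}
    for a, b, w in edges:
        if a in keep and b in keep:
            adjacency.setdefault(a, []).append((b, w))
            adjacency.setdefault(b, []).append((a, w))
    return adjacency


def sssp(source, adjacency):
    """Dijkstra's algorithm: weighted shortest distance from source to each node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for nxt, w in adjacency.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist


# Keep the smallest travel time seen so far for every postcode.
best = {pc: math.inf for pc in postcode_node}
for poi in pois.values():
    dist = sssp(poi["node"], subgraph(poi["node"], poi["buffer"]))
    for pc, node in postcode_node.items():
        best[pc] = min(best[pc], dist.get(node, math.inf))
```

Restricting each search to a window around the POI keeps every shortest-path run small, and the running minimum means a postcode ends up with the time to its nearest POI of that type.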
- Create raster of interpolated values from monitoring station points
- Exclude points that are MISSING
- Aggregate to LSOA by taking mean values
- Combine both processed secure and open data
- Intermediate variables calculated
- All variables ranked
- Exponential default calculated for all ranked variables
- Percentiles calculated from ranked variables
- Domain scores calculated
  - Domain scores calculated from mean of each domain's input variables
  - Domain scores ranked
  - Domain percentiles calculated
  - Exponential transformation calculated for each domain
- AHAH index calculated from mean of domain exponential transformations
- Ranked AHAH index calculated
- AHAH percentiles calculated
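The final steps, from domain scores to the ranked index, can be sketched as follows. This is a minimal illustration and not the contents of `create_index.py`: the domain scores are made up, the exponential transformation is assumed to take the 2015 IMD form, and defining a percentile as `100 * rank / n` is an assumption.

```python
import math


def rank(values):
    """Ordinal ranks starting at 1; the smallest value gets rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def exp_transform(r):
    """Exponential transformation of a scaled rank r in (0, 1] (IMD 2015 form)."""
    return -23.0 * math.log(1.0 - r * (1.0 - math.exp(-100.0 / 23.0)))


def percentiles(ranks):
    """Percentile of each rank, assumed here to be 100 * rank / n."""
    return [100.0 * r / len(ranks) for r in ranks]


# Hypothetical domain scores for four LSOAs (one list entry per LSOA).
domain_scores = {
    "retail": [0.2, -1.0, 0.9, 0.1],
    "health": [-0.5, 0.3, 1.2, -0.1],
    "environment": [0.0, -0.7, 0.8, 0.4],
    "air": [-0.3, -0.9, 1.1, 0.5],
}

# Rank each domain, scale ranks to (0, 1], then transform.
domain_exp = {
    name: [exp_transform(r / len(scores)) for r in rank(scores)]
    for name, scores in domain_scores.items()
}

# AHAH index: mean of the transformed domain values for each LSOA.
ahah = [sum(column) / len(column) for column in zip(*domain_exp.values())]
ahah_rank = rank(ahah)
ahah_percentile = percentiles(ahah_rank)
```

In this toy data the third LSOA ranks highest in every domain, so it receives the maximum transformed value in each and the top AHAH rank.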
See the AHAH V2 FigShare Repository for the previous iteration.
- Greenspace (NDVI Classification)