/geo-scripts

Scripts for generating common geography files.

Geography processing scripts

This repository contains a number of scripts and source files for processing geographic data files for Census maps, Build a custom area profile, and Area hub. The scripts also generate vector map tiles that are used across a number of other products.

The scripts generate the following output file types:

  • JSON files (per area) with geometry, metadata and parent/child relationships for individual areas.
  • Lookups from Census 2021 Output Areas to larger geographic areas.
  • CSV tables of codes and names for area search/select in the above products.
  • Vector tiles for various geography types.

System requirements

The scripts in this repo have dependencies that will only run on a Mac or Linux system (including Windows Subsystem for Linux). You will need to install the following dependencies in order to run the scripts:

In addition, the scripts use commands including curl, zip, gzip and rm, which should be available on any common Mac or Linux system.

Running the scripts

To run the scripts in this repo, you will first need to install the Node.js dependencies:

npm install

You can then run each of the scripts below using the following command:

npm run {script-name}

It is important that the first two scripts are run in order before running the rest of the scripts, because they download and pre-process the necessary source files.
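
The run order could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the `runInOrder` helper is hypothetical, and the order of the scripts after the first two is an assumption based on the order in which they are listed in this README.

```javascript
const { execSync } = require("node:child_process");

// Script names as listed in this README; the first two must run first.
const SCRIPTS = [
  "download-files",
  "get-names",
  "make-lookups",
  "make-geos",
  "make-lists",
  "make-vtiles",
  "make-oa-vtiles-lookups",
  "make-oa-vtiles",
  "zip-vtiles",
];

// Hypothetical helper: runs each npm script in sequence and stops at the
// first failure (execSync throws if a command exits with a non-zero code).
// The `run` parameter exists only so the command runner can be swapped out.
function runInOrder(
  scripts = SCRIPTS,
  run = (cmd) => execSync(cmd, { stdio: "inherit" })
) {
  for (const name of scripts) {
    run(`npm run ${name}`);
  }
}

// Usage, from the repository root: runInOrder();
```

Running the scripts one at a time with npm run remains the simplest option when you only need to regenerate one type of output.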

download-files

This script will download and transform a large number of boundary and lookup files from the ONS Open Geography Portal, as well as area codes from Nomis and MSOA Names from the House of Commons Library. The config can be found at /config/source-files.js.

get-names

This script generates a lookup file at /config/lookup/lookup_names.csv containing area codes and names extracted from the downloaded geographic boundary files, and merged from the MSOA names CSV file.

make-lookups

This script merges the downloaded lookups with the auxiliary lookups in the /config/lookup folder to create the master lookup files needed to generate the geo files and vector tiles.

make-geos

This script generates the geography files required for Census maps (in the /output/cm-geos folder), and a common format shared by Build a custom area profile and the Area hub pages (in the /output/geos folder). It also generates name/code list CSVs for the latter two products in the /output folder. The config can be found at /config/geo-config.js.

An example geo file for Census maps can be found here. An example file for the other products can be found here.

Important! Make sure that the /output/geos folder is empty before running this script.
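
One way to empty the folder from Node.js is sketched below. The `emptyDir` helper is hypothetical and is not provided by this repository. It permanently deletes everything inside the folder, so check the path before running it.

```javascript
const fs = require("node:fs");

// Hypothetical helper: deletes the folder and everything in it (if it exists),
// then recreates it as an empty folder.
function emptyDir(dir) {
  fs.rmSync(dir, { recursive: true, force: true });
  fs.mkdirSync(dir, { recursive: true });
}

// Usage, from the repository root: emptyDir("./output/geos");
```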

Note: The .json output files in the output folder are actually gzipped, so cannot be opened directly. If you want to inspect their contents, you need to add .gz to their filename, and then gunzip them.
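
As an alternative to renaming the files, a gzipped output file can be read directly in Node.js with the built-in zlib module. The following is a minimal sketch; the `readGzippedJson` helper is hypothetical and is not part of the repository's scripts.

```javascript
const fs = require("node:fs");
const zlib = require("node:zlib");

// Hypothetical helper: reads a gzipped .json file and returns the parsed object.
function readGzippedJson(path) {
  const compressed = fs.readFileSync(path);
  const text = zlib.gunzipSync(compressed).toString("utf8");
  return JSON.parse(text);
}

// Usage: const geo = readGzippedJson("./output/geos/{file}.json");
```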

make-lists

This script generates CSV lists of codes and names for places that can be found via search in Build a custom area profile and the Area hub. The metadata is extracted from the geo files generated by the make-geos script.

make-vtiles

This script generates vector tiles for the most commonly used smaller geography types, including local authorities, wards and statistical geographies. The output is in the form of .mbtiles files, written to the /output/vtiles folder. The config can be found at /config/vtiles-config.js. (Example output).

The vector tiles can be previewed by installing tileserver-gl-light and running the following command for the specific file you want to preview:

tileserver-gl-light ./output/vtiles/{file}.mbtiles

make-oa-vtiles-lookups & make-oa-vtiles

These two scripts are used in sequence to generate vector tiles for output areas suitable for both high and low zoom levels, specifically designed for use in Build a custom area profile. (Example output).

Whereas Tippecanoe is capable of merging together small areas to cater to lower zoom levels, these scripts explicitly merge the smallest OAs into their LSOA and MSOA parents, which is preferable both visually and in terms of the functionality of the above product.
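
To illustrate the grouping step behind this kind of merge, here is a minimal sketch. It is not the repository's implementation: the row shape (`oa`, `lsoa`, `msoa` keys), the placeholder codes and the `groupByParent` helper are all assumptions made for illustration. It only groups area codes, and does not show the step of combining the grouped geometries.

```javascript
// Hypothetical lookup rows: one per output area, with its parent codes.
const lookup = [
  { oa: "OA1", lsoa: "LSOA1", msoa: "MSOA1" },
  { oa: "OA2", lsoa: "LSOA1", msoa: "MSOA1" },
  { oa: "OA3", lsoa: "LSOA2", msoa: "MSOA1" },
];

// Groups OA codes under the parent code found in the given column.
// Returns a Map of parent code -> array of OA codes.
function groupByParent(rows, parentKey) {
  const groups = new Map();
  for (const row of rows) {
    const parent = row[parentKey];
    if (!groups.has(parent)) groups.set(parent, []);
    groups.get(parent).push(row.oa);
  }
  return groups;
}

const byLsoa = groupByParent(lookup, "lsoa");
const byMsoa = groupByParent(lookup, "msoa");
```

With the placeholder rows above, `byLsoa` holds two groups (OA1 and OA2 under LSOA1, OA3 under LSOA2) and `byMsoa` holds a single group containing all three output areas.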

Note: These tiles can also be used for standard map visualisations in the same way as the other vector tiles generated by the make-vtiles script.

zip-vtiles

This script unpacks the .mbtiles files in the ./output/vtiles folder into a directory structure using Tippecanoe's tile-join command, and then zips the directories into the same output folder.

These ZIP files are suitable for uploading and unzipping to serve from a static file service such as AWS S3.

Note: Serving from the .mbtiles files directly would require a vector tiles server.

Config files

All of the config files for these scripts can be found within the /config folder. This includes some lookup files in the /config/lookups folder that provide additional metadata that is either not available on the ONS Open Geography Portal (lookup_parent.csv), or is published on the portal in an inconsistent format that cannot easily be processed automatically (lookup_area.ods).

Updating the config files

It should be possible to add additional geographies or years by modifying the config files, without the need to edit the script files. You will typically need to add multiple entries to /config/source-files.js (oa/parent lookups, names and boundaries) for each new geography/year, as well as updating the /config/geo-config.js and /config/vtiles-config.js files.

Editing the scripts

If you do need to edit the processing scripts, either to fix bugs or add features, you can find these in the /scripts folder. A number of shared functions are included in the /scripts/utils.js file.

Future additions

There are a number of features that could be added to these scripts in future. These include:

  • Generating the raster mask map tiles used in small area mapping products (such as Census maps).
  • Generating custom best-fit lookups using BFE boundaries and population-weighted centroids.
  • Calculating neighbours and related geographies based on boundary files.
  • Adding various additional metadata to the geo files, including neighbours and codes to allow linkages to other products and API data sources.