The goal of PgOSM Flex is to provide high quality OpenStreetMap datasets in PostGIS using the osm2pgsql Flex output. This project provides a curated set of Lua and SQL scripts to clean and organize the most commonly used OpenStreetMap data, such as roads, buildings, and points of interest (POIs).
The approach to processing is to do as much processing in the <name>.lua
script
with post-processing steps creating indexes, constraints and comments in a companion <name>.sql
script.
A few decisions made in this project:
- ID column is
osm_id
- Geometry stored in SRID 3857
- Geometry column named
geom
- Default to same units as OpenStreetMap (e.g. km/hr, meters)
- Data not deemed worthy of a dedicated column goes in side table
osm.tags
. Raw key/value data stored inJSONB
column - Points, Lines, and Polygons are not mixed in a single table
Minimum versions supported:
- Postgres 12
- PostGIS 3.0
- osm2pgsql 1.5.0
The easiest option to use PgOSM-Flex is with the
Docker image.
The image has all the pre-reqs installed, handles downloading an OSM subregion
from Geofabrik, and saves an output .sql
file with the processed data
ready to load into your database(s).
The PBF/MD5 source files are archived by date with the ability to
easily reload them at a later date.
This section outlines the basic operations for using Docker to run PgOSM-Flex. See the full Docker instructions in docs/DOCKER-RUN.md.
Create directory for the .osm.pbf
file, output .sql
file, log output, and
the osm2pgsql command ran.
mkdir ~/pgosm-data
Set environment variables for the temporary Postgres connection in Docker.
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
Start the pgosm
Docker container. At this point, PostgreSQL / PostGIS
is available on port 5433
.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
Run the processing for the Washington D.C. The docker/pgosm_flex.py
script
requires three (3) parameters, typical use will use four (4) to include
the --subregion
.
- PgOSM-Flex layer set (
run-all
) - Total RAM for osm2pgsql, Postgres and OS (
8
) - Region (
north-america/us
) - Sub-region (
district-of-columbia
) (Optional)
docker exec -it \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-e POSTGRES_USER=$POSTGRES_USER \
pgosm python3 docker/pgosm_flex.py \
--layerset=run-all \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
The initial output with the docker exec
command points to the log file
(linked in the docker run
command above), monitor this file to track
progress of the import.
Monitor /app/output/district-of-columbia.log for progress...
If paths setup as outlined in README.md, use:
tail -f ~/pgosm-data/district-of-columbia.log
The above command takes roughly 1 minute to run if the PBF for today has already been downloaded. If the PBF is not downloaded it will depend on how long it takes to download the 17 MB PBF file + ~ 1 minute processing.
The output .sql
file is saved under
~/pgosm_data/pgosm-flex-north-america-us-district-of-columbia-run-all.sql
.
This .sql
file can be loaded into a PostGIS enabled database
using:
psql -d postgres -c "CREATE DATABASE myosm;"
psql -d myosm -c "CREATE EXTENSION postgis;"
psql -d myosm \
-f ~/pgosm-data/pgosm-flex-north-america-us-district-of-columbia-run-all.sql
After the docker exec
command completes, the processed OpenStreetMap
data is available in the Docker container on port 5433
and has automatically
been exported to ~/pgosm-data/pgosm-flex-district-of-columbia-run-all.sql
.
Connect and query directly in the Docker container.
psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;"
┌───────┐
│ count │
╞═══════╡
│ 38480 │
└───────┘
Or load the processed data (now in .sql
format) to the Postgres/PostGIS instance of your choice.
psql -d $YOUR_DB_STRING \
-f ~/pgosm-data/pgosm-flex-district-of-columbia-run-all.sql
The ~/pgosm-data
directory has four (4) files, the PBF and its MD5 chcksum,
the processing log, and the processed output file (.sql
).
ls -alh ~/pgosm-data/
-rw-r--r-- 1 root root 17M May 18 17:24 district-of-columbia-2021-05-18.osm.pbf
-rw-r--r-- 1 root root 70 May 18 17:24 district-of-columbia-2021-05-18.osm.pbf.md5
-rw-r--r-- 1 root root 799K May 18 17:25 district-of-columbia.log
-rw-r--r-- 1 root root 117 May 18 17:24 osm2pgsql-district-of-columbia.sh
-rw-r--r-- 1 root root 154M May 18 17:25 pgosm-flex-north-america-us-district-of-columbia-run-all.sql
The designed intent is to load the OpenStreetMap data processed by this
Docker image into your PostGIS database(s).
The process runs pg_dump
on the resulting
data to create the .sql
output file. This can be easily loaded into any
Postgres/PostGIS database using psql
.
psql -d $YOUR_DB_STRING \
-f ~/pgosm-data/pgosm-flex-north-america-us-district-of-columbia-run-all.sql
The source file (.osm.pbf
)
and its MD5 verificiation file (osm.pbf.md5
) get renamed from -latest
to the date (-2021-05-18
). This enables loading the file downloaded today
again in the future, either with the same version of PgOSM Flex or the latest version. The docker exec
command uses the PGOSM_DATE
environment variable
to load these historic files.
See more in docs/DOCKER-RUN.md.
See docs/MANUAL-STEPS-RUN.md for prereqs and steps for running without Docker.
Layer sets are defined under the directory pgosm-flex/flex-config/, the current run-*
options are:
run-all
run-no-tags
run-road-place
run-unitable
Each of these layer sets includes the core layer defintions
(see style/*.lua
)
and post-processing SQL (see sql/*.sql
).
The .lua
scripts work with osm2pgsql's Flex output.
PgOSM-Flex is using these styles with a mix-and-match approach.
This is best illustrated by looking within the main run-all.lua
script.
As the following shows, it does not define any actual styles, only includes
a single style, and runs another layer set (run-no-tags
).
require "style.tags"
require "run-no-tags"
The style.tags
script creates a table osm.tags
that contains all OSM key/value
pairs, but with no geometry. This is the largest table loaded by the run-all
layer set and enables joining any OSM data in another layer (e.g. osm.road_line
)
to find any additional tags.
Use QGIS to visualize OpenStreetMap data? This project includes a few basic
styles using the public.layer_styles
table created by QGIS.
See the QGIS Style README.md for more information.
Loaded by Docker process by default. Is excluded when --data-only
used.
A peek at some of the tables loaded.
This query requires the
PostgreSQL Data Dictionary (PgDD) extension,
use \dt+ osm.*
in psql
for similar details.
SELECT s_name, t_name, rows, size_plus_indexes
FROM dd.tables
WHERE s_name = 'osm'
ORDER BY t_name LIMIT 10;
┌────────┬──────────────────────┬────────┬───────────────────┐
│ s_name │ t_name │ rows │ size_plus_indexes │
╞════════╪══════════════════════╪════════╪═══════════════════╡
│ osm │ amenity_line │ 7 │ 56 kB │
│ osm │ amenity_point │ 5796 │ 1136 kB │
│ osm │ amenity_polygon │ 7593 │ 3704 kB │
│ osm │ building_point │ 525 │ 128 kB │
│ osm │ building_polygon │ 161256 │ 55 MB │
│ osm │ indoor_line │ 1 │ 40 kB │
│ osm │ indoor_point │ 5 │ 40 kB │
│ osm │ indoor_polygon │ 288 │ 136 kB │
│ osm │ infrastructure_point │ 884 │ 216 kB │
│ osm │ landuse_point │ 18 │ 56 kB │
└────────┴──────────────────────┴────────┴───────────────────┘
PgOSM-Flex tracks basic metadata in table osm.pgosm_flex
.
The ts
is set by the post-processing script. It does not necessarily
indicate the date of the data loaded, though in general it should be close
depending on your processing pipeline.
SELECT *
FROM osm.pgosm_flex;
┌────────────┬──────────────┬───────────────┬────────────────────┬──────┬───────────────────────────────────┬───────────────────┐
│ osm_date │ default_date │ region │ pgosm_flex_version │ srid │ project_url │ osm2pgsql_version │
╞════════════╪══════════════╪═══════════════╪════════════════════╪══════╪═══════════════════════════════════╪═══════════════════╡
│ 2020-01-01 │ t │ north-america │ 0.1.1-f488d7b │ 3857 │ https://github.com/rustprooflabs/…│ 1.4.1 │
│ │ │ │ │ │…pgosm-flex │ │
└────────────┴──────────────┴───────────────┴────────────────────┴──────┴───────────────────────────────────┴───────────────────┘
For example queries with data loaded by PgOSM-Flex see docs/QUERY.md.
Loads an range of tags into a materialized view (osm.poi_all
) for easy searching POIs.
Line and polygon data is forced to point geometry using
ST_Centroid()
. This layer duplicates a bunch of other more specific layers
(shop, amenity, etc.) to provide a single place for simplified POI searches.
Special layer included by layer sets run-all
and run-no-tags
.
See style/poi.lua
for logic on how to include POIs.
The topic of POIs is subject and likely is not inclusive of everything that probably should be considered
a POI. If there are POIs missing
from this table please submit a new issue
with sufficient details about what is missing.
Pull requests also welcome! See CONTRIBUTING.md.
Counts of POIs by osm_type
.
SELECT osm_type, COUNT(*)
FROM osm.vpoi_all
GROUP BY osm_type
ORDER BY COUNT(*) DESC;
Results from Washington D.C. subregion (March 2020).
┌──────────┬───────┐
│ osm_type │ count │
╞══════════╪═══════╡
│ amenity │ 12663 │
│ leisure │ 2701 │
│ building │ 2045 │
│ shop │ 1739 │
│ tourism │ 729 │
│ man_made │ 570 │
│ landuse │ 32 │
│ natural │ 19 │
└──────────┴───────┘
Includes Points (N
), Lines (L
) and Polygons (W
).
SELECT geom_type, COUNT(*)
FROM osm.vpoi_all
GROUP BY geom_type
ORDER BY COUNT(*) DESC;
┌───────────┬───────┐
│ geom_type │ count │
╞═══════════╪═══════╡
│ W │ 10740 │
│ N │ 9556 │
│ L │ 202 │
└───────────┴───────┘
From the perspective of database design, the osm.unitable
option is the worst!
This style included in PgOSM-Flex is intended to be used for troublshooting and quality control. It is not intended to be used for real production workloads! This table is helpful for exploring the full data set when you don't really know what you are looking for, but you know where you are looking.
Load the unitable.lua
script to make the full OpenStreetMap data set available in
one table. This violates all sorts of best practices established in this project
by shoving all features into a single unstructured table.
osm2pgsql --slim --drop \
--output=flex --style=./unitable.lua \
-d pgosm \
~/tmp/district-of-columbia-latest.osm.pbf
The
unitable.lua
script include in in this project was adapted from the unitable example from osm2pgsql. This version uses JSONB instead of HSTORE and takes advantage ofhelpers.lua
to easily customize SRID.
PgOSM-Flex uses JSONB
in Postgres to store the raw OpenSteetMap
key/value data (tags
column)
and relation members (member_ids
).
Current JSONB
columns:
osm.tags.tags
osm.unitable.tags
osm.place_polygon.member_ids
osm.vplace_polygon.member_ids
osm.poi_polygon.member_ids
Blog posts covering various details and background information.