Code for working with geo data in MongoDB
Prep test
rm -rf /tmp/data/test
mkdir -p /tmp/data/test
mongod --dbpath /tmp/data/test
mgeneratejs metadata.json -n 10 | mongoimport --uri mongodb://127.0.0.1:27017/test --collection metadata
Use Compass to connect to localhost and create database test, with collection osm-gelderland-latest-polygons and a 2dsphere index on field geometry.
In the Compass mongosh shell:
use test
db.getCollection('osm-gelderland-latest-polygons').drop()
db.createCollection('osm-gelderland-latest-polygons')
db.getCollection('osm-gelderland-latest-polygons').createIndex( { "geometry" : "2dsphere" } )
Alternatively, remove all documents without dropping the collection, so its indexes are kept. This can be slower.
db.getCollection('osm-gelderland-latest-polygons').deleteMany({})
For the test collection polygons:
db.getCollection('polygons').deleteMany({})
db.getCollection('polygons').createIndex( { "geometry" : "2dsphere" } )
https://jira.mongodb.org/browse/TOOLS-2971
../tools/mongoimport --uri mongodb://127.0.0.1:27017/test --collection polygons --type json --file antarctica-latest-polygons.seq.osm.json 2>&1 | tee antarctica-import.txt
In terminal
use test
db.getCollection('osm-gelderland-latest-polygons').drop()
db.createCollection('osm-gelderland-latest-polygons')
db.getCollection('osm-gelderland-latest-polygons').createIndex( { "geometry" : "2dsphere" } )
cd data
ogr2ogr -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 gelderland-latest-polygons.simplify.osm.json gelderland-latest.osm.pbf multipolygons
ogr2ogr -explodecollections -skipfailures gelderland-latest-polygons.explode.osm.json gelderland-latest-polygons.simplify.osm.json multipolygons
ogr2ogr -f GeoJSONSeq gelderland-latest-polygons.seq.osm.json gelderland-latest-polygons.explode.osm.json
../tools/mongoimport --uri mongodb://127.0.0.1:27017/test --collection osm-gelderland-latest-polygons --type json --file gelderland-latest-polygons.seq.osm.json 2>&1 | tee osm-gelderland-latest-polygons-import.txt
Open QGIS and add polygon layer
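The terminal steps above (reset the collection, simplify, explode, convert to GeoJSONSeq, import) repeat below for every region and layer, with only the names changing. A minimal bash sketch that prints that sequence for one extract, assuming the file and collection naming used in these notes, the ../tools/mongoimport path, and that mongosh is available in the terminal (it uses mongosh --eval instead of the Compass shell). osm_pipeline is a hypothetical helper name; it only prints the commands so you can review them before running them.

```bash
# Hypothetical helper: print the per-extract command sequence used in these notes.
# Usage: osm_pipeline REGION LAYER, e.g. osm_pipeline gelderland multipolygons
# Assumes REGION-latest.osm.pbf is in the current directory and mongoimport lives
# in ../tools, as in the commands above. Nothing is executed here.
osm_pipeline() {
  local region=$1 layer=$2 kind
  # Files and collections are named "polygons" for the multipolygons layer.
  case "$layer" in
    multipolygons) kind=polygons ;;
    *) kind=$layer ;;
  esac
  local base="${region}-latest-${kind}"
  local coll="osm-${base}"
  local uri="mongodb://127.0.0.1:27017/test"

  # 1. Reset the collection and create the 2dsphere index before importing.
  echo "mongosh \"$uri\" --quiet --eval 'db.getCollection(\"$coll\").drop(); db.createCollection(\"$coll\"); db.getCollection(\"$coll\").createIndex({ geometry: \"2dsphere\" })'"
  # 2. Simplify, make geometries valid and reduce coordinate precision.
  echo "ogr2ogr -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 $base.simplify.osm.json $region-latest.osm.pbf $layer"
  # 3. Explode multi* geometries into one feature each.
  echo "ogr2ogr -explodecollections -skipfailures $base.explode.osm.json $base.simplify.osm.json $layer"
  # 4. Convert to GeoJSONSeq (one feature per line).
  echo "ogr2ogr -f GeoJSONSeq $base.seq.osm.json $base.explode.osm.json"
  # 5. Import and keep a log of the import.
  echo "../tools/mongoimport --uri $uri --collection $coll --type json --file $base.seq.osm.json 2>&1 | tee $coll-import.txt"
}
```

For example, osm_pipeline antarctica points prints the antarctica points steps; piping that output into bash would run them.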
use test
db.getCollection('osm-gelderland-latest-points').drop()
db.createCollection('osm-gelderland-latest-points')
db.getCollection('osm-gelderland-latest-points').createIndex( { "geometry" : "2dsphere" } )
ogr2ogr -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 gelderland-latest-points.simplify.osm.json gelderland-latest.osm.pbf points
ogr2ogr -explodecollections -skipfailures gelderland-latest-points.explode.osm.json gelderland-latest-points.simplify.osm.json points
ogr2ogr -f GeoJSONSeq gelderland-latest-points.seq.osm.json gelderland-latest-points.explode.osm.json
../tools/mongoimport --uri mongodb://127.0.0.1:27017/test --collection osm-gelderland-latest-points --type json --file gelderland-latest-points.seq.osm.json 2>&1 | tee osm-gelderland-latest-points-import.txt
Open QGIS and add the points layer.
Open Compass and add the points layer.
use test
db.getCollection('osm-antarctica-latest-polygons').drop()
db.createCollection('osm-antarctica-latest-polygons')
db.getCollection('osm-antarctica-latest-polygons').createIndex( { "geometry" : "2dsphere" } )
ogr2ogr -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 antarctica-latest-polygons.simplify.osm.json antarctica-latest.osm.pbf multipolygons
ogr2ogr -explodecollections -skipfailures antarctica-latest-polygons.explode.osm.json antarctica-latest-polygons.simplify.osm.json multipolygons
ogr2ogr -f GeoJSONSeq antarctica-latest-polygons.seq.osm.json antarctica-latest-polygons.explode.osm.json
../tools/mongoimport --uri mongodb://127.0.0.1:27017/test --collection osm-antarctica-latest-polygons --type json --file antarctica-latest-polygons.seq.osm.json 2>&1 | tee osm-antarctica-latest-polygons-import.txt
use test
db.getCollection('osm-antarctica-latest-points').drop()
db.createCollection('osm-antarctica-latest-points')
db.getCollection('osm-antarctica-latest-points').createIndex( { "geometry" : "2dsphere" } )
ogr2ogr -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 antarctica-latest-points.simplify.osm.json antarctica-latest.osm.pbf points
ogr2ogr -explodecollections -skipfailures antarctica-latest-points.explode.osm.json antarctica-latest-points.simplify.osm.json points
ogr2ogr -f GeoJSONSeq antarctica-latest-points.seq.osm.json antarctica-latest-points.explode.osm.json
../tools/mongoimport --uri mongodb://127.0.0.1:27017/test --collection osm-antarctica-latest-points --type json --file antarctica-latest-points.seq.osm.json 2>&1 | tee osm-antarctica-latest-points-import.txt
Collection osm-north-america-latest-polygons, file north-america-latest-polygons.seq.osm.json
use test
db.getCollection('osm-north-america-latest-polygons').drop()
db.createCollection('osm-north-america-latest-polygons')
db.getCollection('osm-north-america-latest-polygons').createIndex( { "geometry" : "2dsphere" } )
ogr2ogr -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 north-america-latest-polygons.simplify.osm.json north-america-latest.osm.pbf multipolygons
ogr2ogr -explodecollections -skipfailures north-america-latest-polygons.explode.osm.json north-america-latest-polygons.simplify.osm.json multipolygons
ogr2ogr -f GeoJSONSeq north-america-latest-polygons.seq.osm.json north-america-latest-polygons.explode.osm.json
../tools/mongoimport --uri mongodb://127.0.0.1:27017/test --collection osm-north-america-latest-polygons --type json --file north-america-latest-polygons.seq.osm.json
Do not store the output of the 'invalid' data here with 2>&1 | tee osm-north-america-latest-polygons-import.txt; it will be quite big...
head -n 10 north-america-latest.simplify.osm.json
head -n 10 north-america-latest.explode.osm.json
head -n 5 north-america-latest.seq.osm.json
QGIS fails on the north-america polygons layer: it is too big and exhausts RAM.
Error: cursor id 2392613233041230777 not found, full error: {'ok': 0.0, 'errmsg': 'cursor id 2392613233041230777 not found', 'code': 43, 'codeName': 'CursorNotFound'}
The following command fails: mongosh parses the hyphens in the collection name as the JavaScript minus operator.
db.osm-gelderland-latest-polygons.deleteMany({})
This works, because the name is passed as a string:
db.getCollection('osm-gelderland-latest-polygons').deleteMany({})
Download OSM data in pbf format from https://download.geofabrik.de/
rm -rf /tmp/data/db
mkdir -p /tmp/data/db
mongod --dbpath /tmp/data/db
ogrinfo antarctica-latest.osm.pbf
You cannot do all the ogr2ogr processing in one pass: for example, the command below does not explode collections for multi* geometries. It is always good to check the resulting file with ogrinfo.
ogr2ogr -explodecollections -skipfailures -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 -f GeoJSONSeq north-america-latest.osm.json north-america-latest.osm.pbf multipolygons
Instead, do multiple steps (and remove the intermediate temp files after confirming they display correctly in QGIS):
ogr2ogr -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 antarctica-latest.simplify.osm.json antarctica-latest.osm.pbf multipolygons
ogr2ogr -explodecollections -skipfailures antarctica-latest.explode.osm.json antarctica-latest.simplify.osm.json multipolygons
ogr2ogr -f GeoJSONSeq antarctica-latest.seq.osm.json antarctica-latest.explode.osm.json
mongoimport --uri mongodb://127.0.0.1:27017/test --collection polygons --type json --file antarctica-latest.seq.osm.json 2>&1 | tee antarctica-import.txt
Other helpful commands
history | cut -c 8-
db.polygons.deleteMany({})
db.polygons.deleteOne( { "_id" : ObjectId("615c627e02e90dfb3d807556") } )
Error: Check geometry in object 615c627e02e90dfb3d8078cc or 615c627e02e90dfb3d807557
db.polygons.findOne( { "_id" : ObjectId("615c627e02e90dfb3d8078cc") } )
db.polygons.countDocuments( { 'properties.place': { $exists: false } } )
db.polygons.findOne( { 'properties.place': { $exists: false } } )
{
  "_id" : ObjectId("615c561502e90dfb3d7f6c00"),
  "type" : "Feature",
  "properties" : {
    "osm_id" : "977615",
    "name" : "Site of Special Scientific Intrest No. 8",
    "type" : "multipolygon",
    "leisure" : "nature_reserve"
  },
  "geometry" : {
    "type" : "Polygon",
    "coordinates" : [
      [
        [ -58.465, -62.2359 ],
        [ -58.4652, -62.2363 ],
        [ -58.4648, -62.2363 ],
        [ -58.4648, -62.2359 ],
        [ -58.465, -62.2359 ]
      ]
    ]
  }
}
Questions
Q: QGIS 3 on Mac OS - Where is the configuration directory?
A: From the QGIS Settings Menu, User Profiles, select "Open active profile folder". You'll be taken straight there.
Tips
One tip (shot myself in the foot first): create the 2dsphere index upfront, then run mongoimport on a (huge) JSON file. It will then import everything but the 'invalid' GeoJSON. If you do it the other way around, you get into a loop of index creation breaking on failing documents one by one, which can be a pita with huge sets.
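With the index in place, the end of the mongoimport output (saved with tee in the commands above) tells you how many features were rejected as invalid. A small sketch, assuming mongoimport's closing summary reads "N document(s) imported successfully. M document(s) failed to import." as in recent mongo-tools releases; older versions may word it differently. import_summary is a hypothetical helper name.

```bash
# Hypothetical helper: extract the final success/failure counts from a saved
# mongoimport log, e.g. osm-gelderland-latest-polygons-import.txt.
# Assumes the summary wording "N document(s) imported successfully.
# M document(s) failed to import."
import_summary() {
  grep -oE '[0-9]+ document\(s\) (imported successfully|failed to import)' "$1"
}

# Example:
# import_summary osm-gelderland-latest-polygons-import.txt
```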
Get some data in protobuf format. Start small with Antarctica first: download some *.osm.pbf from https://download.geofabrik.de/
Get GDAL with brew install gdal or similar.
Use ogrinfo to check the geometry types in the source (pbf) file:
ogrinfo north-america-latest.osm.pbf
This displays an overview of the layers and their geometry types:
INFO: Open of `north-america-latest.osm.pbf'
using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon)
5: other_relations (Geometry Collection)
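To drive the conversion from this overview, the layer names can be pulled out of the ogrinfo listing. A minimal sketch, assuming the "N: name (Geometry Type)" line format shown above; list_layers is a hypothetical helper name.

```bash
# Hypothetical helper: read ogrinfo output on stdin and print one layer name per line,
# based on lines like "1: points (Point)". Other output lines are ignored.
list_layers() {
  sed -nE 's/^[0-9]+: ([^ ]+) [(].*[)]$/\1/p'
}

# Example: convert every layer of an extract to its own GeoJSONSeq file.
# for layer in $(ogrinfo antarctica-latest.osm.pbf | list_layers); do
#   ogr2ogr -f GeoJSONSeq "antarctica-latest-$layer.seq.osm.json" antarctica-latest.osm.pbf "$layer"
# done
```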
Run ogr2ogr to convert to GeoJSON.
ogr2ogr antarctica-latest.osm.json antarctica-latest.osm.pbf points
The command above extracts points and creates a JSON file with a FeatureCollection; you can do the same for lines, polygons or multi* geometries.
(Always) use GeoJSONSeq as the output format: it gives you one GeoJSON feature per line, ready for import. That way you avoid all kinds of jq/cat/split/for..do preparation and processing scenarios later on.
So instead of the previous command, better use:
ogr2ogr -f GeoJSONSeq antarctica-latest.osm.json antarctica-latest.osm.pbf points
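Because GeoJSONSeq puts one feature on each line, ordinary line tools are enough for a quick sanity check before importing. A heuristic sketch that does not parse JSON: it assumes the default newline-delimited output and that each feature starts with { "type": "Feature", as in the files GDAL writes. seq_summary is a hypothetical helper name.

```bash
# Hypothetical helper: count the lines in a GeoJSONSeq file and how many of them
# do not start like a GeoJSON Feature. Heuristic only: relies on "type" coming first.
seq_summary() {
  local total other
  total=$(grep -c '' "$1")
  other=$(grep -cvE '^[{] *"type": *"Feature"' "$1")
  echo "lines=$total non_feature=$other"
}

# Example:
# seq_summary antarctica-latest.osm.json
```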
You can do much more with ogr2ogr to extract only the data you want, add fields, etc.
When you are happy with your workflow, repeat it for other files and add these to the same collection in MongoDB to build an even bigger data set.
Repeat for the 10 GB North America extract at https://download.geofabrik.de/
Check with Compass or QGIS (see repo https://github.com/taatuut/mongeovisqgis)
Use -explodecollections on multi* geometries like multipolygons to get one feature per document.
Storing multiple geometry types in one collection works (and can make sense), but depending on the client (GIS) system it might require extra configuration or come with limitations.
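To confirm that -explodecollections did its job (no Multi* geometries left), or to see which geometry types end up in one collection, you can tally the types in the GeoJSONSeq file. A sketch, assuming the geometry's "type" member directly follows "geometry": { as in the files GDAL writes, as far as I can tell; other writers may order keys differently. count_geom_types is a hypothetical helper name.

```bash
# Hypothetical helper: tally geometry types in a GeoJSONSeq file (one feature per line).
# Assumes "type" is the first member inside "geometry".
count_geom_types() {
  grep -oE '"geometry": *[{] *"type": *"[A-Za-z]+"' "$1" \
    | sed -E 's/.*"([A-Za-z]+)"$/\1/' \
    | sort | uniq -c
}

# Example: warn if multi* geometries are still present after exploding.
# count_geom_types antarctica-latest.seq.osm.json | grep Multi && echo "multi* geometries left"
```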
Use -skipfailures when you just want a data set and don't care about erroneous features.
ogr2ogr -explodecollections -skipfailures -f GeoJSONSeq north-america-latest.osm.json north-america-latest.osm.pbf multipolygons
mongoimport --uri mongodb://127.0.0.1:27017/test --collection polygons --file north-america-latest.osm.json --type json
2021-10-03T16:05:56.932+0200 [###.....................] test.polygons 5.80GB/38.7GB (15.0%)
2021-10-03T16:05:59.177+0200 error inserting documents: BSONObj size: 30477049 (0x1D10AF9) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "polygons"
2021-10-03T16:05:59.932+0200 [###.....................] test.polygons 5.86GB/38.7GB (15.1%)
A 30 MB polygon fails to insert: MongoDB documents are limited to 16 MB...
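Since each GeoJSONSeq line becomes one document, you can spot candidate oversized features before importing. A rough sketch: it measures JSON text length in bytes, which only approximates the BSON size MongoDB actually checks. oversized_lines is a hypothetical helper name, and 16777216 (16 MiB) is used as the default limit.

```bash
# Hypothetical helper: list lines of a GeoJSONSeq file longer than LIMIT bytes
# (default 16777216 = 16 MiB, MongoDB's maximum document size).
# Rough proxy only: BSON size differs from JSON text size.
oversized_lines() {
  LC_ALL=C awk -v max="${2:-16777216}" 'length($0) > max { print NR ": " length($0) " bytes" }' "$1"
}

# To drop such lines before importing (same rough proxy):
# LC_ALL=C awk 'length($0) <= 16777216' in.seq.osm.json > in.small.seq.osm.json
```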
Loading the resulting 40 GB file with 66M records takes hours on my laptop... Better to prepare a decent Atlas environment.
Creating the 2dsphere index afterwards fails:
Index build failed: 165d1184-51f1-481a-90db-6310439343ba: Collection test.polygons ( a57eabdd-a73e-485b-8100-932bd278c791 ) :: caused by :: Can't extract geo keys:
{ _id: ObjectId('6159b67cfb9ff107f19ed0d4'), type: "Feature", properties: { osm_id: "8027", type: "multipolygon", natural: "water", other_tags: ""water"=>"river"" }, geometry: { type: "Polygon", coordinates: [ [ [ -137.3265104, 62.7841879 ], [ -137.3347999, 62.785039 ], [ -137.340364, 62.7827365 ], [ -137.3421156, 62.7821497 ], [ -137.342577, 62.7821228 ], [ -137.3428935, 62.782032 ], [ -137.3442786, 62.7816459 ], [ -137.3451144, 62.7816615 ], [ -137.3453611, 62.7817376 ], [ -137.3456347, 62.7818332 ], [ -137.3458117, 62.7819019 ], ...
If this is about speed and limiting polygon size rather than accuracy, add -simplify and -lco COORDINATE_PRECISION to the ogr2ogr command:
ogr2ogr -explodecollections -skipfailures -simplify .1 -makevalid -lco COORDINATE_PRECISION=4 -f GeoJSONSeq north-america-latest.osm.json north-america-latest.osm.pbf multipolygons
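To get a feel for what these values mean on the ground: COORDINATE_PRECISION=4 keeps four decimals, i.e. steps of 0.0001 degree, and, as I understand the GDAL docs, -simplify takes its tolerance in the units of the data's coordinate system, which for OSM extracts (WGS84) are degrees. A back-of-the-envelope sketch using a spherical approximation of about 111320 m per degree at the equator; deg_to_m is a hypothetical helper name.

```bash
# Hypothetical helper: approximate east-west ground distance in metres of DEGREES
# at LATITUDE (default 0 = equator). Spherical approximation, ~111320 m per degree.
deg_to_m() {
  LC_ALL=C awk -v d="$1" -v lat="${2:-0}" 'BEGIN {
    pi = atan2(0, -1)
    printf "%.1f\n", d * 111320 * cos(lat * pi / 180)
  }'
}

deg_to_m 0.0001      # COORDINATE_PRECISION=4 at the equator, prints 11.1
deg_to_m 0.1         # -simplify .1 at the equator, prints 11132.0
deg_to_m 0.0001 52   # COORDINATE_PRECISION=4 east-west around Gelderland (~52 N), prints 6.9
```

So a tolerance of .1 is in the order of kilometres, which is why it helps so much with size and speed at the cost of accuracy.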
- Compare the input JSON file size with the MongoDB size
- View in QGIS
brew install pigz
pigz -9 -k north-america-latest.osm.json
How to convert other_tags info from the OSM source format to GeoJSON
How to query the OSM source for specific info during conversion
Working with US TIGER data