Cleans OpenStreetMap dataset, converts it from XML to JSON format. The cleaned .json file is ready to be imported into a MongoDB database using the mongoimport shell command.
It does the following things:
- process 2 types of top level tags: "node" and "way"
- attributes in the CREATED array should be added under a key "created"
- attributes for latitude and longitude are added to a "pos" array, for use in geospacial indexing.
- if the second level tag "k" value contains problematic characters, it is ignored
- if the second level tag "k" value starts with "addr:", it's added to a dictionary "address"
- if there is a second ":" that separates the type/direction of a street, the tag is ignored, for example:
tag k="addr:housenumber" v="5158"/
tag k="addr:street" v="North Lincoln Avenue"/
tag k="addr:street:name" v="Lincoln"/
tag k="addr:street:prefix" v="North"/
tag k="addr:street:type" v="Avenue"/
is turned into:
{... "address": { "housenumber": 5158, "street": "North Lincoln Avenue" } ... }
-
for "way" specifically:
nd ref="305896090"/
nd ref="1719825889"/
is turned into "node_refs": ["305896090", "1719825889"]
- if the street type is abbreviated, the mapping dictionary corrects the abbreviations
- makes sure postcode is in a correct 5-digit form