This repository is part of the Pelias project. Pelias is an open-source, open-data geocoder originally sponsored by Mapzen. Our official user documentation is here.
Pelias OpenAddresses importer
Overview
The OpenAddresses importer is used to process data from OpenAddresses for import into the Pelias geocoder.
Requirements
Node.js is required. See Pelias software requirements for supported versions.
Installation
For instructions on setting up Pelias as a whole, see our getting started guide. The instructions below pertain only to the OpenAddresses importer.
git clone https://github.com/pelias/openaddresses
cd openaddresses
npm install
Data Download
Use the imports.openaddresses.files configuration option to limit the download to just the OpenAddresses files of interest.
Refer to the OpenAddresses data listing for file names.
npm run download
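As a rough illustration of how the files option narrows the selection, the sketch below mirrors the rule given for files in the Configuration table: a non-empty list is used as-is, while an empty or missing list falls back to every available .csv file. The selectFiles helper and the names xx/example.csv and notes.txt are hypothetical; this is not the importer's own code.

```javascript
// Hypothetical sketch of the `files` selection rule; not the importer's code.
function selectFiles(configuredFiles, availableFiles) {
  // An empty or missing list means "use every available .csv file".
  if (!Array.isArray(configuredFiles) || configuredFiles.length === 0) {
    return availableFiles.filter((name) => name.endsWith(".csv"));
  }
  // Otherwise use only the files explicitly listed in the config.
  return [...configuredFiles];
}

// "xx/example.csv" and "notes.txt" are made-up names for illustration.
const available = ["us/ny/city_of_new_york.csv", "xx/example.csv", "notes.txt"];

// returns [ "us/ny/city_of_new_york.csv" ]
console.log(selectFiles(["us/ny/city_of_new_york.csv"], available));
// returns [ "us/ny/city_of_new_york.csv", "xx/example.csv" ]
console.log(selectFiles([], available));
```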
Usage
# show full command line options
node import.js --help
# run an import
npm start
Admin Lookup
OpenAddresses records do not contain information about which city, state (or
other region like province), or country that they belong to. Pelias has the
ability to compute these values from Who's on First data.
For more info on how admin lookup works, see the documentation for
pelias/wof-admin-lookup. By default, adminLookup is enabled. To disable it,
set imports.adminLookup.enabled to false in your Pelias config.
Note: Admin lookup requires loading around 5GB of data into memory.
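The default-on behaviour can be expressed as a small check against a parsed Pelias config object. This is a minimal sketch, assuming the config has already been loaded into a plain object; the isAdminLookupEnabled helper is hypothetical and does not reproduce how the importer itself reads the setting.

```javascript
// Hypothetical helper: reads imports.adminLookup.enabled from a parsed
// config object, treating a missing value as enabled (the documented default).
function isAdminLookupEnabled(config) {
  const adminLookup = config && config.imports && config.imports.adminLookup;
  // Only an explicit `false` disables admin lookup in this sketch.
  return !(adminLookup && adminLookup.enabled === false);
}

// returns true: no adminLookup section, so the default applies
console.log(isAdminLookupEnabled({ imports: {} }));
// returns false: explicitly disabled
console.log(isAdminLookupEnabled({ imports: { adminLookup: { enabled: false } } }));
```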
Configuration
This importer can be configured in pelias-config, in the imports.openaddresses
hash. A sample configuration file might look like this:
{
  "esclient": {
    "hosts": [
      {
        "env": "development",
        "protocol": "http",
        "host": "localhost",
        "port": 9200
      }
    ]
  },
  "logger": {
    "level": "debug"
  },
  "imports": {
    "whosonfirst": {
      "datapath": "/mnt/data/whosonfirst/",
      "importPostalcodes": false,
      "importVenues": false
    },
    "openaddresses": {
      "datapath": "/mnt/data/openaddresses/",
      "files": [ "us/ny/city_of_new_york.csv" ]
    }
  }
}
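For the sample above, a minimal sketch of turning datapath and files into full file paths might look like the following. It assumes POSIX-style paths (as in the sample) and that entries in files are relative to datapath; resolveConfiguredFiles is a hypothetical helper, and it ignores the command-line directory alternative described for datapath in the options table.

```javascript
const path = require("path");

// Hypothetical helper: checks the openaddresses section of a parsed config
// and returns the full path of every configured file.
// Uses path.posix because the sample config uses POSIX-style paths.
function resolveConfiguredFiles(oaConfig) {
  const { datapath, files = [] } = oaConfig;
  // datapath is documented as an absolute directory path.
  if (typeof datapath !== "string" || !path.posix.isAbsolute(datapath)) {
    throw new Error(`datapath must be an absolute path, got: ${datapath}`);
  }
  // Assumption: entries in files are relative to datapath.
  return files.map((name) => path.posix.join(datapath, name));
}

const oa = {
  datapath: "/mnt/data/openaddresses/",
  files: ["us/ny/city_of_new_york.csv"],
};
// returns [ "/mnt/data/openaddresses/us/ny/city_of_new_york.csv" ]
console.log(resolveConfiguredFiles(oa));
```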
The following configuration options are supported by this importer.
key | required | default | description |
---|---|---|---|
datapath | yes | | The absolute path of the directory containing OpenAddresses files. Must be specified if no directory is given as a command-line argument. |
files | no | | An array of the names of the files to download/import. If specified, only these files will be downloaded and imported, rather than all .csv files in the given directory. If the array is empty, all files will be downloaded and imported. Refer to the OpenAddresses data listing for file names. |
missingFilesAreFatal | no | false | If set to true, any missing files will immediately halt the importer with an error. Otherwise, the importer will continue processing with a warning. |
Parallel Importing
Because OpenAddresses consists of many small files, this importer can be configured to run several instances in parallel that coordinate to import all the data.
To use this functionality, replace calls to npm start with:
npm run parallel 3 # replace 3 with your desired level of parallelism
Generally, a parallelism level of 2 or 3 is suitable for most tasks.
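This README does not describe how the parallel instances divide the work. One simple scheme that guarantees each file is imported exactly once is round-robin assignment by index, sketched here purely as an illustration; filesForWorker is hypothetical and may not match what npm run parallel actually does.

```javascript
// Hypothetical round-robin partitioning: worker i (0-based) takes every
// file whose index modulo workerCount equals i, so each file is handled
// by exactly one worker.
function filesForWorker(files, workerIndex, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError("workerCount must be a positive integer");
  }
  if (!Number.isInteger(workerIndex) || workerIndex < 0 || workerIndex >= workerCount) {
    throw new RangeError("workerIndex must be in [0, workerCount)");
  }
  return files.filter((_, i) => i % workerCount === workerIndex);
}

const files = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"];
for (let w = 0; w < 3; w++) {
  // worker 0 -> a, d; worker 1 -> b, e; worker 2 -> c
  console.log(w, filesForWorker(files, w, 3));
}
```

Round-robin by index balances the number of files per worker, not their sizes, so a worker that happens to receive several large files can still finish well after the others.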