/addressextract

Libosmium based Openstreetmap address extractor

Primary LanguageC++

addressextract parses an OpenStreetmap PBF file and exports all addresses e.g. objects with addr:* tags on them. It also reassembles admin and postcode boundarys and extends the addressinformation with those.

A typical address looks like this:

{
  "city": "Paderborn",
  "geomcity": "Paderborn",
  "geomcounty": "Kreis Paderborn",
  "geompostcode": "33098",
  "geomsuburb": "Paderborn",
  "housenumber": "2a",
  "housename": "foobar",
  "id": "25906061",
  "lat": "51.717383",
  "lon": "8.752651",
  "postcode": "33098",
  "source": "node",
  "street": "Marienplatz"
}

All the geom* tags originate from the boundaries, be it postcode or admin boundary. id and source tell the origin osm object be it node or way and its id.

Additionally you could run the extractor with "-e" so it will add an error field describing problems with this address - It could look like this:

  "errors": [
    "No housenumber",
    "No postcode",
    "No addr:street or addr:place"
  ]

Usage

addressextract -i mylocal.pbf >addresses.json

Filtering and extracting

To filter or extract addresses from the resulting json there is either the tool addrfilter:

addrfilter -p 33330 -i owl.json -c

To export all address with postcode 33330 and dump it as CSV. An alternative is to use jq:

jq -r '.addresses[] | [ .id, .source, .postcode, .city, .street, .housenumber ] | @csv' owl.json

Build

apt-get install build-essential cmake-data cmake libboost-all-dev \
    libspatialindex-dev libgdal-dev libbz2-dev libexpat1-dev

git clone https://github.com/flohoff/addressextract.git

cd addressextract

git clone https://github.com/nlohmann/json
git clone https://github.com/mapbox/protozero
git clone https://github.com/osmcode/libosmium

cmake .
make

Search as you type

Using Debian/Buster to setup a solr:

sudo apt-get install solr-jetty jq
sudo cp -rav searchasyoutype/solr36conf/* /etc/solr/
sudo service jetty9 restart

addressextract -i <mylittlepbf | jq .addresses >addresses.json

searchasyoutype/pushtosolr http://localhost:8080/solr addresses.json

Now you can query the solr for results:

curl http://localhost:8080/solr/address/?q=postcode:3333%20street:Heidestra%C3%9Fe

Typical query time is less than 50ms and the result looks like this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 13,
    "params": {
      "q": "postcode:33330 street:Heidestraße"
    }
  },
  "response": {
    "numFound": 26262,
    "start": 0,
    "docs": [
      {
        "city": "Gütersloh",
        "geomcity": "Gütersloh",
        "geompostcode": "33330",
        "housenumber": "1c",
        "id": "7380458133",
        "postcode": "33330",
        "street": "Heidestraße"
      },
[ ... ]

Todo

  • hausnummern mit spaces oder "," z.b. "4a,b" oder "5a,5b" 33602,Bielefeld,LOOM, Bahnhofstraße,28,52.025702,8.531427,node,5202495577,33602,Bielefeld,,