ltog/osmi-addresses

Use of commas in addr:housenumber value erroneously tagged as error

Closed this issue · 6 comments

Would

FILTER /[^0-9a-zA-Z ]/
be the line that needs tweaking?
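
To see why commas end up flagged, here is a minimal Python sketch of that character class. It assumes the filter flags a value whenever the pattern matches anywhere in it. The sample values and the `is_flagged` helper are made up for illustration, and Python's `re` only approximates whatever regex engine the map server actually uses.

```python
import re

# Character class copied from the FILTER line quoted above.
# Assumption: a value is flagged as soon as this pattern matches anywhere in it.
STRICT_FLAG = re.compile(r"[^0-9a-zA-Z ]")


def is_flagged(housenumber: str) -> bool:
    """Return True if the value contains anything besides digits, ASCII letters and spaces."""
    return STRICT_FLAG.search(housenumber) is not None


if __name__ == "__main__":
    # Illustrative values only; they are not taken from real data.
    for value in ["12", "12a", "12 a", "12,14", "12-14"]:
        print(f"{value!r}: flagged={is_flagged(value)}")
```

Under that assumption, a comma or a dash is enough to flag a value, which matches the behaviour reported in this issue.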

ltog commented

@grischard : Thanks for looking it up. Yes, that should be it. I propose not replacing the line but adding a second layer with different settings, so that there are two layers such as misformatted_housenumber_strict and misformatted_housenumber_lenient. (I actually still think the housenumber format described by @simonpoole is not desirable because it is difficult to extract, so I would keep the strict format as well.)

The proposed change shouldn't need any changes in the C++ code and should therefore be relatively easy. @woodpeck would certainly include the change to show it on the map.

@grischard : Have you already come up with a regex or should I have a look at it?

'^[1-9][0-9]{0,2}[A-Z]{0,3}([-,]*[1-9][0-9]{0,2}[A-Z]{0,3}){0,12}$' covers my lenient needs for Luxembourg, and should be good enough until someone finds the time to work on #93.

ltog commented

@grischard : I propose the following changes:

  • Allow letters to be small: ^[1-9][0-9]{0,2}[a-zA-Z]{0,3}([-,]*[1-9][0-9]{0,2}[a-zA-Z]{0,3}){0,12}$
  • Allow space in front of letters: ^[1-9][0-9]{0,2} ?[a-zA-Z]{0,3}([-,]*[1-9][0-9]{0,2} ?[a-zA-Z]{0,3}){0,12}$
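
For comparison, a small Python sketch that runs the three patterns from this thread side by side. The patterns are copied verbatim; the labels, the `accepted_by` helper and the sample values are illustrative assumptions, and Python's `re` may differ in detail from the engine used in production.

```python
import re

# The three patterns quoted in this thread, from strictest to most lenient.
PATTERNS = {
    "upper-case only": r"^[1-9][0-9]{0,2}[A-Z]{0,3}([-,]*[1-9][0-9]{0,2}[A-Z]{0,3}){0,12}$",
    "any case": r"^[1-9][0-9]{0,2}[a-zA-Z]{0,3}([-,]*[1-9][0-9]{0,2}[a-zA-Z]{0,3}){0,12}$",
    "any case, optional space": r"^[1-9][0-9]{0,2} ?[a-zA-Z]{0,3}([-,]*[1-9][0-9]{0,2} ?[a-zA-Z]{0,3}){0,12}$",
}


def accepted_by(housenumber: str) -> list[str]:
    """Return the labels of all patterns that accept the given value."""
    return [name for name, pattern in PATTERNS.items() if re.match(pattern, housenumber)]


if __name__ == "__main__":
    # Illustrative values only.
    for value in ["12A,14", "12a", "12 a", "12;14"]:
        print(f"{value!r}: {accepted_by(value)}")
```

One side effect worth knowing: because the space is optional and the letters may be absent, the third pattern also accepts a value with a trailing space such as "12 ".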

ltog commented

@grischard : Now I remember... I didn't know, and still don't know, how to invert regexes in UMN MapServer (the software drawing the WMS of the OSM Inspector).

Unless you or someone else knows how to do it, I will for the time being just add a new layer that won't flag commas or dashes. That should remove a lot of false positives (but will also hide some true positives).
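
The trade-off can be sketched in Python, where inverting a match is trivial. `VALID` is the second pattern proposed above; `LENIENT_FLAG` is only an assumption about what a lenient layer could look like (the existing negated character class widened by a comma and a dash), not the filter that is actually deployed. Both helper functions and the sample values are hypothetical.

```python
import re

# Whitelist of acceptable values (second pattern proposed earlier in this thread).
VALID = re.compile(
    r"^[1-9][0-9]{0,2} ?[a-zA-Z]{0,3}([-,]*[1-9][0-9]{0,2} ?[a-zA-Z]{0,3}){0,12}$"
)

# Assumed lenient filter: the existing negated class, widened to tolerate ',' and '-'.
LENIENT_FLAG = re.compile(r"[^0-9a-zA-Z ,-]")


def flagged_by_inversion(housenumber: str) -> bool:
    """Flag everything the whitelist does NOT match (needs regex inversion)."""
    return VALID.match(housenumber) is None


def flagged_by_lenient_class(housenumber: str) -> bool:
    """Flag only values containing a character outside the widened class."""
    return LENIENT_FLAG.search(housenumber) is not None


if __name__ == "__main__":
    # Illustrative values only.
    for value in ["12,14", ",12", "12/14"]:
        print(
            f"{value!r}: inversion={flagged_by_inversion(value)}, "
            f"lenient class={flagged_by_lenient_class(value)}"
        )
```

A value such as ",12" illustrates the hidden true positives: the inverted whitelist would flag it, while the widened character class lets it through.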

So currently, the version running on geofabrik.de (branch currently_running_on_geofabrik_server) accepts commas, but the master branch does not. Can we merge this into master, please?

Branch currently_running_on_geofabrik_server accepts slashes as well (which I like), but it should be extended to also allow numbers followed by (upper-case) letters after the slash, as needed for "50/3A" in this example:
https://www.leonberg.de/Gesundheitsamt-.php?object=tx,1.801.1&ModID=9&FID=2155.454.1&La=1
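
As a rough illustration of that request, the sketch below adds a slash to the separator class of the lenient pattern quoted earlier in this thread. This is a hypothetical variant for discussion, not the pattern used on the currently_running_on_geofabrik_server branch, and the names `WITHOUT_SLASH` and `WITH_SLASH` are made up.

```python
import re

# Lenient pattern quoted earlier in this thread (separators: '-' and ',').
WITHOUT_SLASH = re.compile(
    r"^[1-9][0-9]{0,2}[A-Z]{0,3}([-,]*[1-9][0-9]{0,2}[A-Z]{0,3}){0,12}$"
)

# Hypothetical variant: '/' added to the separator class, so that the part after
# the slash may again be a number followed by upper-case letters.
WITH_SLASH = re.compile(
    r"^[1-9][0-9]{0,2}[A-Z]{0,3}([-,/]*[1-9][0-9]{0,2}[A-Z]{0,3}){0,12}$"
)

if __name__ == "__main__":
    value = "50/3A"
    print("without slash:", bool(WITHOUT_SLASH.match(value)))
    print("with slash:   ", bool(WITH_SLASH.match(value)))
```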