datamade/usaddress

"NE MLK Jr Blvd" is parsed incorrectly

ezheidtmann opened this issue · 0 comments

The short_name of this OSM way is "NE MLK Jr Blvd" -- https://www.openstreetmap.org/way/418650741

It's short for "Northeast Martin Luther King Junior Boulevard".

usaddress handles the long form well, but in the short form it tags "MLK" as a "StreetNamePreType" and leaves only "Jr" as the "StreetName":

13:37 $ ipython
Python 3.6.10 (default, Sep 15 2020, 13:20:10) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import usaddress

In [2]: usaddress.tag("Northeast Martin Luther King Junior Boulevard")
Out[2]: 
(OrderedDict([('StreetNamePreDirectional', 'Northeast'),
              ('StreetName', 'Martin Luther King Junior'),
              ('StreetNamePostType', 'Boulevard')]),
 'Ambiguous')

In [3]: usaddress.tag("NE MLK Jr Blvd")
Out[3]: 
(OrderedDict([('StreetNamePreDirectional', 'NE'),
              ('StreetNamePreType', 'MLK'),
              ('StreetName', 'Jr'),
              ('StreetNamePostType', 'Blvd')]),
 'Ambiguous')

Dozens of US cities have a street named "MLK Jr" and I imagine this abbreviation is common. Let me know what I can do to help improve results here.
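
In the meantime, one possible workaround is to expand the abbreviation before the string reaches usaddress, since the long form above is tagged correctly. This is only a sketch: `ABBREVIATIONS` and `expand_abbreviations` are hypothetical names of my own, not part of usaddress, and the table holds just the one abbreviation from this issue.

```python
import re

# Hypothetical lookup table; "MLK" is the only entry taken from this issue.
ABBREVIATIONS = {"MLK": "Martin Luther King"}

# Match any known abbreviation as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in ABBREVIATIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_abbreviations(address: str) -> str:
    """Replace known whole-word abbreviations with their long form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).upper()], address)


print(expand_abbreviations("NE MLK Jr Blvd"))
```

The expanded string could then be passed to `usaddress.tag`. I have not checked how the model labels the partially expanded form ("Jr" and "Blvd" are still abbreviated), so this may or may not be enough on its own.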