vladimarius/pyap

Parsing issue with some US addresses


Hi Vlad! Thanks for this amazing work. However, I noticed that the parser has trouble with some US addresses.

Examples
The following addresses don't parse at all:

  • 20555 Devonshire Street #116 Chatsworth CA 91311

  • 260-C North El Camino Real Encinitas CA 92024-2852

  • 623 H St NW Floor 3 Washington DC 20001

Could you please take a look?

Vlad, seconding the props here. This is epic regex. I've also been finding a few US addresses that the package currently misses and wanted to add them in case they're helpful.

  • 5717 S. IH-35, Ste. 101 Austin, TX 78744
  • 9500 South IH-35, Ste. E-400 Austin, TX 78748
  • 8522 North Lamar Austin, TX 78753

These, plus a couple of @eranjan90's examples above, would be solved by:

  • Allowing the "-" character in street_name
  • Allowing occupancy to match on a bare "#" instead of requiring suite/apartment/room in front
  • Making street_type optional. I don't know whether this leads to many more false positives; I imagine it might.
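To make the three proposed changes concrete, here is a toy regex sketch. It is a hypothetical, heavily simplified pattern written only for illustration. It is NOT pyap's actual grammar, and the names WORD, STREET_TYPE, OCCUPANCY and US_ADDRESS are made up. It also uses fullmatch on a single address string, whereas pyap searches free text.

```python
import re

# Hypothetical, heavily simplified US address grammar for illustration only.
# This is NOT pyap's actual pattern set; all names below are made up.

# Change 1: allow "-" inside street-name tokens (e.g. "IH-35").
WORD = r"[A-Za-z0-9.\-]+"

# A tiny sample of street types; \b stops "St" from matching the start of "Ste.".
STREET_TYPE = r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\b"

# Change 2: accept a bare "#" as an occupancy designator, not only Suite/Ste/Apt/Room.
OCCUPANCY = r"(?:Suite|Ste\.?|Apt|Room|#)\s*[A-Za-z0-9\-]+"

US_ADDRESS = re.compile(
    r"(?P<street_number>\d+)\s+"
    # Lazy: take as few street-name words as the rest of the pattern allows.
    rf"(?P<street_name>{WORD}(?:\s+{WORD})*?)"
    # Change 3: street_type is optional.
    rf"(?:\s+(?P<street_type>{STREET_TYPE}))?"
    r",?\s+"
    rf"(?:(?P<occupancy>{OCCUPANCY}),?\s+)?"
    r"(?P<city>[A-Za-z]+(?:\s+[A-Za-z]+)*?),?\s+"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)"
)

if __name__ == "__main__":
    samples = [
        "20555 Devonshire Street #116 Chatsworth CA 91311",
        "5717 S. IH-35, Ste. 101 Austin, TX 78744",
        "9500 South IH-35, Ste. E-400 Austin, TX 78748",
        "8522 North Lamar Austin, TX 78753",
    ]
    for text in samples:
        # fullmatch: the whole string must be one address (no free-text search).
        m = US_ADDRESS.fullmatch(text)
        print(m.groupdict() if m else None)
```

The last sample illustrates the false-positive worry about an optional street_type. With no street type to anchor it, the boundary between street and city is ambiguous, and this sketch splits it as street_name "North" and city "Lamar Austin".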