/us-address-parser

US address parsing

Primary LanguagePythonMIT LicenseMIT

usaddress

usaddress is a python library for parsing unstructured address strings into address components, using advanced NLP methods.

What this can do: Using a probabilistic model, it makes (very educated) guesses in identifying address components, even in tricky cases where rule-based parsers typically break down.

What this cannot do: It cannot identify address components with perfect accuracy, nor can it verify that a given address is correct/valid.

We currently only support python 2.7

Installation

> pip install usaddress

To build and test development code.

> pip install -r requirements.txt
> python setup.py develop
> python training/training.py
> nosetests .

Usage

>>> import usaddress
>>> usaddress.parse('123 Main St. Suite 100 Chicago, IL')
[('123', 'AddressNumber'), 
 ('Main', 'StreetName'), 
 ('St.', 'StreetNamePostType'), 
 ('Suite', 'OccupancyType'), 
 ('100', 'OccupancyIdentifier'), 
 ('Chicago,', 'PlaceName'), 
 ('IL', 'StateName')]

Important links

Team

Errors / Bugs

If something is not behaving intuitively, it is a bug, and should be reported. Report it here

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Send us a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2014 Atlanta Journal Constitution. Released under the MIT License.