usaddress-java is a Java port of the usaddress Python library for parsing unstructured United States address strings into address components, using NLP methods.
What this can do: Using a probabilistic model, it makes educated guesses in identifying address components, even in tricky cases where rule-based parsers typically break down.
What this cannot do: It cannot identify address components with perfect accuracy, nor can it verify that a given address is correct or valid.
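The phrase "probabilistic model" above can be made concrete. A rule-based parser labels each token on its own. A probabilistic sequence labeler instead scores whole label sequences and keeps the highest-scoring one, so surrounding context can override a misleading local cue. The Java sketch below is a toy Viterbi decoder with hand-picked scores over three labels named after usaddress's tag set. It only illustrates the general idea. The Python usaddress library uses a conditional random field trained on labeled addresses, and nothing here reflects usaddress-java's actual model, features or weights. The names ToyViterbi, emission and transition are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy sequence labeler with hand-picked scores; NOT usaddress-java's trained model. */
final class ToyViterbi {
    static final String[] LABELS = {"AddressNumber", "StreetName", "StreetNamePostType"};
    // Hypothetical, tiny suffix list used as a local cue.
    static final Set<String> SUFFIXES = Set.of("st", "st.", "ave", "ave.", "rd", "blvd");

    /** Hypothetical score for giving `token` the label at index `label` (higher is more likely). */
    static double emission(String token, int label) {
        boolean numeric = token.chars().allMatch(Character::isDigit);
        boolean suffix = SUFFIXES.contains(token.toLowerCase(Locale.ROOT));
        switch (label) {
            case 0:  return numeric ? 2.0 : -2.0;  // AddressNumber
            case 1:  return numeric ? -1.0 : 0.5;  // StreetName
            default: return suffix ? 1.5 : -1.0;   // StreetNamePostType
        }
    }

    /** Hypothetical score for label `to` following label `from`: rewards the usual left-to-right order. */
    static double transition(int from, int to) {
        return to >= from ? 0.5 : -2.0;
    }

    /** Viterbi decoding: returns the label sequence with the highest total score. */
    static List<String> decode(List<String> tokens) {
        if (tokens.isEmpty()) return List.of();
        int n = tokens.size(), k = LABELS.length;
        double[][] best = new double[n][k]; // best[i][l]: top score of a path ending at token i with label l
        int[][] back = new int[n][k];       // back[i][l]: previous label on that path
        for (int l = 0; l < k; l++) best[0][l] = emission(tokens.get(0), l);
        for (int i = 1; i < n; i++) {
            for (int l = 0; l < k; l++) {
                best[i][l] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = best[i - 1][p] + transition(p, l);
                    if (s > best[i][l]) { best[i][l] = s; back[i][l] = p; }
                }
                best[i][l] += emission(tokens.get(i), l);
            }
        }
        // Pick the best final label, then follow back-pointers to recover the path.
        int last = 0;
        for (int l = 1; l < k; l++) if (best[n - 1][l] > best[n - 1][last]) last = l;
        List<String> labels = new ArrayList<>();
        for (int i = n - 1; i >= 0; i--) {
            labels.add(LABELS[last]);
            last = back[i][last];
        }
        Collections.reverse(labels);
        return labels;
    }
}
```

Consider decode(List.of("123", "St.", "Charles", "Ave")) with these toy scores. It labels "St." as part of the StreetName because the following "Ave" makes that sequence score higher. Choosing each token's best label in isolation would tag "St." as a StreetNamePostType instead.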
Install the usaddress Maven dependency:

    <dependency>
        <groupId>io.github.dgileadi.usaddress</groupId>
        <artifactId>usaddress</artifactId>
        <version>1.0.0</version>
    </dependency>
Parse some addresses! Note that parse and parseAndClean are different methods:

    import io.github.dgileadi.usaddress.Address;
    import io.github.dgileadi.usaddress.AddressParser;

    public class ParseExample {
        public static void main(String[] args) {
            String address = "123 Main St. Suite 100 Chicago, IL";

            // The parse method splits your address string into components
            // and labels each component.
            Address parsed = AddressParser.parse(address);

            // The parseAndClean method tries to be a little smarter:
            // it merges consecutive components and strips commas.
            Address cleaned = AddressParser.parseAndClean(address);
        }
    }
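The one-line description of parseAndClean can be read as a post-processing pass over labeled tokens: join runs of consecutive tokens that share a label, and drop commas. The Java sketch below assumes a hypothetical Token(text, label) pair and hand-assigned labels. It is one reading of that description and not usaddress-java's implementation. The real method may treat punctuation, whitespace or its return type differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "merge consecutive components and strip commas". */
final class MergeCleanSketch {
    /** A hypothetical (text, label) pair; not a type from usaddress-java. */
    record Token(String text, String label) {}

    /** Joins runs of same-label tokens with a single space and removes commas. */
    static List<Token> mergeAndClean(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            String text = t.text().replace(",", "").trim();
            if (text.isEmpty()) continue; // a bare comma token contributes nothing
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).label().equals(t.label())) {
                // Same label as the previous component: extend it instead of adding a new one.
                Token prev = out.get(last);
                out.set(last, new Token(prev.text() + " " + text, prev.label()));
            } else {
                out.add(new Token(text, t.label()));
            }
        }
        return out;
    }
}
```

For example, hand-label the tokens of "123 Main St. Suite 100 Salt Lake City, UT" with usaddress-style labels. They collapse to seven components, with "Salt", "Lake" and "City," merged into a single PlaceName, "Salt Lake City".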
For more details, see the API documentation.
To build a development version of usaddress-java on your machine, run the following commands in a terminal:

    git clone https://github.com/dgileadi/usaddress-java.git
    cd usaddress-java
    mvn clean install
Copyright (c) 2023 David Gileadi.
Original code copyright (c) 2014 Atlanta Journal Constitution.
Released under the MIT License.