/geocoder

Primary LanguageJavaApache License 2.0Apache-2.0

geocoder

A quick and dirty library that extracts location from short free text. NO dependency to any geocoding services. 100% open source including Gazetteer data. No network call or disk IO. All necessary data is contained within the jar and will be stored in-memory (ca. 25MB).

How to use

Code:

Geocoder geocoder = new Geocoder();
Location londonOH = geocoder.resolve("Rancho Cordova, US");
Location moscow   = geocoder.resolve("Москва является удивительным");

Output:

(Rancho Cordova, California, US)
{
  "geonameId" : 5385941,
  "featureCodeCategory" : "SUBADM",
  "defaultName" : "Rancho Cordova",
  "featureCode" : "PPL",
  "codes" : {
    "ADM2" : "067",
    "ADM1" : "CA",
    "PCL" : "US"
  },
  "names" : [ "Rancho Kordova", "Ранчо Кордова", ...],
  "population" : 64776,
  "lat" : 38.58907,
  "lng" : -121.30273
}

(Moscow, RU)
{
  "geonameId" : 524901,
  "featureCodeCategory" : "SUBADM",
  "defaultName" : "Moscow",
  "featureCode" : "PPLC",
  "codes" : {
    "ADM1" : "48",
    "ADM2" : "562331",
    "PCL" : "RU"
  },
  "names" : [ "mwskw", "Mosco"...],
  "population" : 10381222,
  "lat" : 55.75222,
  "lng" : 37.61556
}

Notes

####Input

  • Primarily meant for location entered as free text. It doesn't work well with longer texts (like articles).
  • Only works to town level (no support for street address)
  • Language agnostic (but your mileage may vary for non-English texts)

####Output

  • Here is an example output with comment
{
  // The Geoname ID of this location (see http://www.geonames.org/)
  "geonameId" : 5385941,

  // The type of this location. Roughly matches Geonames' classification
  "featureCodeCategory" : "SUBADM",
  
  // The default, English name of this location
  "defaultName" : "Rancho Cordova",
  
  // The Geoname feature code of this location
  "featureCode" : "PPL",
  
  // Geoname aministration area codes
  "codes" : {
    // This stands for Sacramento county
    "ADM2" : "067",
    // This stands for the state of California
    "ADM1" : "CA",
    // This stands for USA
    "PCL" : "US"
  },
  
  // Alternative names this location is known by
  "names" : [ "Rancho Kordova", "Ранчо Кордова", ...],
  
  // Population of this location
  "population" : 64776,
  
  // Latitude & Longitude
  "lat" : 38.58907,
  "lng" : -121.30273
}

####Performance & accuracy

  • Performance (on my 4-core Macbook pro)
    • Avg. response time: 0.01 ms
    • Throughput: 300K calls / sec
  • Accuracy
    • Both precision and recall were > 0.95 but with a VERY limited dataset
    • Will heavily depend on your data!