/geopip

Reverse geocode a lng/lat coordinate within a geojson `FeatureCollection` and return information about the containing country (polygon).

Primary LanguagePythonMIT LicenseMIT

GEOPIP: Geojson Point in Polygon (PIP)

Build Status Coverage Status Tested CPython Versions Tested PyPy Versions PyPi version PyPi license

Reverse geocode a lng/lat coordinate within a geojson FeatureCollection and return information about the containing country (polygon).

Basically, you can use any geojson file (top level is a FeatureCollection) for reverse coding - set the environment variable REVERSE_GEOCODE_DATA to the geojson file. Only Polygon and MultiPolygon features will be used! If a point is found to be in a feature, the properties of that feature will be returned.

In other words, provide a geojson with postcode boundaries, and you can query for the postcode in which a coordinate is. Provide timezone boundaries and you can find the timezone for a coordinate. Be creative :).

The default shape data (contained within the package) is from thematicmapping (the simple shapes). It contains polygons representing one country with the following meta-data (properties):

FIPS      String(2)         FIPS 10-4 Country Code
ISO2      String(2)         ISO 3166-1 Alpha-2 Country Code
ISO3      String(3)         ISO 3166-1 Alpha-3 Country Code
UN        Short Integer(3)  ISO 3166-1 Numeric-3 Country Code
NAME      String(50)        Name of country/area
AREA      Long Integer(7)   Land area, FAO Statistics (2002)
POP2005   Double(10,0)      Population, World Population Prospects (2005)
REGION    Short Integer(3)  Macro geographical (continental region), UN Statistics
SUBREGION Short Integer(3)  Geographical sub-region, UN Statistics
LON       FLOAT (7,3)       Longitude
LAT       FLOAT (6,3)       Latitude

Hence, you can use this package as an offline reverse geocoder on the country level (by default):

In [1]: import geopip
In [2]: geopip.search(lng=4.910248, lat=50.850981)
Out[2]:
{'AREA': 0,
 'FIPS': 'BE',
 'ISO2': 'BE',
 'ISO3': 'BEL',
 'LAT': 50.643,
 'LON': 4.664,
 'NAME': 'Belgium',
 'POP2005': 10398049,
 'REGION': 150,
 'SUBREGION': 155,
 'UN': 56}

NOTE: Since the polygons for each country are quite simple, reverse geocoding at the borders of two countrys is not exact. Use polygons with higher resolution for these use cases (see Data).

The shapely package will be used, if installed. Otherwise, a pure python implementation will be used (on the basis of winding numbers). See here, here and here for more informations and example implementations. Espacially for larger features, the shapely implementation might give performance improvements (default shape data and 2.6 GHz Intel Core i7, python3.6.2, cythonized version of geohash-hilbert):

Pure:

In [1]: import geopip
In [2]: geopip._geopip.p_in_polygon?
Signature: geopip._geopip.p_in_polygon(p, shp)
Docstring:
Test, whether point `p` is in shape `shp`.

Use the pure python implementation for this.

Parameters:
    p: Tuple[float, float]  Point (lng, lat) in WGS84.
    shp: Dict[str, Any]     Prepared shape dictionary from `geopip._pure.prepare()`.

Returns:
    boolean: True, if p in shp, False otherwise
File:      ~/repositories/geopip/geopip/_pure.py
Type:      function
In [3]: %timeit geopip.search(4.910248, 50.850981)
25.6 µs ± 390 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Shapely:

In [1]: import geopip
In [2]: geopip_geopip.p_in_polygon?
Signature: geopip._geopip.p_in_polygon(p, shp)
Docstring:
Test, whether point `p` is in shape `shp`.

Use the shapely implementation for this.

Parameters:
    p: Tuple[float, float]  Point (lng, lat) in WGS84.
    shp: Dict[str, Any]     Prepared shape dictionary from `geopip._shapely.prepare()`.

Returns:
    boolean: True, if p in shp, False otherwise
File:      ~/repositories/geopip/geopip/_shapely.py
Type:      function
In [3]: %timeit geopip.search(4.910248, 50.850981)
50 µs ± 601 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For simple geojsons, the pure python implementation is faster, but on more complex polygons, the shapely implementation will win.

Install

pip install geopip

If you require the extra speed, because you have many polygons and / or very detailed polygons, try installing geohash-hilbert with Cython extensions and / or have (vectorized) shapely installed.

# make sure to have GEOS library installed (including dev extensions)
pip install numpy 'shapely[vectorized]>=1.6'

pip install cython  # for building geohash-hilbert's cython extension
pip install --upgrade geohash-hilbert

Data

Other interesting shape data can be found at:

NOTE: shapefiles / gdb databases have to be transformed into geojson. One way is to use fiona. Assuming the gdb files are in the directory /data/gdb:

fio insp /data/gdb
# a python shell opens
>>> import json
>>> features = []
>>> for feat in src:
...     features += [feat]
...
>>> f = open('/data/gdb.geo.json', 'w')
>>> json.dump(dict(type='FeatureCollection', features=features), f)
>>> f.close()

Then the gdb will be transformed into a geojson file gdb.geo.json.

Documentation

(TODO more) Basically, there are the two functions geopip.search and geopip.search_all that perform the search in the provided FeatureCollection. Then there is the class geopip.GeoPIP that accepts a FeatureCollection either as a file or a dictionary and provides the same search functionality:

search

In [1]: import geopip
In [2]: geopip.search?
Signature: geopip.search(lng, lat)
Docstring:
Reverse geocode lng/lat coordinate within the features from `instance().shapes`.

Look within the features from the `instance().shapes` function for a polygon that
contains the point (lng, lat). From the first found feature the `porperties`
will be returned. `None`, if no feature containes the point.

Parameters:
    lng: float  Longitude (-180, 180) of point. (WGS84)
    lat: float  Latitude (-90, 90) of point. (WGS84)

Returns:
    Dict[Any, Any]  `Properties` of found feature. `None` if nothing is found.
File:      ~/repositories/geopip/geopip/__init__.py
Type:      function

search_all

In [1]: import geopip
In [2]: geopip.search_all?
Signature: geopip.search_all(lng, lat)
Docstring:
Reverse geocode lng/lat coordinate within the features from `instance().shapes`.

Look within the features from the `instance().shapes` function for all polygon that
contains the point (lng, lat). From all found feature the `porperties`
will be returned (more or less sorted from smallest to largest feature).
`None`, if no feature containes the point.

Parameters:
    lng: float  Longitude (-180, 180) of point. (WGS84)
    lat: float  Latitude (-90, 90) of point. (WGS84)

Returns:
    Iterator[Dict[Any, Any]]  Iterator for `properties` of found features.
File:      ~/repositories/geopip/geopip/__init__.py
Type:      function

GeoPIP

In [1]: import geopip
In [2]: geopip.GeoPIP?
Init signature: geopip.GeoPIP(self, *args, **kwargs)
Docstring:
GeoPIP: Geojson Point in Polygon (PIP)

Reverse geocode a lng/lat coordinate within a geojson `FeatureCollection` and
return information about the containing polygon.
Init docstring:
Provide the geojson either as a file (`filename`) or as a geojson
dict (`geojson_dict`). If none of both is given, it tries to load the
file pointed to in the environment variable `REVERSE_GEOCODE_DATA`. If the
variable is not set, a default geojson will be loaded (packaged):
    http://thematicmapping.org/downloads/world_borders.php

During init, the geojson will be prepared (see pure / shapely implementation)
and indexed with geohashes.

Provide the parameters as kwargs!

Allowed parameters:
    filename: str                 Path to a geojson file.
    geojson_dict: Dict[str, Any]  Geojson dictionary. `FeatureCollection` required!
File:           ~/repositories/geopip/geopip/_geopip.py
Type:           type

A GeoPIP object provides the same search and search_all functions.