developmentseed/geojson-pydantic

__geo_interface__ specification and implementation

vincentsarago opened this issue · 9 comments

from #94 (comment)

Feature / FeatureCollection

with the change introduced in ☝️ PR we get __geo_interface__ for empty feature which looks like

Feature(type="Feature", geometry=None, properties=None).__geo_interface__
>> {'type': 'Feature', 'geometry': None, 'properties': None}

The specs says:

properties (optional)
A mapping of feature properties (labels, populations ... you name it. Dependent on the data). Valid for "Feature" types only.

geometry (optional)
The geometric object of a "Feature" type, also as a mapping.

So to me it's a bit unclear if we should return properties=None, properties={} or no properties key when empty. Same for empty geometries, should we return geometries=None or no geometry key 🤷

ref: https://gist.github.com/sgillies/2217756#__geo_interface__

Geometry

For now we don't support empty coordinates for geometry (see #89) but same, what should the __geo_interface__ looks like for empty geometries?

ref:

I started to write up a reply to the gist to ask for clarification but ended up deciding against it. Looking at what others have done for their implementations it seems like everyone is just using their GeoJSON implementation for the __geo_interface__. And not getting hung up on the optional / required keys per the gist.

For this library it could simply be the following:

def __geo_interface__(self) -> Dict[str, Any]:
    return self.dict()

One interesting bit is that some implementations are converting "properties": None" to "properties": {}. Which I suppose is valid, but does mean it changes type.

we could also ask directly to @sgillies 🙈 what's best between "geometry": None and no geometry key, same for properties ("properties": None or "properties": {} or no properties key)

for geometries, we also need to decide if we want "coordinates": None, "coordinates": [] or no coordinates key

To me it seem the specification could be a little more specific and if it is intended to be compatible with the geojson specification.

For this library it could simply be the following:

def __geo_interface__(self) -> Dict[str, Any]:
    return self.dict()

☝️ In theory yes but not in practice because:

  • model can be expanded and we don't want to forward keys not in the spec
  • for Feature and FeatureCollection you'll need to take care of nesting (a Feature as Geometry object in the geometry attribute.)

model can be expanded and we don't want to forward keys not in the spec

I was initially thinking the same thing, however we already include id. Which isn't mentioned anywhere in the gist. And looking at geopandas they will have an id as well. Same with jazzband/geojson. Everyone seems to just map their "GeoJSON as a dict" function to __geo_interface__.

Extra clarification isn't a bad thing though. I just think the intent probably is any valid GeoJSON as a python dict. Despite the wording saying something slightly different, depending on how its read.

for Feature and FeatureCollection you'll need to take care of nesting (a Feature as Geometry object in the geometry attribute.)

The .dict() should already take care of nesting. Though there are options to exclude null or unset values which may be a good idea.

>>> Feature(type="Feature", geometry=Point(type="Point", coordinates=(1,1)), properties=None).dict()
{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': (1.0, 1.0)}, 'properties': None, 'id': None, 'bbox': None}

>>> Feature(type="Feature", geometry=Point(type="Point", coordinates=(1,1)), properties=None).dict(exclude_unset=True)
{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': (1.0, 1.0)}, 'properties': None}

The .dict() should already take care of nesting. Though there are options to exclude null or unset values which may be a good idea.

😅 I should have checked before saying that

You can put anything in the mapping, but you can't count on the receiver knowing what "id" means.

https://www.rfc-editor.org/rfc/rfc7946#section-3.2 says a Feature has a "geometry" member. Same for "properties". Thus, {"type": "Feature"} isn't strictly a GeoJSON Feature.

Have you seen this python/typing#182 (comment)? Seems sort of relevant.

It doesn't seem like there is a clear "best" approach for implementing the interface, .dict() is the most transparent and explainable so I'd go with that.

@sgillies we agree on the geojson part, it was more for the __geo_interface__ that I'm struggling to understand what we should do.

properties (optional)
A mapping of feature properties (labels, populations ... you name it. Dependent on the data). Valid for "Feature" types only.

geometry (optional)
The geometric object of a "Feature" type, also as a mapping.

if the properties and geometry are optional does it means the keys can just not be there, or does it means it should be None.

Let's consider this Valid Geojson feature '{"type": "Feature", "geometry": null, "properties": null}', What should be the geo_interface representation?

{"type": "Feature"}

{"type": "Feature", "geometry": None}

{"type": "Feature", "geometry": None, "properties": {}}

Edit: I got beat by a minute on the reply. Some of this is now a bit repetitive, sorry.

https://www.rfc-editor.org/rfc/rfc7946#section-3.2 says a Feature has a "geometry" member. Same for "properties". Thus, {"type": "Feature"} isn't strictly a GeoJSON Feature.

@sgillies In __geo_interface__ it says: "The keys are: ... properties (optional) - A mapping ...". Which could be interpreted as:

  • The properties key does not have to exist within the mapping.
    • That would mean a __geo_interface__ with a value of {"type": "Feature"} is valid, even though it is not valid GeoJSON.
    • This only really matters when reading from one. Since it is not guaranteed to exist __geo_interface__["properties"] should not be used and a .get() would be needed.
  • If the properties key does exist it must be a mapping. So a null value would not be valid.
    • I think this is probably wrong since it would be breaking from the spec.
    • It would mean a __geo_interface__ with a value of {"type": "Feature", "geometry": None, "properties": None} would not be valid despite being valid GeoJSON.

I am sure the intent was to not break from the GeoJSON spec but there seems to be a little room for interpretation on the existence of the properties key. As well as other "optional" keys.

Stepping back and looking at what others have done it seems like the implicit agreement is that: __geo_interface__ is simply valid GeoJSON as a python mapping. But that is not necessarily guaranteed if some fields are optional.

The geo interface doc is ~5 years older than the GeoJSON RFC and needs an update. In my opinion it ought to read

"properties (required for Feature)", "geometry (required for Feature)", "coordinates (required for Geometry)".