Traits: How to represent `Polygon EMPTY`?
kylebarron opened this issue · 2 comments
(I'm thinking that the best way to start a focused discussion on a specific part of traits is to open a new issue, rather than add to the existing discussion thread/PR)
In geoarrow-rs I started a custom implementation of parsing a vector of WKB objects (a custom two-pass approach should be really efficient and I want geoarrow<-->WKB to be performant). For the implementation details, I created structs like WKBPolygon
that reference a WKB object in a &[u8]
, and then implemented PolygonTrait
for that struct. Then for the high-level conversion from a WKBArray
to a PolygonArray
, I have two functions called first_pass
and second_pass
that operate on iterators of objects implementing PolygonTrait
. This works surprisingly well! And it means that I can use those exact same functions to convert arrays of anything implementing PolygonTrait
, such as arrays of geo::Polygon
or an array inside a bump-allocated-vec.
But I ran into a snag adding tests from the geoarrow/geoarrow-data
repo. There this Parquet file contains an object that is stored in WKB as Polygon EMPTY
. You can inspect this easily in e.g. geopandas:
But when I try to load this file, I get an error calling PolygonTrait::exterior
and I'm realizing that the PolygonTrait
(and also geo-types
) in effect doesn't support Polygon EMPTY
because it requires at least an exterior ring to exist.
In Arrow specifically, every array has a validity bitmap, so I should be able to work around this in this case (if I enforce that a missing/invalid geometry is the same as an empty geometry, which I'm not sure is true). But I wanted to raise this in relation to traits and see what people thought.
Reconciling different geometry representations is hard, and IMO it's probably not possible to match all users expectations 100% of the time. It definitely feels like a "pick your tradeoffs" kind of thing.
Some possible approximations that come to mind for different circumstances might be: Option<Polygon>::None
or maybe Polygon(vec![])
.
Watch out also for POINT EMPTY
, which precludes something like the second option.
https://github.com/georust/geozero deals with translating between formats, and might have some insight.
if I enforce that a missing/invalid geometry is the same as an empty geometry
I'm not entirely sure what a "missing" geometry is, but an "invalid" geometry feels like a distinct thing from an empty geometry.
e.g. the output of unioning two disjoint polygons should be empty, but not invalid.
Sorry I don't have more answers.
Reconciling different geometry representations is hard, and IMO it's probably not possible to match all users expectations 100% of the time. It definitely feels like a "pick your tradeoffs" kind of thing.
I agree, but also feel like the simple features standard is a good target to strive for, and thus having some way to represent empty geometries is useful.
Maybe it makes most sense to add an is_empty()
method onto the traits and leave the storage mechanism to the underlying format.
In terms of the current polygon trait, however, the user would have to remember to check if it's empty before accessing exterior()
. Maybe that means exterior()
should return an Option
, being None
when the polygon is empty.
If you're curious, GeoArrow defines all geometries except for point as empty if they have an empty internal list, and a point as empty if the coordinates are NaN, NaN
. GeoArrow only defines float coordinates, so they don't have an issue of empty integer points.