dstl/baleen

Elasticsearch doesn't like Antarctica

chrisflatley opened this issue · 8 comments

If you use a document with 'Antarctica' in it, Elasticsearch throws an exception:

Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse [entities.geoJson
Caused by: com.spatial4j.core.exception.InvalidShapeException: Self-intersection at or near point (-7.409738314942461, -71.63108011089658, NaN)

The GeoJSON we're using for Antarctica appears valid (validated with http://geojsonlint.com/), so this sounds like it's an Elasticsearch error as opposed to a Baleen one?

I think it is valid, but it must be more complex than ES likes to deal with. There is a similar problem, with nice pictures, here: pelias-deprecated/quattroshapes#16

I wonder whether there's some self-intersection/complexity, or whether it's the fact that it's a 'special case' for ES/GeoJSON around the pole(s).

This may have been fixed in Elasticsearch 2, as a whole load of geo-related bugs have been fixed there. I'll try to have a look at it in the next week or two.

Finally got round to trying Elasticsearch 2, and it's giving the same error.

This is still an issue - perhaps look at updating the GeoJSON data in Baleen to the latest version.

https://github.com/datasets/geo-countries/blob/master/data/countries.geojson - NB: data comes from a different repository, so will need to recheck license and update READMEs.

The updated data doesn't resolve the issue; it looks like it will need to be fixed in Elasticsearch: elastic/elasticsearch#17407

I've just checked, and this is still an issue in Elasticsearch 5.2.0. (I don't have a submittable PR for this yet, as the API has changed: NodeBuilder has been removed, and the TransportClient is now the preferred client, with embedded Nodes unsupported.)

Digging in a bit further, it sounds like precision/rounding errors in ES may be to blame, given the precision of the Antarctica data (see elastic/elasticsearch#7372). Alternatively, the date line or (possibly more likely) the poles may cause problems for the validity checks or the polygon mapping code: the date line was handled a while ago and the poles were to come later, but I couldn't find a patch or commit for that.

There is also some concern about the JTS validation code being used (see elastic/elasticsearch#13397), and there seems to be a plan to move away from JTS, which may therefore fix the issue.
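To check a ring locally before indexing it, a naive pairwise segment test is enough to show what a "self-intersection" is. This is a minimal Python sketch, not the JTS/Spatial4j validation that ES actually runs; `find_self_intersections` and `segments_intersect` are hypothetical helpers, and the sketch assumes a closed GeoJSON linear ring of (longitude, latitude) pairs treated as plain planar coordinates (no date line or pole handling). It uses exact float comparisons, so it is just as sensitive to precision as the ES checks discussed above.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product sign: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(a: Point, b: Point, p: Point) -> bool:
    """True if p (already known to be collinear with a-b) lies within the a-b bounding box."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Return True if segment p1-p2 intersects segment q1-q2 (including touching)."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    # Proper crossing: each segment's endpoints lie strictly on opposite sides of the other.
    if ((d1 > 0 > d2) or (d1 < 0 < d2)) and ((d3 > 0 > d4) or (d3 < 0 < d4)):
        return True
    # Touching/collinear cases: an endpoint lies exactly on the other segment.
    return ((d1 == 0 and _on_segment(q1, q2, p1))
            or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1))
            or (d4 == 0 and _on_segment(p1, p2, q2)))


def find_self_intersections(ring: List[Point]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of non-adjacent ring segments that intersect.

    Segment k runs from ring[k] to ring[k + 1]; the ring must be closed
    (first point == last point), as GeoJSON requires for linear rings.
    This is O(n^2), so it is only suitable for spot checks.
    """
    m = len(ring) - 1  # number of segments in a closed ring
    hits = []
    for i in range(m):
        for j in range(i + 2, m):  # skip segment i + 1, which shares a vertex with i
            if i == 0 and j == m - 1:
                continue  # first and last segments share the closing vertex
            if segments_intersect(ring[i], ring[i + 1], ring[j], ring[j + 1]):
                hits.append((i, j))
    return hits
```

For example, a "bow tie" ring `[(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]` reports its two diagonals crossing as `[(0, 2)]`, while a simple square reports nothing. Note that a repeated consecutive vertex (a zero-length segment) will also be flagged, which may or may not match how ES treats it.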

Anyway, as a test I replaced the Antarctica entry in countries.geojson with updated geometry exported from QGIS with a COORDINATE_PRECISION of 10 (rather than the default of 15) and the problem has seemingly gone away, at least with ES 5.2 and 2.0.
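The same precision reduction can also be sketched directly on the GeoJSON without going through QGIS. This is a minimal Python sketch assuming a standard FeatureCollection like countries.geojson; `reduce_precision` and `round_coords` are hypothetical helpers, and Python's built-in `round()` is not guaranteed to produce exactly the same output as QGIS's COORDINATE_PRECISION export.

```python
import copy
from typing import Any


def round_coords(coords: Any, ndigits: int) -> Any:
    """Recursively round every number in a (possibly nested) GeoJSON coordinates array."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coords(c, ndigits) for c in coords]


def reduce_precision(feature_collection: dict, ndigits: int = 10) -> dict:
    """Return a copy of a FeatureCollection with all geometry coordinates rounded.

    Works for Polygon and MultiPolygon alike because rounding is applied
    recursively; geometries without a "coordinates" member are left as-is.
    """
    out = copy.deepcopy(feature_collection)  # leave the input untouched
    for feature in out.get("features", []):
        geom = feature.get("geometry")
        if geom and "coordinates" in geom:
            geom["coordinates"] = round_coords(geom["coordinates"], ndigits)
    return out
```

Typical use would be `json.load` on countries.geojson, `reduce_precision(data, 10)`, then `json.dump` to a new file. Because the first and last points of a ring are identical, they round identically, so rings stay closed. Rounding can, however, collapse neighbouring vertices into duplicates or in principle introduce new self-intersections, so it is worth re-checking the result rather than assuming the problem is gone.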

For info, in 2.4.0-SNAPSHOT I have reduced the precision of all coordinates to better match the original dataset, and added a test that checks that every country's GeoJSON can be stored in Elasticsearch.