sat-utils/sat-api

Question about behavior of intersects filter

mhiley opened this issue · 8 comments

Hey everyone,

I've got a question regarding the behavior of the intersects filter when querying sat-api, specifically with Sentinel-2 data.

In the screenshot below, the intersects geometry I am sending in the sat-api POST body is shown in red, and the geometry from one of the returned search results is shown in green, overlaid on the corresponding Sentinel-2 scene. The geometry sent in the POST body does not actually intersect the item geometry, so I wouldn't have expected this item to be present in the search results.

[Screenshot: Screen Shot 2019-05-09 at 2.46.07 PM]

Can anyone give insight on what is going on here? Maybe sat-api is actually querying whether the bounding boxes of the two polygons intersect, as opposed to testing the polygons themselves?
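To make the bounding-box hypothesis concrete, here is a minimal Python sketch of how two polygons can have overlapping bounding boxes without intersecting. The search AOI is the rectangle from this issue; `ITEM_FOOTPRINT` is a made-up slanted quadrilateral used only for illustration (the real item geometry is in the attached response, not reproduced here). The separating-axis test below is only valid for convex rings.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat)

# Search AOI from this issue (closed ring).
SEARCH_AOI: List[Point] = [
    (-89.566844, 43.171916), (-89.566844, 42.998071),
    (-89.246813, 42.998071), (-89.246813, 43.171916),
    (-89.566844, 43.171916),
]
# HYPOTHETICAL footprint with a slanted western edge; not the real scene.
ITEM_FOOTPRINT: List[Point] = [
    (-89.40, 42.30), (-88.40, 42.30), (-88.00, 43.30), (-89.00, 43.30),
]


def bounding_box(ring: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) of a ring."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return min(lons), min(lats), max(lons), max(lats)


def bboxes_intersect(a: List[Point], b: List[Point]) -> bool:
    """True if the axis-aligned bounding boxes of a and b overlap."""
    a_w, a_s, a_e, a_n = bounding_box(a)
    b_w, b_s, b_e, b_n = bounding_box(b)
    return a_w <= b_e and b_w <= a_e and a_s <= b_n and b_s <= a_n


def _project(ring: List[Point], axis: Point) -> Tuple[float, float]:
    """Project every vertex onto axis and return the (min, max) extent."""
    dots = [x * axis[0] + y * axis[1] for x, y in ring]
    return min(dots), max(dots)


def convex_polygons_intersect(a: List[Point], b: List[Point]) -> bool:
    """Separating-axis test: convex rings are disjoint if any edge normal
    separates their projections."""
    for ring in (a, b):
        for i in range(len(ring)):
            x0, y0 = ring[i]
            x1, y1 = ring[(i + 1) % len(ring)]
            axis = (y0 - y1, x1 - x0)  # normal to the edge
            if axis == (0.0, 0.0):
                continue  # zero-length edge from a repeated closing vertex
            a_min, a_max = _project(a, axis)
            b_min, b_max = _project(b, axis)
            if a_max < b_min or b_max < a_min:
                return False
    return True


print(bboxes_intersect(SEARCH_AOI, ITEM_FOOTPRINT))
print(convex_polygons_intersect(SEARCH_AOI, ITEM_FOOTPRINT))
```

With these coordinates the bounding boxes overlap while the polygons themselves do not, which is the kind of result a bounding-box-only test would produce.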

If there has been any discussion of how to fix this behavior, we would be happy to help work on it.

Here are the details for this specific example:

Sat-api query:
POST https://sat-api.developmentseed.org/stac/search
with body:

{
  "time": "2018-01-01T00:00:00Z/2019-05-09T00:00:00Z",
  "intersects": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -89.566844,
          43.171916
        ],
        [
          -89.566844,
          42.998071
        ],
        [
          -89.246813,
          42.998071
        ],
        [
          -89.246813,
          43.171916
        ],
        [
          -89.566844,
          43.171916
        ]
      ]
    ]
  },
  "query": {
    "eo:cloud_cover": {
      "lt": 30
    }
  },
  "sort": [
    {
      "field": "datetime",
      "direction": "desc"
    }
  ],
  "limit": "20"
}

which gives response:
sat-api-response.json.txt

The scene in the screenshot is S2B_15TYH_20190426_0 which is available at s3://sentinel-s2-l1c/tiles/15/T/YH/2019/4/26/0/.
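For anyone who wants to reproduce the query from a script, here is a rough Python sketch that rebuilds the POST body above from a bounding box and sends it with the standard library. The function names are my own, not part of sat-api, and `limit` is sent as a string only because the body above does so.

```python
import json
import urllib.request

SEARCH_URL = "https://sat-api.developmentseed.org/stac/search"


def build_search_body(west, south, east, north, time_range,
                      max_cloud_cover=30, limit=20):
    """Build a search body shaped like the one in this issue."""
    ring = [
        [west, north], [west, south], [east, south],
        [east, north], [west, north],  # close the ring
    ]
    return {
        "time": time_range,
        "intersects": {"type": "Polygon", "coordinates": [ring]},
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},
        "sort": [{"field": "datetime", "direction": "desc"}],
        "limit": str(limit),  # mirrored from the issue, which sends a string
    }


def post_search(body, url=SEARCH_URL):
    """POST the body as JSON and return the decoded response (needs network)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_search_body(
    -89.566844, 42.998071, -89.246813, 43.171916,
    "2018-01-01T00:00:00Z/2019-05-09T00:00:00Z",
)
```

Calling `post_search(body)` would send the request; it is left uncalled here so the sketch runs without network access.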

Thanks for any help!

I wonder if the 5mi precision used in the Elasticsearch geometry mapping is the issue? See https://github.com/sat-utils/sat-api/blob/master/packages/api-lib/libs/es.js#L123

In this example the polygons come within about a mile of intersecting each other.
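To put the 5mi figure in perspective, here is a small Python sketch that converts miles to degrees of longitude at the latitude of the search AOI, using a simple spherical approximation. `may_false_match` is a deliberately crude model of my own (a gap smaller than the grid precision may be reported as an intersection); it is not how Elasticsearch evaluates the query.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius, spherical approximation
METERS_PER_MILE = 1609.344


def miles_to_lon_degrees(miles: float, lat_deg: float) -> float:
    """Approximate width in degrees of longitude of a distance at a latitude."""
    meters_per_degree = (
        math.radians(1.0) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))
    )
    return miles * METERS_PER_MILE / meters_per_degree


def may_false_match(gap_miles: float, precision_miles: float) -> bool:
    """Crude model: shapes closer together than the precision may match."""
    return gap_miles < precision_miles


AOI_MID_LAT = (43.171916 + 42.998071) / 2  # middle of the search AOI

print(miles_to_lon_degrees(5.0, AOI_MID_LAT))           # 5mi in degrees of longitude
print(may_false_match(1.0, 5.0))                        # ~1 mile gap, 5mi precision
print(may_false_match(1.0, 50.0 / METERS_PER_MILE))     # ~1 mile gap, 50m precision
```

At this latitude 5 miles is roughly a tenth of a degree of longitude, which is a large fraction of the width of the search AOI itself (about 0.32 degrees).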

Hi @mhiley, thanks for raising this.

I think you're right, it's probably the precision here. The polygons are awfully close to one another. If you shift your search AOI a little farther away from the item polygon, does it still find it? How close do you need to be for it to find it?

We haven't done much tuning of those parameters, so I'm not sure what is best. Precision isn't even required, so I'm wondering if removing it entirely would fix this problem.

@matthewhanson Thanks for the reply. We have our own deployed instance of sat-api and I just figured out how to test the api functions locally. I'll see if I can directly test the impact of changing or removing the precision parameter.

OK, it turns out precision is set when the index is created, as opposed to being settable on a given Elasticsearch query, which I didn't initially realize.

I'm going to try rebuilding our items index with the default precision (50 meters) to verify that this resolves the issue. Hopefully that'll happen this week; I've got to wait for a quiet day to blow away and recreate the index.
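As a sketch of what "set when the index is created" means, the Python snippet below builds a `geo_shape` field mapping with an explicit `precision`. The `tree`, `precision` and `type` keys are the prefix-tree options from the Elasticsearch geo-shape documentation linked later in this thread; the field name `geometry`, the choice of `quadtree`, and the function name are assumptions on my part, and where this fragment nests in the index body depends on the Elasticsearch version.

```python
import json


def geometry_mapping(precision: str) -> dict:
    """Return a properties fragment for a prefix-tree geo_shape field.

    Assumptions: the field is called "geometry" and uses a quadtree.
    """
    return {
        "properties": {
            "geometry": {
                "type": "geo_shape",
                "tree": "quadtree",
                "precision": precision,
            }
        }
    }


# The coarse value discussed in this issue versus the documented default.
print(json.dumps(geometry_mapping("5mi"), indent=2))
print(json.dumps(geometry_mapping("50m"), indent=2))
```

Because the value lives in the mapping, changing it means recreating the index and re-ingesting the items rather than tweaking the search request.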

@matthewhanson I was able to run this experiment this morning. I deleted the items index and recreated it with Elasticsearch's default precision of 50 meters. I then re-ingested the Sentinel-2 scene referenced in this issue and confirmed it no longer shows up when searching with the bounding box from this issue.

I'll report back if we have any performance issues due to the increased precision.

This is great, @mhiley, thanks for the update.
I'd be interested in the performance. We're planning to release sat-api 0.2.4 next week, which will contain some minor updates (still STAC 0.6.0, not 0.7.0 yet). I would include this change, but I'm a bit concerned about performance with millions of Items. I'm sure the original value of 5mi is not optimal either, so it might be best to make this a configurable parameter (via an envvar) on deploy.

@matthewhanson The increased index precision seems to have a pretty major impact on Elasticsearch disk usage.

Originally, with the 5mi precision, we were using about 2GB disk for about a half million items.

Now, with the 50m precision, I've increased the cluster disk size to 35GB, and that is already full after ingesting only about 115,000 items.

I'll keep updating this issue as we learn more.
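A quick back-of-the-envelope calculation from the two figures above, as a Python sketch. It assumes 1GB = 10^9 bytes and treats "about a half million" and "about 115,000" as exact, so the result is only a rough order of magnitude.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_item(disk_bytes: float, item_count: int) -> float:
    """Average disk footprint per indexed item."""
    return disk_bytes / item_count


coarse = bytes_per_item(2 * GB, 500_000)    # 5mi precision
fine = bytes_per_item(35 * GB, 115_000)     # 50m precision, disk already full
ratio = fine / coarse

print(f"5mi: {coarse / 1000:.1f} KB per item")
print(f"50m: {fine / 1000:.1f} KB per item")
print(f"increase: about {ratio:.0f}x")
```

That works out to roughly 4 KB per item at 5mi versus roughly 300 KB per item at 50m, an increase of about 76x.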

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-shape.html#_performance_considerations_with_prefix_trees ("Of course, calculating the terms, keeping them in memory, and storing them on disk all have a price. Especially with higher tree levels, indices can become extremely large even with a modest amount of data.")
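To get a feel for what "higher tree levels" means for these two precisions, here is a rough Python estimate of the quadtree depth needed for a given cell size. It assumes each level halves the cell width, starting from the Earth's equatorial circumference; Elasticsearch's own level calculation may differ, so treat the numbers as illustrative only.

```python
import math

EARTH_CIRCUMFERENCE_M = 40_075_016.686  # equatorial circumference
METERS_PER_MILE = 1609.344


def estimated_quadtree_levels(cell_size_m: float) -> int:
    """Levels needed so that a cell is no wider than cell_size_m.

    Assumption: each level halves the cell width; not Elasticsearch's
    exact formula.
    """
    return math.ceil(math.log2(EARTH_CIRCUMFERENCE_M / cell_size_m))


levels_5mi = estimated_quadtree_levels(5 * METERS_PER_MILE)
levels_50m = estimated_quadtree_levels(50.0)

# Each extra level doubles the number of cells along any line, such as
# a polygon edge.
cells_along_edge_factor = 2 ** (levels_50m - levels_5mi)

print(levels_5mi, levels_50m, cells_along_edge_factor)
```

Under these assumptions the 50m setting needs about 7 more levels than 5mi, so far more cells (and therefore terms) are needed to cover each shape.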

OK, this is good to know. In that case I will revert the default value to 5mi but keep the envvar so it can be changed. That way there aren't any unexpected performance hits by default.
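The "default of 5mi, overridable by an envvar" behavior could look roughly like the Python sketch below. The variable name `SATAPI_ES_PRECISION` is a placeholder I picked for illustration and the function is my own; the actual name and implementation in sat-api may differ.

```python
import os
from typing import Mapping, Optional

DEFAULT_PRECISION = "5mi"
PRECISION_ENV_VAR = "SATAPI_ES_PRECISION"  # hypothetical name


def geo_precision(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the geo_shape precision to use when creating the index.

    Falls back to the 5mi default when the variable is unset or blank.
    """
    env = os.environ if env is None else env
    value = env.get(PRECISION_ENV_VAR, "").strip()
    return value or DEFAULT_PRECISION


print(geo_precision({}))                            # default
print(geo_precision({PRECISION_ENV_VAR: "50m"}))    # overridden at deploy time
```

Since precision is fixed at index creation, a changed value would only take effect for indices created after the change.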