stac-utils/pystac-client

Problem with `sortby` param

artrey opened this issue · 6 comments

This code gets stuck in an infinite loop:

import pystac_client

client = pystac_client.Client.open("https://earth-search.aws.element84.com/v1/")

geometry = {
    "type": "MultiPolygon",
    "coordinates": [
        [
            [
                [16.419228, 47.934179],
                [16.419834, 47.933377],
                [16.42014, 47.93272],
                [16.420328, 47.932461],
                [16.420167, 47.932231],
                [16.419899, 47.932141],
                [16.419137, 47.931659],
                [16.418209, 47.932935],
                [16.418445, 47.933564],
                [16.418837, 47.933478],
                [16.418928, 47.933611],
                [16.419153, 47.934032],
                [16.419228, 47.934179],
            ]
        ]
    ],
}

response = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2024-01-01/2024-01-16",
    intersects=geometry,
    fields={
        "include": ["properties.datetime", "properties.eo:cloud_cover"],
        "exclude": ["geometry", "bbox", "assets", "collection", "stac_version"],
    },
    sortby="properties.eo:cloud_cover",
    query={"eo:cloud_cover": {"lt": 95}},
)

for item in response.items_as_dicts():
    print(item["properties"])

But if you remove the sortby parameter, everything is OK.

Tested on pystac-client-0.7.2 and pystac-client-0.7.5.

I can confirm the problem. It appears to be an issue with Earth Search itself not providing a next token in the link body when passed sortby:

from itertools import islice

from pystac_client import Client, ItemSearch


def summarize(item_search: ItemSearch) -> None:
    for page in islice(item_search.pages_as_dicts(), 2):
        next_link = next(link for link in page["links"] if link["rel"] == "next")
        next_token = next_link["body"].get("next")
        print(f"Next token: {next_token}")


client = Client.open("https://earth-search.aws.element84.com/v1/")
intersects = {"type": "Point", "coordinates": [-105.1019, 40.1672]}
item_search_without_sortby = client.search(
    collections=["sentinel-2-l2a"],
    intersects=intersects,
)
item_search_with_sortby = client.search(
    collections=["sentinel-2-l2a"],
    intersects=intersects,
    sortby="properties.eo:cloud_cover",
)

print("Without sortby")
summarize(item_search_without_sortby)

print("\nWith sortby")
summarize(item_search_with_sortby)

Output:

Without sortby
Next token: 2023-12-17T18:02:51.569000Z,S2A_13TDE_20231217_0_L2A,sentinel-2-l2a
Next token: 2023-11-24T17:52:54.004000Z,S2A_13TDE_20231124_0_L2A,sentinel-2-l2a

With sortby
Next token: None
Next token: None

I've confirmed that this issue persists when choosing a different sortby (e.g. properties.datetime) and with the new sentinel-2-c1-l2a collection. I'll open an issue on https://github.com/Element84/earth-search and notify the team. Thanks for the report! I'll update this ticket with the resolution.

That also indicates a bug in pystac-client, if it's encountering an undefined next token but continuing to iterate items_as_dicts().

Disagree. There's still a next link (just no next token in the link's body), so I think pystac-client is correct in assuming it can continue paging.

Ah, okay, I misread what was being output there: it's the next token, not the next link. That's definitely a bug, because we should never have a next link without a next param.

This happens with GET as well as POST. Simple reproduction:

# has next token
% curl -s "https://earth-search.aws.element84.com/v1/search?collections=sentinel-2-c1-l2a" | jq .links
[
  {
    "rel": "next",
    "title": "Next page of Items",
    "method": "GET",
    "type": "application/geo+json",
    "href": "https://earth-search.aws.element84.com/v1/search?collections=sentinel-2-c1-l2a&next=2024-01-12T23%3A58%3A12.946000Z%2CS2A_T50CNC_20240112T235747_L2A%2Csentinel-2-c1-l2a"
  },
  {
    "rel": "root",
    "type": "application/json",
    "href": "https://earth-search.aws.element84.com/v1"
  }
]
# with sortby, does not have next token
% curl -s "https://earth-search.aws.element84.com/v1/search?sortby=properties.datetime&collections=sentinel-2-c1-l2a" | jq .links
[
  {
    "rel": "next",
    "title": "Next page of Items",
    "method": "GET",
    "type": "application/geo+json",
    "href": "https://earth-search.aws.element84.com/v1/search?sortby=properties.datetime&collections=sentinel-2-c1-l2a"
  },
  {
    "rel": "root",
    "type": "application/json",
    "href": "https://earth-search.aws.element84.com/v1"
  }
]
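To check a link for this condition programmatically, a rough sketch follows. It assumes the paging token is carried in a `next` query parameter for GET links and in a `next` key of the link body for POST links, which matches the Earth Search responses shown in this thread but may differ on other servers. `next_link_has_token` is a hypothetical helper name.

```python
from typing import Any, Dict
from urllib.parse import parse_qs, urlsplit


def next_link_has_token(link: Dict[str, Any]) -> bool:
    """Return True if a rel="next" link carries a paging token."""
    if link.get("rel") != "next":
        return False
    if str(link.get("method", "GET")).upper() == "POST":
        # POST paging: the token is expected in the link's body.
        body = link.get("body") or {}
        return bool(body.get("next"))
    # GET paging: the token is expected in the href query string.
    query = parse_qs(urlsplit(link.get("href", "")).query)
    return bool(query.get("next"))
```

Run against the two responses above, the first next link would pass this check and the second (with sortby) would fail it.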

Fixed in stac-server: stac-utils/stac-server#686