Simple Catalog search.items() goes on infinitely
iliion opened this issue · 7 comments
pystac_client version: 0.7.5
I am performing the following simple request to get some items from a catalog and this ends up in an infinite loop (?).
from pystac_client import Client
import datetime


def main():
    catalog = Client.open(url='https://earth-search.aws.element84.com/v1/')
    my_search = catalog.search(collections='cop-dem-glo-30', limit=5)
    print(my_search.url_with_parameters())
    # prints out -> `https://earth-search.aws.element84.com/v1/search?limit=5&collections=cop-dem-glo-30`
    for item in my_search.items():
        print(item)


if __name__ == '__main__':
    main()
In the above example I would just expect the API to return 5 items per page.
What I get instead are repeated requests to the following URL: https://earth-search.aws.element84.com/v1/search?limit=5&collections=cop-dem-glo-30
In addition, if there are fewer results than the limit, the API keeps returning the same items repeatedly (and not necessarily in the same order).
Tom is correct: if you only want to return five items, use max_items. A couple of other things:
In the above example I would just expect the API to return 5 items per page.
It should, but to check this you need to:
for page in my_search.pages_as_dicts():
    # each page is a FeatureCollection dict; its items are under "features"
    print(len(page['features']))
In this line:
print(my_search.url_with_parameters())
During paging, the search object is not updated with the paging parameters, so url_with_parameters will not change while paging.
See pystac-client/pystac_client/stac_api_io.py, lines 282 to 312 in 4ea6dac.
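To illustrate the two points above (limit is the page size, max_items caps the total, and paging is driven only by the server's rel=next link), here is a minimal standard-library sketch. It is not pystac-client's implementation: iter_items, fake_fetch, ITEMS and the example.invalid URL are hypothetical names made up for this illustration, and the fake server uses a made-up integer token.

```python
from typing import Callable, Iterator, Optional

# Hypothetical in-memory "catalog" with 12 items.
ITEMS = [{"id": f"item-{i}"} for i in range(12)]


def fake_fetch(request: dict) -> dict:
    """Stand-in for a well-behaved server: honors `token`, omits `next` on the last page."""
    body = request.get("body", {})
    limit = body.get("limit", 10)
    start = body.get("token", 0)
    links = []
    if start + limit < len(ITEMS):
        links.append({
            "rel": "next",
            "method": "POST",
            "href": request["href"],
            "body": {**body, "token": start + limit},
        })
    return {
        "type": "FeatureCollection",
        "features": ITEMS[start:start + limit],
        "links": links,
    }


def iter_items(
    fetch: Callable[[dict], dict],
    first_request: dict,
    max_items: Optional[int] = None,
) -> Iterator[dict]:
    """Yield items page by page, following each page's rel="next" link."""
    request: Optional[dict] = first_request
    yielded = 0
    while request is not None:
        if max_items is not None and yielded >= max_items:
            return
        page = fetch(request)
        for item in page.get("features", []):
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        # The next request is taken verbatim from the server's link;
        # the original search parameters are never rewritten.
        request = next(
            (link for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )


first = {"method": "POST", "href": "http://example.invalid/search", "body": {"limit": 5}}
all_items = list(iter_items(fake_fetch, first))               # pages of 5, 5 and 2 items
capped = list(iter_items(fake_fetch, first, max_items=5))     # stops after the first page
print(len(all_items), len(capped))
```

In this sketch the loop ends only when a page carries no rel=next link or the max_items cap is reached, which is why a server that always returns a next link makes the iteration run forever.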
OK, I understand that the search request will return all pages, that the limit is the size of each page, and that I can get the number of items in each page from print(len(page['features'])).
My problem is that the requests go on infinitely when I run the above example against my own catalog. I understand that this is probably a bug on my side, but I can't work out the reason. Maybe you have a clue why the requests from the client won't stop. Am I missing something in the API specification?
FYI: The api response follows the specs here (https://api.stacspec.org/v1.0.0/item-search/#tag/Item-Search)
I think I know what is wrong. pystac-client does not support paging implemented with a page=x parameter.
For the request http://localhost:20008/search?limit=2&collections=test-collection the rel=next link will have this href -> http://localhost:20008/search?limit=2&collections=test-collection&page=1
Unfortunately the above url is parsed and the output is the following
{
    "rel": "next",
    "type": "application/json",
    "method": "POST",
    "href": "http://localhost:20008/search",
    "body": {
        "limit": 2,
        "collections": [
            "test-collection"
        ],
        "token": 1
    }
}
Unfortunately the above url is parsed and the output is the following
I don't quite know what you mean by this. The read_text method doesn't make any assumptions about pagination -- it simply uses what the server returns:
pystac-client/pystac_client/stac_api_io.py, lines 128 to 172 in 4ea6dac
To continue debugging, can you provide the following:
- The first page returned by the server (the initial response)
- The HTTP request sent by pystac-client to get the second page (e.g. by following the instructions here: https://stackoverflow.com/questions/10588644/how-can-i-see-the-entire-http-request-thats-being-sent-by-my-python-application)
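One way to capture the requests asked for above is to turn on the standard library's HTTP debug output together with DEBUG-level logging. This is a minimal sketch under the assumption that pystac-client sends its requests through requests/urllib3 (as the python-requests User-Agent in the log below suggests); enable_http_debug is a hypothetical helper name, and the logger names are taken from the prefixes seen in that log.

```python
import http.client
import logging


def enable_http_debug() -> None:
    """Print raw HTTP traffic and DEBUG log records for the libraries involved."""
    # http.client prints "send:", "reply:" and "header:" lines when debuglevel > 0.
    http.client.HTTPConnection.debuglevel = 1
    # Route log records to stderr at DEBUG level.
    logging.basicConfig(level=logging.DEBUG)
    # Logger names assumed from the "DEBUG:<name>:" prefixes in the log output.
    logging.getLogger("urllib3").setLevel(logging.DEBUG)
    logging.getLogger("pystac_client").setLevel(logging.DEBUG)


enable_http_debug()
```

Call it once before Client.open(...) so that every request made while paging shows up with its method, URL, headers and payload.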
My guess was read_json()
I will try to be clearer.
A request to http://localhost:20008/search?limit=2&collections=test-collection returns a response where the next link looks like this:
{
    "rel": "next",
    "type": "application/json",
    "method": "GET",
    "href": "http://localhost:20008/search?limit=1&collections=test-collection&page=1"
}
If I run the following and print the response, I get something different:
catalog = Client.open(url='http://localhost:20008')
my_search = catalog.search(collections='test-collection', limit=1)
for page in my_search.pages_as_dicts():
    print(my_search.url_with_parameters())
    # -> http://localhost:20008/search?limit=1&collections=test-collection
    print(page['links'])
The page['links'] will output a response where the next link is this:
{
    "rel": "next",
    "type": "application/json",
    "method": "POST",
    "href": "http://localhost:20008/search",
    "body": {
        "limit": 2,
        "collections": [
            "test-collection"
        ],
        "token": 1
    }
}
The point is that the loop never stops:
DEBUG
. . .
REQUEST 0
DEBUG:pystac_client.stac_api_io:POST http://localhost:20008/search Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '60', 'Content-Type': 'application/json'} Payload: {"limit": 1, "collections": ["test-collection"], "token": 1}
send: b'POST /search HTTP/1.1\r\nHost: localhost:20008\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 60\r\nContent-Type: application/json\r\n\r\n'
send: b'{"limit": 1, "collections": ["test-collection"], "token": 1}'
reply: 'HTTP/1.1 200 OK\r\n'
header: date: Wed, 22 Nov 2023 16:17:30 GMT
header: server: uvicorn
header: content-length: 1509
header: content-type: application/geo+json
header: content-encoding: br
header: vary: Accept-Encoding
DEBUG:urllib3.connectionpool:http://localhost:20008 "POST /search HTTP/1.1" 200 1509
REQUEST 1
DEBUG:pystac_client.stac_api_io:POST http://localhost:20008/search Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '48', 'Content-Type': 'application/json'} Payload: {"limit": 1, "collections": ["test-collection"]}
send: b'POST /search HTTP/1.1\r\nHost: localhost:20008\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 48\r\nContent-Type: application/json\r\n\r\n'
send: b'{"limit": 1, "collections": ["test-collection"]}'
reply: 'HTTP/1.1 200 OK\r\n'
header: date: Wed, 22 Nov 2023 16:17:33 GMT
header: server: uvicorn
header: content-length: 1509
header: content-type: application/geo+json
header: content-encoding: br
header: vary: Accept-Encoding
DEBUG:urllib3.connectionpool:http://localhost:20008 "POST /search HTTP/1.1" 200 1509
<Item id=test-item-1>
DEBUG:pystac_client.stac_api_io:POST http://localhost:20008/search Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '60', 'Content-Type': 'application/json'} Payload: {"limit": 1, "collections": ["test-collection"], "token": 1}
send: b'POST /search HTTP/1.1\r\nHost: localhost:20008\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 60\r\nContent-Type: application/json\r\n\r\n'
send: b'{"limit": 1, "collections": ["test-collection"], "token": 1}'
reply: 'HTTP/1.1 200 OK\r\n'
header: date: Wed, 22 Nov 2023 16:17:33 GMT
header: server: uvicorn
header: content-length: 1509
header: content-type: application/geo+json
header: content-encoding: br
header: vary: Accept-Encoding
DEBUG:urllib3.connectionpool:http://localhost:20008 "POST /search HTTP/1.1" 200 1509
<Item id=test-item-1>
.. .. .. (infinite loop).. .. ..
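The log shows the same POST payload ("token": 1) being sent again and again, each time answered with test-item-1. A client-side guard against this kind of non-advancing next link could look like the sketch below. This is not something pystac-client does, as far as this thread shows; pages_with_loop_guard and broken_fetch are hypothetical names, and broken_fetch only mimics the server behavior visible in the log.

```python
import json
from typing import Callable, Iterator, Optional


def pages_with_loop_guard(
    fetch: Callable[[dict], dict], first_request: dict
) -> Iterator[dict]:
    """Follow rel="next" links, but fail if the same request is about to be repeated."""
    seen = set()
    request: Optional[dict] = first_request
    while request is not None:
        # Serialize with sorted keys so equal requests compare equal.
        key = json.dumps(request, sort_keys=True)
        if key in seen:
            raise RuntimeError(f"next link was already followed: {key}")
        seen.add(key)
        page = fetch(request)
        yield page
        request = next(
            (link for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )


def broken_fetch(request: dict) -> dict:
    """Mimics the server in the log: always the same item and the same next link."""
    return {
        "features": [{"id": "test-item-1"}],
        "links": [{
            "rel": "next",
            "method": "POST",
            "href": "http://localhost:20008/search",
            "body": {"limit": 1, "collections": ["test-collection"], "token": 1},
        }],
    }


first = {
    "method": "POST",
    "href": "http://localhost:20008/search",
    "body": {"limit": 1, "collections": ["test-collection"]},
}
pages_seen = 0
loop_error = None
try:
    for page in pages_with_loop_guard(broken_fetch, first):
        pages_seen += 1
except RuntimeError as err:
    loop_error = str(err)
print(pages_seen, loop_error is not None)
```

The guard only detects exact repeats of a request; a server that keeps inventing new tokens without ever dropping the next link would still need a cap such as max_items.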
This is a problem with your server. pages_as_dicts does not modify the links attribute in any way:
pystac-client/pystac_client/item_search.py, lines 725 to 749 in 4ea6dac
Closing as not-an-issue-with-pystac-client, please re-open if you find otherwise.