Pagination improvements
Hello!
There are two aspects to this ticket.
Document that the API only returns the first 500 results
It seems like a pretty important thing to note, but the API (notably the iris_investigate API, your most used endpoint) only returns 500 results by default. This is not stated in the main places you might expect to find it:
https://github.com/DomainTools/python_api
https://github.com/DomainTools/python_api/blob/master/domaintools/api.py#L277
The documentation string for iris_investigate could even be construed to indicate that all of the results are returned:
You can loop over results of your investigation as if it was a native Python list:
for result in api.iris_investigate(ip='199.30.228.112'): # Enables looping over all related results
Handle pagination natively within the library
Getting all results for a query, rather than just the first 500, seems like a common use case. I tried the most obvious approach of passing limit=5000 as an argument to iris_investigate, e.g.:
with domaintools_obj.iris_investigate(search_hash=SEARCH_HASH, limit=5000) as results:
    for result in results:
        ...
However, this appears to have no effect. From inspecting the library code, I don't think limit is a valid argument.
If this is the case, it would be nice if pagination were handled within the library.
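As a rough illustration of what native pagination could look like, here is a minimal sketch of a generator that keeps requesting pages until the server stops reporting more. It is not part of the domaintools library: fetch_page is a hypothetical callable that takes a position token (None for the first page) and returns a mapping with 'results', 'has_more_results' and, when another page exists, 'position'. The stopping condition also guards against the "has_more_results is true but no position" case reported further down this thread.

```python
from typing import Any, Callable, Iterator, Mapping, Optional

# Hypothetical page fetcher: position token in (None = first page), one page out.
FetchPage = Callable[[Optional[str]], Mapping[str, Any]]


def iter_all_results(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield every result across all pages by following 'position' tokens."""
    position: Optional[str] = None
    while True:
        page = fetch_page(position)
        yield from page["results"]
        # Stop when the server reports no further pages, or when it claims
        # more pages but omits the position token needed to request them.
        if not page.get("has_more_results") or "position" not in page:
            return
        position = page["position"]
```

Because it is a generator, a caller who only wants the first N results can cap it with itertools.islice, e.g. list(islice(iter_all_results(fetch_page), 5000)). With this library, fetch_page might be something like lambda pos: dt_api.iris_investigate(search_hash=query, position=pos).response(), assuming response() returns the parsed response dict (the traceback later in this thread shows Results.__getitem__ delegating to it).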
Thanks,
Tom
For anyone reading this who wants to do this before the client library handles pagination natively, it's not too hard:
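# Assumes domaintools_obj is an existing domaintools API instance.
def get_paginated_dt_results(query, position=None, results=None, limit=500):
    # Use None instead of a mutable default ([]), which would be shared across calls
    if results is None:
        results = []
    with domaintools_obj.iris_investigate(search_hash=query, position=position) as dt_results:
        for result in dt_results:
            results.append(result)
            if len(results) >= limit:
                return results
        # Only request the next page if the server also returned a position token
        if dt_results['has_more_results'] is True and 'position' in dt_results:
            position = dt_results['position']
            return get_paginated_dt_results(query, position=position, results=results, limit=limit)
    return results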
There appears to be a related issue in the example. If the total number of results is an exact multiple of the page size (500), the last page will contain "has_more_results": true, but there will be no position.
https://github.com/DomainTools/python_api/blob/main/examples/retrieving_all_results_in_paginated_return.py
>>> while response['has_more_results']:
...     response = dt_api.iris_investigate(search_hash=query_hash, position=response['position'])
...     results.extend(response['results'])
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "./domaintools/base_results.py", line 188, in __getitem__
    return self.response()[key]
KeyError: 'position'
Adding "and 'position' in response" to the while condition works around the server response issue:
from domaintools import API
dt_api = API(USER_NAME, KEY)
query = "SEARCH_HASH"
response = dt_api.iris_investigate(search_hash=query)
results = response['results']
while response['has_more_results'] and 'position' in response:
    response = dt_api.iris_investigate(search_hash=query, position=response['position'])
    results.extend(response['results'])
print(results)
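To see why the extra condition matters, here is a small offline simulation. fake_iris_page is a hypothetical stand-in for the server, not DomainTools' actual implementation: it models the reported quirk by setting has_more_results whenever a page is full, while only including position when results actually remain. fetch_all runs the same guarded while loop as the workaround above against it.

```python
from typing import Any, Dict, List, Optional

PAGE_SIZE = 500  # page size reported in this thread


def fake_iris_page(all_results: List[int], position: Optional[str]) -> Dict[str, Any]:
    """Hypothetical server stand-in that returns one page of results.

    Mimics the reported quirk: has_more_results is True whenever the page is
    full, but 'position' is only included if results actually remain.
    """
    start = int(position) if position is not None else 0
    page = all_results[start:start + PAGE_SIZE]
    response: Dict[str, Any] = {
        "results": page,
        "has_more_results": len(page) == PAGE_SIZE,
    }
    end = start + len(page)
    if end < len(all_results):
        response["position"] = str(end)
    return response


def fetch_all(all_results: List[int]) -> List[int]:
    """Guarded pagination loop, run against the fake server."""
    response = fake_iris_page(all_results, None)
    results = list(response["results"])
    # Without the "position" check, this would raise KeyError on the last
    # page whenever the total is an exact multiple of PAGE_SIZE.
    while response["has_more_results"] and "position" in response:
        response = fake_iris_page(all_results, response["position"])
        results.extend(response["results"])
    return results
```

With 1000 results, the second (last) page comes back full with has_more_results true and no position, so the unguarded loop fails exactly as in the traceback, while fetch_all stops cleanly with all 1000 results.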