dronefly-garden/dronefly

taxon: search more than the first 30 results for phrase matches

synrg opened this issue · 1 comments

synrg commented

Problem:

My test case for this is ,taxon "guan". Because iNaturalist does not itself provide exact (phrase) matching, we post-filter the result set to keep only results that contain exactly the phrase typed in double quotes. If a taxon with the exact word "guan" in its name is found within the first 30 results, it is shown. In this case, however, "No exact match" is reported, and the user is left scratching their head, because they are sure that there is at least one taxon with "Guan" in the name! As it happens, several results do have exactly "guan" in the name, but they don't show up until you get the next 30 matches.
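
To make the post-filtering step concrete, here is a minimal sketch of whole-word phrase matching over one page of results. It is not the actual dronefly code: the function names are hypothetical, the sample data is illustrative, and each result is assumed to be a simplified dict with `name` and `preferred_common_name` keys.

```python
import re

def has_phrase(phrase, text):
    """True if `text` contains `phrase` as whole words, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return bool(pattern.search(text or ""))

def filter_exact(phrase, results):
    """Keep only results whose scientific or common name contains the phrase.

    Assumption: each result is a dict with "name" and
    "preferred_common_name" keys (a simplified stand-in for an API record).
    """
    return [
        r for r in results
        if has_phrase(phrase, r.get("name"))
        or has_phrase(phrase, r.get("preferred_common_name"))
    ]

# Illustrative sample data, not real API output.
page = [
    {"name": "Lama guanicoe", "preferred_common_name": "Guanaco"},
    {"name": "Penelope purpurascens", "preferred_common_name": "Crested Guan"},
]
matches = filter_exact("guan", page)
```

With this filter, "Guanaco" is rejected and "Crested Guan" is kept, which is why a page full of near-misses can leave nothing to show.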

Analysis:

I previously thought that searching beyond the first 30 results (the upper limit of the per_page parameter with the /v1/taxa/autocomplete endpoint) was fairly pointless, because by then you're getting into really obscure things that are less likely to be relevant. However, any time the results are post-filtered, this can produce "No exact match" simply because the "obvious" best match hasn't been reached yet.

At the very least, the "No exact match" message leaves a lot unexplained. What it should actually say here is that an exact match couldn't be found with reasonable effort. That said, I don't think searching through only one page of results is a reasonable effort! It's a very poor effort indeed, considering that some other commands can make up to 11 API calls in a row (like the ,me command). We could do far better here, even with the relatively cheap expenditure of 4 API calls in a row, especially now that aiothrottler takes the sting out of it (i.e. it will not cost us 4 seconds most of the time; assuming enough capacity is available in the throttler, it will barely take longer than one call!)

Proposed fix:

Therefore, I propose we do two things:

  1. Fix the "No exact match" message to state what's really going on, and what the user can do about it if they still don't find a match (probably should suggest they try ,search taxa <whatever-their-search-terms-were-except-without-double-quotes> and just page through the results manually until they find what they were looking for).
  2. Don't give up after one API call returning 30 results. If there was no match in the first 30, and the reason was the post-processor ruled out all the potential matches, then try again, up to four times total (i.e. up to a maximum of 120 results), before finally emitting the new, improved message.
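
The second item above could be sketched roughly as follows. This is only an illustration under stated assumptions: `find_exact_match` and `fetch_page` are hypothetical names, `fetch_page(query, page)` stands in for whatever call returns the next 30 results, and results are simplified to dicts with a `name` key.

```python
import re

PER_PAGE = 30   # results per API call, as described in the issue
MAX_CALLS = 4   # proposed cap: at most 4 * 30 = 120 results examined

def find_exact_match(phrase, fetch_page, max_calls=MAX_CALLS):
    """Return (first matching taxon or None, number of calls made).

    `fetch_page(query, page)` is a hypothetical callable supplied by the
    caller; it must return a list of dicts with a "name" key.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    calls = 0
    for page in range(1, max_calls + 1):
        results = fetch_page(phrase, page)
        calls += 1
        for taxon in results:
            if pattern.search(taxon["name"]):
                return taxon, calls
        # A short page means there is nothing further to fetch.
        if len(results) < PER_PAGE:
            break
    return None, calls
```

When the function returns `None`, the caller would emit the improved message from the first item. Returning the call count alongside the result makes it easy to check how much of the request budget a search actually used.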

Implementer notes:

  • As stated above, 4 is a fairly conservative value. It balances the chance of finding a match against the cost of making more API calls (which diminishes capacity for other requests in our budget of up to 60 requests in a burst before they get throttled down to 1 request per second). It is a number picked off the top of my head, and could be increased if we find we're still not finding a match when we ought to be.
  • The code should also be reviewed to see if there are other post-processing cases that might perform badly. Please list them here in a comment on this issue if there are any others, as it might influence the exact value for max # of retries and/or the text of the new message.
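
To get a feel for the request budget mentioned in the first note, here is a small token-bucket model of "60 requests in a burst, then 1 request per second". It is only a sketch for reasoning about the retry count; it is not how aiothrottler is implemented, and the class and method names are hypothetical.

```python
class TokenBucket:
    """Toy model of a burst budget; time is passed in explicitly."""

    def __init__(self, capacity=60, refill_per_second=1.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = 0.0

    def delay_for(self, now):
        """Consume one token at time `now`; return seconds to wait before sending."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        self.tokens -= 1
        if self.tokens >= 0:
            return 0.0
        # Negative balance: the request queues until enough tokens refill.
        return -self.tokens / self.refill_per_second

bucket = TokenBucket()
delays = [bucket.delay_for(0.0) for _ in range(64)]
```

In this model the first 60 requests go out immediately and each further one queues a second behind the previous, so a full burst covers 15 searches of 4 calls each before any of them starts to wait.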

synrg commented

Fixed in 80345e6