obs: make a successful match more likely by trying other matching taxa

Question

synrg opened this issue 2 years ago · 0 comments

Related to #159, users who want to retrieve their latest observation by name might type a query that does not match the "best match" taxon for the keywords typed. In that case, it would be better to offer other ways to find what they want than to simply fail.

For example, suppose I try ,obs my milk, intending to match the Irpex lacteus (Milk-white Toothed Polypore) I recently observed. It fails because ,t milk matches Family Euphorbiaceae (spurge family) (Milkweed & rubber family) instead, so I get this error response instead of what I wanted:

The user experience could be improved if, on failing to find any matches, the query were retried with &q=taxon_query&search_on=names instead, and any matches were either shown directly or offered as a list of possible matches for the user to select from.

For this example, /v1/observations?user_id=benarmstrong&q=milk&search_on=names would yield better results:

This is the equivalent web search: https://www.inaturalist.org/observations/?q=milk&search_on=names&user_id=benarmstrong
Had I followed that link on or shortly after the day I made that observation (Feb. 4, 2019), it would have matched exactly the observation I wanted (i.e. the milk-white toothed polypore, not the milk thistle).
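The retry described above could be sketched roughly as follows. This is only an illustration, not existing bot code: the function names and the injected fetch callable (which stands in for whatever HTTP client the bot uses) are hypothetical. Only the q, search_on, user_id and taxon_id parameters come from the URLs in this issue.

```python
from typing import Callable, List, Optional
from urllib.parse import urlencode

API_BASE = "https://api.inaturalist.org/v1"


def name_search_url(user_id: str, query: str) -> str:
    """Build the fallback observations query that searches on names."""
    params = {"user_id": user_id, "q": query, "search_on": "names"}
    return f"{API_BASE}/observations?{urlencode(params)}"


def find_observations(
    user_id: str,
    query: str,
    best_taxon_id: Optional[int],
    fetch: Callable[[str], List[dict]],
) -> List[dict]:
    """Try the "best match" taxon first; retry as a name search if empty.

    `fetch` is a hypothetical helper that takes a URL and returns the
    list of observation records found (possibly empty).
    """
    if best_taxon_id is not None:
        params = {"user_id": user_id, "taxon_id": best_taxon_id}
        results = fetch(f"{API_BASE}/observations?{urlencode(params)}")
        if results:
            return results
    # Nothing matched the single best taxon: fall back to the name search.
    return fetch(name_search_url(user_id, query))
```

Passing fetch in as a parameter keeps the retry logic separate from the network layer, so it can be exercised with a stub.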

There are a couple of weaknesses to this approach:

Extra effort might be needed to provide a search for q= in conjunction with the in clause to disambiguate the query, e.g. the user might reasonably expect ,obs my milk in fungi to work to rule out the "milk thistle" matches.

Unfortunately, there's a good chance that might match my latest observation in "Family Russulaceae (Milkcaps, Brittlegills and Allies)" instead, as that is the "best" match for milk in fungi. Not what we wanted in this example, but in other scenarios might be exactly what we wanted, so let's not discard this out of hand ...

q= will only match if the name of the taxon itself matches the query, not any of its parent taxa (e.g. what if we had wanted to find our latest observation in "milkcaps, brittlegills and allies"?)

Another approach that might work is to take the top taxon IDs returned by the previous taxon lookup for the query and string them together as a comma-delimited list for a taxon_id=#,#,#,... observations query, instead of using only our "one best match". For example, /v1/taxa/autocomplete?q=milk&per_page=30 returns a list of 30 taxa to try, ranked by the API from most to least likely to be what the user wanted. The user might even want to use this in conjunction with in fungi to increase accuracy (which the API supports) ...

The first 30 of 1112 results, of which (luckily) Irpex lacteus is the 29th match:
- https://api.inaturalist.org/v1/taxa/autocomplete?q=milk&per_page=30
The first 30 of 95 results, of which Irpex lacteus is the 3rd match:
- https://api.inaturalist.org/v1/taxa/autocomplete?q=milk&per_page=30&taxon_id=47170

An observation query could be performed as a 2nd try that matches the most recent of the user's observations in any of those 30 taxa.
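A minimal sketch of that second try, assuming the autocomplete response is a JSON object with a "results" list whose entries each carry an "id". The function names are hypothetical, and the order_by, order and per_page=1 parameters used to pick the most recent observation are my assumption about how the ordering would be requested; they are not taken from this issue.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.inaturalist.org/v1"


def autocomplete_url(
    query: str, per_page: int = 30, within_taxon_id: Optional[int] = None
) -> str:
    """Build the taxon lookup, optionally narrowed to an ancestor taxon."""
    params = {"q": query, "per_page": per_page}
    if within_taxon_id is not None:
        params["taxon_id"] = within_taxon_id  # e.g. 47170 for "in fungi"
    return f"{API_BASE}/taxa/autocomplete?{urlencode(params)}"


def latest_observation_url(user_id: str, autocomplete_response: dict) -> str:
    """Build one observations query covering every taxon the lookup returned."""
    ids = [str(taxon["id"]) for taxon in autocomplete_response.get("results", [])]
    params = {
        "user_id": user_id,
        "taxon_id": ",".join(ids),  # comma-delimited list of candidate taxa
        "order_by": "observed_on",  # assumed ordering parameters
        "order": "desc",
        "per_page": 1,  # only the most recent observation is needed
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{API_BASE}/observations?{urlencode(params, safe=',')}"
```

Because all candidate taxa go into a single request, the second try costs one extra API call rather than one call per taxon.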

Finally, it might be more fruitful to search for the "best" taxon name(s) amongst all taxa on a user's life list (the new auto-generated one). However, we have no code to support new life lists yet, so this is a tougher thing to support. That would be a very powerful thing to add, though, and it might open the door to a variety of more user-aware queries that improve accuracy and/or give better insight to a user about their own data.

At this point, you might observe that milk is not a reasonable query and that milk-white or milk polypore might've been tried by the user from the outset, and you're not wrong! However, the principle here is this: if our code can make reasonable guesses at what the user meant, at minimal extra cost in resources, and those guesses are reasonably good at producing one or more potential matches, that's a better user experience than putting the burden on the user to come up with a better query.