dronefly-garden/dronefly

taxon: provide better hinting for "best match" for common name

synrg opened this issue · 2 comments

synrg commented

Problem: the user meant to match a particular common name and the bot selected a different one

Two examples brought up on Discord:

,t kagu matches an Estonian common name for a bird with more observations than the desired match:

[screenshot: ,t kagu result]

The correct result can be brought up with the ,t lang subcommand, but almost nobody knows about this:

[screenshot: result using the ,t lang subcommand]

In fact, there's a second problem here: "kagu" is not the preferred common name for the genus; "kagus" is. This, too, can be worked around with a rank sp filter, but again, most users don't remember that this can be done.

[screenshot: result using a rank sp filter]

,t redshank matches a non-preferred common name:

[screenshot: ,t redshank result]

If the bird was meant, the user needs to know to add in birds to filter the match:

[screenshot: result using in birds]

Subproblems:

There is a whole grab-bag full of different issues here:

  • iNat itself ranks the non-preferred name as "best" (i.e. try the same searches on the iNat website and you'll get the same "top" result for both "kagu" and "redshank")
  • non-English common names are not filtered out, and Dronefly provides no hint that you can even do that with the lang subcommand
  • Dronefly doesn't give you a chance to select the second-best name if the "best" was not the one you meant

Possible approaches:

  • a user setting and/or server setting for language, e.g. ,user set lang en and/or ,inat set lang en, that makes it ignore any non-English names (i.e. changes ,t behaviour to be like ,t lang en)
    • to override it for one command when a default is set, the user would type, e.g. ,t lang any kagu
  • always prioritize preferred common name matches over other matches
    • though this may not always be the "best" choice for the user. It seems OK for these cases, but what if the user really did want spotted lady's thumb (the plant also known as redshank)? They'd be confused, and even more confused by the fact that the iNat website gives them the "correct" result, from their perspective, whereas Dronefly gives them the "incorrect" result
    • therefore, this is not an approach I particularly like
  • give better feedback when there are other possible matches
    • if a common name has other possible matches, summarize what they are
      • at least count them
      • possibly also group the alternatives by the iconic taxa or else lowest rank at which the choices diverge, e.g.

There are 4 other matches for redshank: 2 in Aves (Birds) and 2 in Plantae (Plants).
Use ,s taxa redshank to search for other matching names.
Try in to match redshank in another taxon, e.g. ,t redshank in aves

  • provide a command to restrict matches to the preferred common name for the user's home place, e.g.
    • ,t preferred redshank
    • and if ,t redshank matches a non-preferred name, show a tip about using ,t preferred redshank to match only preferred names
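A minimal sketch of the "summarize other matches" hint, in Python since that is what Dronefly is written in. It assumes the matches arrive as plain dicts ordered best-first, as in iNat autocomplete results, and reads only their iconic_taxon_name key. ICONIC_LABELS and summarize_alternatives are hypothetical names, not existing Dronefly code, and the sketch groups only by iconic taxon rather than by the lowest rank at which the choices diverge.

```python
from collections import Counter

# Hypothetical, partial mapping of iNat iconic taxon names to English labels.
ICONIC_LABELS = {"Aves": "Birds", "Plantae": "Plants", "Insecta": "Insects"}


def summarize_alternatives(query, matches):
    """Hypothetical helper: describe matches other than the top one.

    Assumes `matches` is a list of dicts ordered best-first that may carry
    an "iconic_taxon_name" key. Returns "" when there are no alternatives.
    """
    others = matches[1:]  # everything except the "best" match
    if not others:
        return ""
    # Count alternatives per iconic taxon; ties keep first-seen order.
    counts = Counter(m.get("iconic_taxon_name") or "Unknown" for m in others)
    ranked = counts.most_common()
    groups = []
    for taxon, n in ranked:
        label = ICONIC_LABELS.get(taxon)
        groups.append(f"{n} in {taxon} ({label})" if label else f"{n} in {taxon}")
    verb, noun = ("is", "match") if len(others) == 1 else ("are", "matches")
    example_taxon = ranked[0][0].lower()  # most common alternative group
    return "\n".join([
        f"There {verb} {len(others)} other {noun} for {query}: {', '.join(groups)}.",
        f"Use ,s taxa {query} to search for other matching names.",
        f"Try in to match {query} in another taxon, e.g. ,t {query} in {example_taxon}",
    ])
```

Counting only needs the results already returned for the query, so the hint costs no extra API call; grouping by the lowest diverging rank would additionally need each result's ancestry.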

Each one of these possible solutions should have its own issue. None of them are mutually exclusive, and each covers a different aspect of the overall problem.

synrg commented

Note that #164 partially implements ,user set lang en, but it does not prevent non-English common names from matching.
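Since #164 doesn't yet keep non-English common names from matching, here is a minimal sketch of that missing filtering step. It assumes a per-command lang takes precedence over ,user set lang, which takes precedence over ,inat set lang, and that "any" turns filtering off. It also assumes each match carries a hypothetical matched_lang key (a language code for common-name matches, None for scientific-name matches); Dronefly would have to derive that itself. effective_lang and filter_by_lang are hypothetical names.

```python
from typing import Optional


def effective_lang(command_lang: Optional[str], user_lang: Optional[str],
                   server_lang: Optional[str]) -> Optional[str]:
    """Hypothetical helper: per-command lang, then user, then server setting.

    Returns None when no filter applies; "any" explicitly disables it.
    """
    for lang in (command_lang, user_lang, server_lang):
        if lang is not None:
            return None if lang == "any" else lang
    return None


def filter_by_lang(matches: list, lang: Optional[str]) -> list:
    """Hypothetical helper: drop common-name matches in other languages.

    Assumes a "matched_lang" key on each match: a language code for
    common-name matches, or None when the scientific name matched.
    Keeps the original (best-first) order.
    """
    if lang is None:
        return list(matches)
    return [m for m in matches if m.get("matched_lang") in (None, lang)]
```

With ,user set lang en in effect, ,t kagu would skip a match on an Estonian name, while ,t lang any kagu would restore the current behaviour.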

Of all of the possible approaches I listed above, I prefer keeping the matched name as close as possible to how it works on the web, while also providing an easy way to select an alternative if the top match isn't what was desired. That, in effect, is what the autocomplete on the website accomplishes: the top match is usually, but not always, what the user wanted, so the rest of the matches are shown in case they wanted a different one.
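A minimal sketch of that autocomplete-like behaviour, assuming results are dicts with name, matched_term, and optionally preferred_common_name, in the order iNat returned them. format_choices and select_match are hypothetical helpers; how the user actually picks a number (a reaction, a follow-up command) is left open.

```python
def format_choices(matches, limit=5):
    """Hypothetical helper: number the top `limit` matches in API order."""
    lines = []
    for i, m in enumerate(matches[:limit], start=1):
        common = m.get("preferred_common_name")
        label = f"{m['name']} ({common})" if common else m["name"]
        lines.append(f"{i}. {label} [matched: {m['matched_term']}]")
    return "\n".join(lines)


def select_match(matches, choice=1):
    """Hypothetical helper: return the 1-based `choice`.

    The default, 1, is the same top match the website picks.
    """
    if not 1 <= choice <= len(matches):
        raise ValueError(f"choice must be between 1 and {len(matches)}")
    return matches[choice - 1]
```

Because the API's order is left untouched, the default result stays identical to the website's; the only new part is the ability to pick a different one.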

synrg commented

Here's another case, though it's not quite the same as the cases above. If you search for rhododendrons, the matched_term is rhododendronsläktet, which is very likely not what you'd like to see:

[screenshot: ,t rhododendrons result]

See https://api.inaturalist.org/v1/taxa/autocomplete?q=rhododendrons

On the web, the matched term isn't shown here; the preferred common name is shown instead:

[screenshot: iNat website autocomplete for rhododendrons]
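A minimal sketch of a display rule that mirrors what the website does in this case: show the preferred common name when there is one, rather than the matched_term. The fallback to matched_term when no preferred common name exists is an assumption, not something the issue confirms about the website. display_name is a hypothetical helper operating on a result dict with name, matched_term, and preferred_common_name.

```python
def display_name(result):
    """Hypothetical helper: label a taxon result for display.

    Prefers the preferred common name (as the web autocomplete does here);
    as an assumption, falls back to matched_term only when there is no
    preferred common name and the term differs from the scientific name.
    """
    name = result["name"]
    common = result.get("preferred_common_name")
    if common:
        return f"{name} ({common})"
    matched = result.get("matched_term")
    if matched and matched != name:
        return f"{name} ({matched})"
    return name
```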