dronefly-garden/dronefly

taxon: handle "of" as non-keyword in middle of phrase

synrg opened this issue · 3 comments

synrg commented

The query ,t chicken of the woods unexpectedly returns Verbascum thapsus (woolly mullein) (Charmin of the woods). Apparently, "of" is treated as a keyword. However, in this context, where the words preceding "of" aren't part of another option, it makes no sense to treat it as a keyword. The natural language query parser should therefore treat it as an ordinary word, so that the expected result, Laetiporus sulphureus (chicken of the woods), is found.

This could be achieved in two passes:

  • pass 1: replace every keyword except "of" with its Unix-style option (--by, --from, etc.)
  • pass 2:
    • if, after pass 1, any words precede the first option, add the implicit --of before them and make no of -> --of substitutions
    • otherwise, scan the whole query for "of" and, if there is a match, substitute --of for the first occurrence only

Expected outputs for example queries following the above steps:

,t chicken -> ,t chicken
,obs of chicken -> ,obs --of chicken
,t chicken of the woods -> ,t --of chicken of the woods
,obs by me of chicken -> ,obs --by me --of chicken
,obs by me of chicken of the woods -> ,obs --by me --of chicken of the woods
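
A minimal Python sketch of the two-pass rule, for illustration only. It assumes whitespace-separated tokens with the command prefix (,t or ,obs) already stripped, and a hypothetical keyword subset (by, from); `expand_two_pass` is an illustrative name, not dronefly's parser. To reproduce every expected output above, including ,t chicken staying unchanged, it assumes that a bare "of" also marks the position of the "first option" when deciding whether leading words need the implicit --of.

```python
# Hypothetical subset of option keywords; "of" is handled separately.
OPTION_KEYWORDS = {"by", "from"}


def expand_two_pass(query: str) -> str:
    """Rewrite natural-language keywords as --options using two passes."""
    tokens = query.split()

    # Pass 1: every keyword except "of" becomes its Unix-style option.
    tokens = ["--" + t if t in OPTION_KEYWORDS else t for t in tokens]

    # Index of the first option, treating a bare "of" as a potential option
    # (an assumption needed to match the expected outputs in the issue).
    first = next(
        (i for i, t in enumerate(tokens) if t.startswith("--") or t == "of"),
        None,
    )

    # Pass 2a: leading words get the implicit --of; every "of" stays a word.
    if first is not None and first > 0:
        return " ".join(["--of"] + tokens)

    # Pass 2b: otherwise only the first "of" (if any) becomes --of.
    if "of" in tokens:
        tokens[tokens.index("of")] = "--of"
    return " ".join(tokens)
```

For example, expand_two_pass("chicken of the woods") returns "--of chicken of the woods", while expand_two_pass("by me of chicken of the woods") returns "--by me --of chicken of the woods".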

synrg commented

Thanks, @Riviera, for drawing this to my attention on iNat Discord.

synrg commented

It would be cleaner to handle this left-to-right in one pass, i.e.

  • scan and expand (or collect, and expand at the end) all tokens left to right
  • if a word that is neither an option keyword nor a macro is encountered before any option, --of is immediately inserted ahead of it in the expanded token list
  • after an --of has either been inserted by this method or been encountered explicitly later in the argument list, no further occurrences of the token "of" are transformed into --of; each is treated as the ordinary word "of"
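
The single left-to-right pass can be sketched in Python as follows. This is a hedged illustration, not the code from f5a10b3: `expand_one_pass` is a hypothetical name, by and from are the only option keywords modeled, "rg" is a hypothetical macro name passed through unexpanded, and the rule that --of is only inserted before any option has been seen is an assumption inferred from the expected outputs (otherwise "me" in ,obs by me of chicken would trigger it).

```python
# Illustrative sets only: "by"/"from" come from the issue, "rg" is hypothetical.
OPTION_KEYWORDS = {"by", "from"}
MACROS = {"rg"}


def expand_one_pass(query: str) -> str:
    """Expand keywords left to right, turning at most one "of" into --of."""
    expanded = []
    seen_option = False  # has any option (typed, expanded or inserted) appeared?
    of_done = False      # has an --of been inserted or encountered?
    for token in query.split():
        if token.startswith("--"):
            # Explicit option typed by the user.
            expanded.append(token)
            seen_option = True
            of_done = of_done or token == "--of"
        elif token in OPTION_KEYWORDS:
            expanded.append("--" + token)
            seen_option = True
        elif token == "of" and not of_done:
            # The first "of" not already covered by an --of acts as the keyword.
            expanded.append("--of")
            seen_option = of_done = True
        elif token in MACROS:
            # Macro expansion is out of scope for this sketch.
            expanded.append(token)
        else:
            # Plain word (including any "of" after --of is set).
            if not seen_option:
                # A word before any option starts the implicit --of.
                expanded.append("--of")
                seen_option = of_done = True
            expanded.append(token)
    return " ".join(expanded)
```

Unlike the two-pass sketch, this version always makes the implicit --of explicit, so "chicken" becomes "--of chicken"; the remaining example queries expand exactly as listed in the issue.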

synrg commented

Fixed by f5a10b3