tatuylonen/wiktextract

raw_glosses and glosses fields

Closed this issue · 2 comments

Here is part of the French extract for the word "livrer":

    "raw_glosses": [
      "(reflexive) abandon oneself, give oneself over [with à ‘to’]"
    ],
    "glosses": [
      "abandon oneself, give oneself over [with à ‘to’]",
      "abandon oneself, give oneself over"
    ],
    "tags": [
      "reflexive"
    ],

Two overlapping glosses are produced from one raw gloss. Presumably this is because it is not known how to deal with the text in [...], so the user is offered the choice of keeping or discarding it. Is there some logic by which the user can handle this case automatically, perhaps by always taking the first alternative along with the tags field, so as not to lose or duplicate information?

Having more than one line in glosses is meant for a situation like this:

# To a reversed order; half round; facing in the opposite direction; from a contrary point of view. {{defdate|from ca. 1350–1470<ref name=SOED/>}}
...
## {{lb|en|nautical}} To the opposite [[tack]]: see {{l|en|go about}}. {{defdate|from late 15th c.<ref name=SOED/>}}

That is, where you've got a hierarchical gloss, which comes out as:

      "glosses": [
        "To a reversed order; half round; facing in the opposite direction; from a contrary point of view.",
        "To the opposite tack: see go about."
      ],
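For consumers of the extract, the hierarchical layout above could be handled roughly as follows. This is only a sketch: it assumes the last entry in glosses is the most specific one and that any earlier entries are its parents, which matches the intent described here but is not guaranteed. The helper name split_gloss_hierarchy is hypothetical and not part of wiktextract.

```python
from typing import Any, Dict, List, Optional, Tuple


def split_gloss_hierarchy(sense: Dict[str, Any]) -> Tuple[List[str], Optional[str]]:
    """Split a sense's glosses into (parent glosses, most specific gloss).

    Assumption: glosses are ordered from the top-level gloss down to the
    sub-gloss, so the last element is the one that describes this sense.
    """
    glosses = sense.get("glosses", [])
    if not glosses:
        return [], None
    return glosses[:-1], glosses[-1]


# Sense data copied from the hierarchical example in this thread.
sense = {
    "glosses": [
        "To a reversed order; half round; facing in the opposite direction; "
        "from a contrary point of view.",
        "To the opposite tack: see go about.",
    ],
}

parents, leaf = split_gloss_hierarchy(sense)
print(leaf)
```

With this reading, a sense that has a single gloss simply gets an empty list of parents.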

The code is a bit spaghettuous, so I can't make any guarantees beyond the intent.

In this case, I would say it's a bug. The raw_gloss text is added directly because we wanted a higher-level gloss, but the high-level gloss apparently isn't cleaned up the way the subglosses are. So when the cleaned gloss is added, the check for gloss in data["glosses"] doesn't notice that it's "already there". This doesn't seem to have anything to do with the recent addition of handling for certain templates (that is, handling of Template:+obj); it's just that +obj outputs something that we remove in the gloss (the final [brackets]).
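The mechanism described above can be illustrated with a small, self-contained sketch. This is not the wiktextract code: strip_trailing_brackets and collect_glosses are hypothetical stand-ins, and the cleanup rule (dropping a final bracketed note) is an assumption based on the output shown in this issue. The thread also does not say which form the actual fix keeps.

```python
import re
from typing import List


def strip_trailing_brackets(text: str) -> str:
    """Hypothetical cleanup step: drop a final [bracketed] note."""
    return re.sub(r"\s*\[[^\]]*\]\s*$", "", text)


def collect_glosses(parent: str, child: str, clean_parent: bool) -> List[str]:
    """Add a higher-level gloss, then a sub-gloss unless it is already present."""
    glosses: List[str] = []
    # The higher-level gloss is added first, cleaned only if requested.
    glosses.append(strip_trailing_brackets(parent) if clean_parent else parent)
    # The sub-gloss is always cleaned before the membership check.
    child_text = strip_trailing_brackets(child)
    if child_text not in glosses:
        glosses.append(child_text)
    return glosses


text = "abandon oneself, give oneself over [with à ‘to’]"

# Uncleaned parent: the strings differ, so the duplicate check misses.
buggy = collect_glosses(text, text, clean_parent=False)

# Both sides cleaned the same way: the check catches the duplicate.
consistent = collect_glosses(text, text, clean_parent=True)

print(buggy)
print(consistent)
```

The point of the sketch is only that a membership test on strings is reliable when both sides have gone through the same cleanup.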

I'll see what I can do.

This should be fixed. If you find any oddities later, just open a new issue (or, if it's specifically this problem, comment here and I'll reopen).