optional disjointed lexicons referenced in pattern

Question

jonorthwash opened this issue 3 years ago · 13 comments

Currently, to indicate that a set of suffixational morphology is optional in a pattern, a normal approach would be something like this:

Roots [<pos>:] OptionalSuffixes?

To indicate that a set of prefixational morphology is optional in a pattern, something like this is needed:

Roots [<pos>:]
:OptionalPrefixes Roots [<pos>:] OptionalPrefixes:

The latter approach is generally tedious, though it can also be used for suffixational morphology. The former approach (which takes less code to write, and in complex cases is also much simpler) cannot be extended to prefixational morphology or to any other matched lexicon references.

I can (kind of?) imagine cases where the current behaviour could make sense: when both elements have ?, each is made optional independently, giving 0 or X or Y or (X and Y). But it seems far more common to want them to operate together, giving 0 or (X and Y). Or perhaps an additional symbol could be defined for this use?
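A minimal Python sketch of the two readings, treating X and Y as plain strings. This is a toy model of the set of outputs, not how lexd compiles patterns:

```python
from itertools import product


def independent(x: str, y: str) -> set[str]:
    """Current behaviour: each `?` element is optional on its own."""
    return {a + b for a, b in product(["", x], ["", y])}


def together(x: str, y: str) -> set[str]:
    """Requested behaviour: both elements are optional as a unit."""
    return {"", x + y}


print(sorted(independent("X", "Y")))  # ['', 'X', 'XY', 'Y']
print(sorted(together("X", "Y")))     # ['', 'XY']
```

For a matched prefix, the two "extra" outputs of the independent reading (only X, only Y) are exactly the mismatched forms where one side of the prefix appears without the other.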

Answer 1 · 2022-03-05T14:21:22.000Z

Under the current setup, the best way to do this is:

PATTERN PosStem
Roots [<pos>:]

PATTERN Pos
PosStem
:OptionalPrefixes PosStem OptionalPrefixes:

I'm having trouble coming up with reasonable syntax that would make that a single line, but if you have an idea for one, I'm open to implementing it.

Answer 2 · 2022-03-05T14:44:50.000Z

Daniel Swanson wrote:

Under the current setup, the best way to do this is

PATTERN PosStem
Roots [<pos>:]

PATTERN Pos
PosStem
:OptionalPrefixes PosStem OptionalPrefixes:

In this case, I think the tag should just go where the prefix is. This idea that we need to artificially move all tags to the end is silly and hurts glossing (where you attempt to match tags with morphemes), and if you are really dedicated to it, you should be forced to use twoc.

I'm having trouble coming up with reasonable syntax that would make that a single line, but if you have an idea for one, I'm open to implementing it.

I think we could use a slotted lexicon, adding an optional ? before the slot:

LEXICON Circumfix(2)
:pre <circ>:suff

PATTERN Morphology
Circumfix?(1) Verb Circumfix?(2)

It would be forbidden to have `Circumfix?(_)` and `Circumfix(_)` in the same line; you would have to use an alias if you wanted one to be conditional and the other not. Thoughts?
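A toy Python model of the proposed slot semantics (hypothetical, not lexd's implementation): each `Circumfix` entry is a tuple of columns, each column a (left, right) string pair, and `Circumfix?(1)` and `Circumfix?(2)` draw their columns from the same entry. Marking the slots optional is modeled as adding one all-empty entry, so both slots vanish together. The `go` verb entry is made up for illustration.

```python
# Toy model: a string pair is (left, right); concatenation is side-wise.
Pair = tuple[str, str]


def cat(*pairs: Pair) -> Pair:
    """Concatenate string pairs side by side."""
    return ("".join(p[0] for p in pairs), "".join(p[1] for p in pairs))


def show(p: Pair) -> str:
    """Print identical sides once, as in the outputs quoted in this thread."""
    return p[0] if p[0] == p[1] else f"{p[0]}:{p[1]}"


# LEXICON Circumfix(2) with the single entry ":pre <circ>:suff"
CIRCUMFIX: list[tuple[Pair, Pair]] = [(("", "pre"), ("<circ>", "suff"))]
# Hypothetical verb lexicon, for illustration only
VERB: list[Pair] = [("go", "go")]


def morphology(optional: bool) -> list[Pair]:
    """Circumfix?(1) Verb Circumfix?(2): both slots come from one entry."""
    entries = list(CIRCUMFIX)
    if optional:
        entries.append((("", ""), ("", "")))  # both slots empty together
    return [cat(c[0], v, c[1]) for c in entries for v in VERB]


for pair in morphology(optional=True):
    print(show(pair))
# go<circ>:pregosuff
# go
```

Because both slots index into one entry, there is no way to get `pre` without `suff` (or vice versa), which is the "0 or (X and Y)" behaviour asked for in the original question.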

Answer 3 · 2022-03-05T14:58:55.000Z

The exact reason I created lexd was so that no one would ever have to write twoc again.

Yes, I agree that if all you want is the fst, then prefix tags are perfectly reasonable, but currently all Apertium tools assume suffix tags and I have no intention of being the one who rewrites everything for that. (I am willing to help make something that rearranges tags once between morph and disam, however.)

As for the particular suggestion, I like the idea, but it seems like it would introduce some ambiguity in parsing and I'm not sure how easy that would be to deal with. Like, the fact that 3?(3) would then have totally different behavior from 3? (3) bothers me. (And yes, completely numeric lexicon names are currently valid.)

Answer 4 · 2022-03-05T16:28:42.000Z

Upon further reflection, 3(3) and 3 (3) are already different things, so actually I think @nlhowell's suggestion works.

So if all references to a particular lexicon in a pattern are [name]?([number]), two copies of the pattern will be compiled, one with all of them present and one with all of them absent.

If some are optional and some aren't, I don't think it would actually break anything, but it would probably be confusing, so yeah, probably best to require aliasing in that case.

Answer 5 · 2022-03-06T13:34:04.000Z

If some are optional and some aren't [...] probably best to require aliasing in that case.

What's aliasing?

Answer 6 · 2022-03-06T21:39:53.000Z

There's an ALIAS command that allows you to give a lexicon a second name, which is useful if for some reason you want independent copies of a lexicon in a single pattern.

LEXICON A
x
y

ALIAS A B

PATTERNS
A A # xx, yy
A B # xx, xy, yx, yy
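A toy Python model of the behaviour in the example above, assuming (as its comments indicate) that every reference to the same lexicon name in one pattern line selects the same entry, while an alias is a separate name over the same entries. The `expand` helper is hypothetical, not part of lexd:

```python
from itertools import product

lexicons: dict[str, list[str]] = {"A": ["x", "y"]}
lexicons["B"] = lexicons["A"]  # ALIAS A B: same entries under a new name


def expand(pattern: list[str]) -> list[str]:
    """Pick one entry per distinct name and reuse it for every reference."""
    names = list(dict.fromkeys(pattern))  # distinct names, first-use order
    results = []
    for choice in product(*(lexicons[n] for n in names)):
        picked = dict(zip(names, choice))
        results.append("".join(picked[n] for n in pattern))
    return results


print(expand(["A", "A"]))  # ['xx', 'yy']
print(expand(["A", "B"]))  # ['xx', 'xy', 'yx', 'yy']
```

This is also why the optional marker has to be consistent across references to one name: they all share a single choice, so mixing optional and non-optional references needs a second name.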

Answer 7 · 2022-03-08T00:39:09.000Z

So if all references to a particular lexicon in a pattern are [name]?([number]), two copies of the pattern will be compiled, one with all of them present and one with all of them absent.

Actually, it's even simpler than that: the references to that lexicon can just have a temporary empty entry.

Answer 8 · 2022-04-09T18:41:32.000Z

Ooh, nice. I see updates to code and tests, but not the documentation?

Answer 9 · 2022-04-09T18:42:33.000Z

I forgot and added it in a separate commit.

Answer 10 · 2022-06-14T17:17:58.000Z

Optional lexicons where sides are matched are not acting as expected:

PATTERNS
A:? B :A?

LEXICON A
a:<a>

LEXICON B
b
bb

Output:

ab:b
abb:bb
abb:bb<a>
ab:b<a>
b
bb
bb:bb<a>
b:b<a>

Expected output:

abb:bb<a>
ab:b<a>
b
bb

Answer 11 · 2022-06-14T19:02:17.000Z

For a question mark to be interpreted as disjointed, it needs to be before parentheses and not the last character, so you need to write A?(1): B :A?(1). With that change it works fine.
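The difference between the two spellings can be reproduced with a small Python model (a sketch, not lexd's code). It assumes `A:` contributes only the left side of an entry and `:A` only the right side, which matches the output reported above, and it models `?(1)` as described earlier in the thread: one temporary empty entry, chosen once for both references. The independent version yields the eight reported strings; the tied version yields the four expected ones.

```python
from itertools import product

Pair = tuple[str, str]  # (left, right)

A: list[Pair] = [("a", "<a>")]              # LEXICON A: a:<a>
B: list[Pair] = [("b", "b"), ("bb", "bb")]  # LEXICON B: b, bb
EMPTY: Pair = ("", "")                      # temporary empty entry for "?"


def show(p: Pair) -> str:
    """Print identical sides once, as in the output above."""
    return p[0] if p[0] == p[1] else f"{p[0]}:{p[1]}"


def independent() -> set[str]:
    """Pattern `A:? B :A?`: each optional reference chooses on its own."""
    return {
        show((a1[0] + b[0], b[1] + a2[1]))  # A: gives left only, :A right only
        for a1, b, a2 in product(A + [EMPTY], B, A + [EMPTY])
    }


def tied() -> set[str]:
    """Pattern `A?(1): B :A?(1)`: one shared choice for both references."""
    return {
        show((a[0] + b[0], b[1] + a[1]))
        for a, b in product(A + [EMPTY], B)
    }


print(sorted(independent()))  # the 8 strings of the reported output
print(sorted(tied()))         # ['ab:b<a>', 'abb:bb<a>', 'b', 'bb']
```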

Answer 12 · 2022-07-27T15:00:14.000Z

PATTERNS
A:? B :A?
For a question mark to be interpreted as disjointed, it needs to be before parentheses and not the last character, so you need to write A?(1): B :A?(1). With that change it works fine.

We just tried this again today and had to go find this issue after consulting the documentation and remaining confused about why this wasn't working. Could the documentation be updated with an example?

Answer 13 · 2022-07-28T03:50:03.000Z

Done in 3950b6f