INL/BlackLab

SpansRepetition only finds repetitions from consecutive matches

Closed · 1 comment

SpansRepetition seems to miss repetitions that could be constructed from non-consecutive matches (possible with variable-length clauses), because it only looks at consecutive matches. A fix would probably need to gather the matches and keep them in two separate lists: one sorted by end point (where each end point may correspond to multiple start points) and one sorted by start point (where each start point may correspond to multiple end points). Those lists can then be used to find connecting matches. Note that a single connecting position can produce 4 hits, if 2 matches end there and 2 matches start there.

E.g. if we're trying to find a repetition {2} within these matches:

pos 5   6   7   8   9
A:  +---.---+
B:      +---+
C:          +---+
D:          +---.---+

The correct resulting matches should be:

pos 5   6   7   8   9
    +---.---.---+       (A followed by C)
    +---.---.---.---+   (A followed by D)
        +---.---+       (B followed by C)
        +---.---.---+   (B followed by D)

Currently it seems that only B followed by C would be found, because the other combinations are non-consecutive.
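
For illustration, here is a minimal Java sketch of the "connect end points to start points" idea applied to the A-D example above. It is not BlackLab's SpansRepetition code: the class `RepetitionSketch`, the method `findRepetitions`, and the `{start, end}` array representation are all hypothetical. It also assumes end positions are exclusive (so a match ending at 7 connects to one starting at 7), and it simplifies the two-list idea to a single index keyed by start point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not BlackLab's actual SpansRepetition implementation.
final class RepetitionSketch {

    // Each match is {start, end}. Assumption: end is exclusive, so a match
    // ending at position p connects to any match starting at p.
    static List<int[]> findRepetitions(List<int[]> matches, int min, int max) {
        // Index: start position -> all end positions of matches starting there.
        TreeMap<Integer, TreeSet<Integer>> endsByStart = new TreeMap<>();
        for (int[] m : matches) {
            endsByStart.computeIfAbsent(m[0], s -> new TreeSet<>()).add(m[1]);
        }

        List<int[]> hits = new ArrayList<>();
        for (int start : endsByStart.keySet()) {
            TreeSet<Integer> hitEnds = new TreeSet<>();  // deduplicated ends for this start
            TreeSet<Integer> frontier = new TreeSet<>(); // positions reached after k matches
            frontier.add(start);
            for (int k = 1; k <= max && !frontier.isEmpty(); k++) {
                TreeSet<Integer> next = new TreeSet<>();
                for (int pos : frontier) {
                    TreeSet<Integer> ends = endsByStart.get(pos);
                    if (ends != null) {
                        next.addAll(ends); // follow every match that starts here
                    }
                }
                if (k >= min) {
                    hitEnds.addAll(next); // k connected matches form a valid repetition
                }
                frontier = next;
            }
            for (int end : hitEnds) {
                hits.add(new int[] {start, end});
            }
        }
        return hits; // ordered by start, then by end
    }

    public static void main(String[] args) {
        List<int[]> matches = List.of(
            new int[] {5, 7},  // A
            new int[] {6, 7},  // B
            new int[] {7, 8},  // C
            new int[] {7, 9}); // D
        // Repetition {2}: exactly two connected matches.
        for (int[] hit : findRepetitions(matches, 2, 2)) {
            System.out.println(hit[0] + "-" + hit[1]);
        }
        // prints 5-8, 5-9, 6-8, 6-9 (one per line)
    }
}
```

Because the lookup is keyed on position rather than on the order in which matches arrive, A followed by C, A followed by D, and B followed by D are found alongside B followed by C. A real Spans implementation would also have to work per document and produce hits incrementally, which this sketch ignores.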

Fixed in the unify-captures-relations branch.