Return character spans with alignments
goodmami commented
Since a regular expression is used to find tokens, and each regex match carries start and end indices, return those character spans along with the aligned tokens so they are available to callers of the function.
E.g. instead of returning something like:
[('\t', [('dog-s', ['dog', '-', 's'])])]
return something like:
[('\t', [('dog-s', ['dog', '-', 's'], [(0, 3), (3, 4), (4, 5)])])]
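A rough sketch of how the spans could be collected, assuming the tokenizer iterates over matches with `re.finditer`. The pattern `TOKEN_RE` and the helper `tokens_with_spans` are hypothetical names for illustration, not the project's actual code:

```python
import re

# Hypothetical token pattern (an assumption, not the project's real regex):
# a run of non-hyphen characters, or a single hyphen.
TOKEN_RE = re.compile(r"[^-]+|-")


def tokens_with_spans(word):
    """Return (word, tokens, spans); each span is a (start, end) offset into word."""
    tokens = []
    spans = []
    for match in TOKEN_RE.finditer(word):
        tokens.append(match.group())
        # match.span() gives the (start, end) indices of this match in word
        spans.append(match.span())
    return (word, tokens, spans)


print(tokens_with_spans("dog-s"))
# ('dog-s', ['dog', '-', 's'], [(0, 3), (3, 4), (4, 5)])
```

Each span slices back to its token (`word[start:end] == token`), so callers can map tokens to positions in the original string without re-running the regex.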