goodmami/toolbox

Return character spans with alignments


Tokens are found with a regular expression, and each regex match carries the start and end indices of the token. Return those indices alongside the aligned tokens so they are available to users of the function.

E.g. instead of returning something like:

[('\t', [('dog-s', ['dog', '-', 's'])])]

Return something like:

[('\t', [('dog-s', ['dog', '-', 's'], [(0, 3), (3, 4), (4, 5)])])]
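
A minimal sketch of how the spans could be collected, assuming the tokens come from `re.finditer`. The pattern `TOKEN_RE` and the helper name `tokens_with_spans` are hypothetical and only illustrate the idea; they are not the library's actual regex or API.

```python
import re

# Hypothetical token pattern (an assumption, not the library's regex):
# either a morpheme delimiter ('-' or '=') or a run of other
# non-whitespace characters.
TOKEN_RE = re.compile(r'[-=]|[^-=\s]+')


def tokens_with_spans(text, pattern=TOKEN_RE):
    """Return (tokens, spans), where spans[i] is the (start, end)
    character offset of tokens[i] within text."""
    tokens, spans = [], []
    for match in pattern.finditer(text):
        tokens.append(match.group(0))
        spans.append(match.span())  # (start, end), end is exclusive
    return tokens, spans


tokens, spans = tokens_with_spans('dog-s')
print(('dog-s', tokens, spans))
# ('dog-s', ['dog', '-', 's'], [(0, 3), (3, 4), (4, 5)])
```

Because the end offset is exclusive, `text[start:end]` recovers each token, so callers can map aligned tokens back to positions in the original line.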