dariusk/pos-js

Punctuation tags listed are not detected in lexer

markbirbeck opened this issue · 0 comments

When tagging, punctuation such as '(' and '$' should get their own tags. Some of them do, but it depends on whether the punctuation character in question has been added to the punctuation list in the lexer.

If the character hasn't been added, whether it comes through as its own token depends on what the lexer's other rules happen to match around it. For example, neither of these yields a separate token for the '$' or for the parentheses:

"I made $some today..."
"The E.M.T.'s were on time (but only barely)."

However, the dollar sign will show up in this situation:

"I made $5.42 today..."

because the number gets parsed out, leaving the dollar to stand on its own.
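
To make the mechanism concrete, here is a minimal sketch of a lexer that behaves this way. It is a hypothetical toy, not the actual pos-js code: the name `toyLex`, the number pattern, and the punctuation lists are all assumptions chosen for illustration.

```javascript
// Hypothetical toy lexer (NOT the pos-js implementation).
// It splits numbers out first, then splits on an explicit punctuation list.
function toyLex(text, punctuation) {
  // Escape the few characters that are special inside a character class.
  const escaped = punctuation
    .map((ch) => ch.replace(/[\]\\^-]/g, '\\$&'))
    .join('');
  const punctRe = new RegExp('([' + escaped + '])');
  const numberRe = /(\d+(?:\.\d+)?)/; // assumed number pattern
  const tokens = [];

  for (const chunk of text.split(/\s+/)) {
    // split() with a capture group alternates: non-number, number, non-number...
    chunk.split(numberRe).forEach((piece, i) => {
      if (i % 2 === 1) {
        tokens.push(piece); // a number, kept whole
        return;
      }
      // Only characters in the punctuation list become separate tokens.
      for (const part of piece.split(punctRe)) {
        if (part !== '') tokens.push(part);
      }
    });
  }
  return tokens;
}

const listed = ['.', ',']; // '$' deliberately missing from the list

console.log(toyLex('I made $some today...', listed));
// '$some' stays glued together as a single token

console.log(toyLex('I made $5.42 today...', listed));
// '$' ends up alone, but only because '5.42' was pulled out first

console.log(toyLex('I made $some today...', listed.concat('$')));
// with '$' in the list, it is separated in both cases
```

In this sketch the fix is simply to add the missing characters to the punctuation list, so that the result no longer depends on what happens to sit next to them.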

The problem arose for me with parentheses, but on investigation I realised that other punctuation characters are also affected.