
The TextPreProcessor class only segments text inside hashtags. Support for segmenting normal text is needed.


The TextPreProcessor class performs word segmentation only when the token starts with a hashtag symbol (#). Without the symbol, the same run-together text is left unsegmented.


# With hashtag it works
s = " question kind infidelity passed sweety not feel sweet #savingyourmarriagebeforeitstarts"
print(" ".join(text_processor.pre_process_doc(s)))
'question kind infidelity passed sweety not feel sweet <hashtag> saving your marriage before it starts </hashtag>'

# Without hashtag it fails
s = " question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
print(" ".join(text_processor.pre_process_doc(s)))
" question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
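
The requested behaviour can be sketched as a post-processing step over the tokens. This is only a minimal illustration, not the library's implementation: `toy_segment`, `segment_unknown_tokens`, `VOCAB`, and the `min_len` threshold are all hypothetical names invented here, and the tiny vocabulary stands in for real word statistics. In practice the `segment` argument would presumably be a real segmenter (for example, the library's own word segmenter, assuming it exposes a callable that maps a string to a space-separated string).

```python
from functools import lru_cache

# Toy vocabulary, an assumption for illustration only; a real segmenter
# would rely on word statistics from a large corpus instead.
VOCAB = {"saving", "your", "marriage", "before", "it", "starts",
         "question", "kind", "infidelity", "passed", "sweety",
         "not", "feel", "sweet"}


def toy_segment(token, vocab=VOCAB):
    """Split token into the fewest vocabulary words.

    Returns the token unchanged if no full split exists.
    """

    @lru_cache(maxsize=None)
    def best(i):
        # Shortest tuple of vocabulary words covering token[i:], or None.
        if i == len(token):
            return ()
        candidates = []
        for j in range(i + 1, len(token) + 1):
            if token[i:j] in vocab:
                rest = best(j)
                if rest is not None:
                    candidates.append((token[i:j],) + rest)
        return min(candidates, key=len) if candidates else None

    words = best(0)
    return " ".join(words) if words else token


def segment_unknown_tokens(tokens, segment=toy_segment, vocab=VOCAB, min_len=10):
    """Segment long alphabetic tokens that are not known words (hypothetical helper)."""
    out = []
    for tok in tokens:
        if tok.isalpha() and len(tok) >= min_len and tok not in vocab:
            out.extend(segment(tok).split())
        else:
            out.append(tok)
    return out


s = " question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
print(" ".join(segment_unknown_tokens(s.split())))
# question kind infidelity passed sweety not feel sweet saving your marriage before it starts
```

Restricting segmentation to long, out-of-vocabulary tokens is a design choice in this sketch: it avoids splitting ordinary words that happen to contain shorter words.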

The TextPreProcessor class configuration is similar to what is defined in file.

Kindly review this. If you agree that normal text should also be segmented, I can send a pull request.