The TextPreProcessor class only supports segmenting text with hashtags. Support for segmenting normal text is required.

Question

Closed this issue 5 years ago · 0 comments

The TextPreProcessor class only performs word segmentation when the hashtag symbol is present; without it, the text is left unsegmented.

Example:

# With hashtag it works
s = " question kind infidelity passed sweety not feel sweet #savingyourmarriagebeforeitstarts"
print(" ".join(text_processor.pre_process_doc(s)))
'question kind infidelity passed sweety not feel sweet <hashtag> saving your marriage before it starts </hashtag>'

# Without hashtag it fails
s = " question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
print(" ".join(text_processor.pre_process_doc(s)))
" question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"

The TextPreProcessor configuration is similar to the one defined in the README.md file.

Kindly review this, and if you find it correct, I can send a pull request.
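
As a rough illustration of the requested behavior, here is a minimal, self-contained sketch of segmenting plain tokens that carry no hashtag. It is not the library's implementation: `TOY_COUNTS`, `segment_token`, and `segment_plain_text` are hypothetical names, and the word counts are made up for the example. It assumes a unigram vocabulary and uses dynamic programming to pick the most probable split into known words, leaving a token unchanged when no such split exists.

```python
import math

# Toy unigram counts (illustrative assumption only). A real segmenter
# would load word statistics from a large corpus instead.
TOY_COUNTS = {
    "saving": 50, "your": 400, "marriage": 40,
    "before": 200, "it": 900, "starts": 60,
    "question": 120, "kind": 150, "not": 800, "feel": 100, "sweet": 30,
}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in TOY_COUNTS)


def segment_token(token):
    """Split a run-together token into known words, or return it unchanged.

    best[i] holds (score, words) for the most probable split of token[:i]
    into vocabulary words, or None if no such split exists.
    """
    if token in TOY_COUNTS:
        return [token]
    best = [None] * (len(token) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(token) + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = token[start:end]
            if best[start] is None or word not in TOY_COUNTS:
                continue
            score = best[start][0] + math.log(TOY_COUNTS[word] / TOTAL)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    # Fall back to the original token when it cannot be fully split
    # into known words (e.g. "sweety" or "infidelity" here).
    return best[-1][1] if best[-1] is not None else [token]


def segment_plain_text(text):
    """Segment every whitespace-separated token of a plain-text string."""
    words = []
    for token in text.split():
        words.extend(segment_token(token.lower()))
    return " ".join(words)


s = " question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
print(segment_plain_text(s))
# question kind infidelity passed sweety not feel sweet saving your marriage before it starts
```

Requiring every piece to be a known word keeps the sketch conservative: out-of-vocabulary tokens pass through untouched instead of being split into fragments. With a realistic vocabulary, some threshold or scoring for unknown words would likely be needed as well.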