For punctuation and white space - replace with a character rather than empty string
Closed this issue · 4 comments
goneall commented
If you remove these characters entirely, two words could be joined, creating a false positive match. For example, "me rest" would match "merest", which is a different word. Using a special character as a whitespace replacement would solve the issue. Don't forget to replace any occurrence of two consecutive whitespace characters with one as well.
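
A minimal sketch of the difference, in Python. The thread does not show the project's code or name its language, so the function names `strip_separators` and `replace_separators` and the `*` placeholder are assumptions made for illustration only:

```python
import re
import string

# Hypothetical placeholder; the real choice is up to the project.
PLACEHOLDER = "*"

# Matches one whitespace or ASCII punctuation character.
_SEPARATOR = re.compile(r"[\s" + re.escape(string.punctuation) + r"]")


def strip_separators(text: str) -> str:
    """Remove separators entirely (the approach this issue argues against)."""
    return _SEPARATOR.sub("", text)


def replace_separators(text: str) -> str:
    """Replace each separator with the placeholder so word boundaries survive."""
    return _SEPARATOR.sub(PLACEHOLDER, text)


# Stripping joins the two words, so they compare equal to a different word.
print(strip_separators("me rest") == strip_separators("merest"))      # True
# Replacing keeps the boundary, so the false positive disappears.
print(replace_separators("me rest") == replace_separators("merest"))  # False
```

Note that this version replaces each separator one at a time, so a run of several spaces would still produce several placeholders.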
anshuldutt21 commented
@goneall Okay I will change that.
anshuldutt21 commented
@goneall I've made the changes. Please review whenever you are free.
goneall commented
Almost - there is still a problem if there are multiple whitespace characters. For example, "this[5 spaces]is" should normalize to "this*is" rather than "this*****is". You could use a regular expression replacing \s+ with the special character.
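
The \s+ suggestion could look roughly like this in Python. This is only a sketch: the `*` placeholder follows the example above, and the name `normalize_whitespace` is hypothetical, not taken from the project:

```python
import re

# Hypothetical placeholder character, matching the "*" used in the example.
PLACEHOLDER = "*"

# \s+ matches a run of one or more whitespace characters as a single unit.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_whitespace(text: str) -> str:
    """Replace every run of whitespace with exactly one placeholder."""
    return _WHITESPACE_RUN.sub(PLACEHOLDER, text)


print(normalize_whitespace("this     is"))  # this*is
```

Because the pattern consumes the whole run in one match, no separate pass is needed to collapse double spaces.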
anshuldutt21 commented
Oh okay, I didn't see that. I've made the change.