spdx/yalm-python

For punctuation and whitespace - replace with a character rather than an empty string

Closed this issue · 4 comments

If you remove these characters entirely, two words could be joined, creating a false positive match. For example, "me rest" would match "merest", which is a different word. Using a special character as a whitespace replacement would solve the issue. Don't forget to also replace any run of consecutive whitespace characters with a single one.
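A minimal Python sketch of the false positive described above. The function names and the `*` placeholder are hypothetical (the `*` mirrors the example used later in this thread), and "punctuation" is assumed to mean `string.punctuation`. Note that the per-character replacement below still turns a run of several spaces into a run of several separators, which is the remaining problem raised further down.

```python
import string

# Hypothetical placeholder; any character that cannot appear in a word would do.
SEPARATOR = "*"


def normalize_by_removal(text: str) -> str:
    """Naive approach: delete punctuation and whitespace entirely."""
    return "".join(
        ch for ch in text if ch not in string.punctuation and not ch.isspace()
    )


def normalize_by_replacement(text: str) -> str:
    """Replace each punctuation or whitespace character with SEPARATOR."""
    return "".join(
        SEPARATOR if (ch in string.punctuation or ch.isspace()) else ch
        for ch in text
    )


# Removal joins the two words, so "me rest" and "merest" become identical.
print(normalize_by_removal("me rest") == normalize_by_removal("merest"))  # True
# Replacement keeps the word boundary, so they stay distinct.
print(normalize_by_replacement("me rest") == normalize_by_replacement("merest"))  # False
```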

@goneall Okay, I will change that.

@goneall I've made the changes. Please review whenever you are free.

Almost - there is still a problem when there are multiple consecutive whitespace characters. For example, this[5 spaces]is should normalize to this*is rather than this*****is.

You could use a regular expression replacing \s+ with the special character.
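A short sketch of the `\s+` suggestion using Python's standard `re` module. The function name and the `*` separator are illustrative assumptions; the point is that `\s+` matches a whole run of whitespace (spaces, tabs, newlines), so each run collapses to exactly one separator.

```python
import re

# Hypothetical separator, matching the "this*is" example in the comment above.
SEPARATOR = "*"

# One or more consecutive whitespace characters of any kind.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into a single SEPARATOR."""
    return _WHITESPACE_RUN.sub(SEPARATOR, text)


print(normalize_whitespace("this     is"))  # this*is
print(normalize_whitespace("this\t\n is"))  # this*is
```

Because the whole run is replaced in one substitution, there is no need for a separate pass that shrinks double whitespace down to one.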

Oh okay, I didn't see that. I've made the change.