separate_words function based on \W+ re instead?

Question

separate_words function based on \W+ re instead?

Closed this issue 7 years ago · 5 comments

@fabianvf Is it my imagination, or could we replace the entire separate_words function with a split on a \W+ (or \W) regex instead?
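For illustration, a minimal sketch of what a \W+ based version might look like. The name `separate_words_regex` and its behaviour (split, then drop empty strings) are assumptions made for this example, not the project's actual code, which may also lowercase or filter words.

```python
import re

# Hypothetical stand-in for separate_words; name and behaviour are
# assumptions for illustration only.
_NON_WORD = re.compile(r"\W+")


def separate_words_regex(text):
    """Split text on runs of non-word characters, dropping empty strings."""
    # re.split leaves an empty string when the text starts or ends with a
    # separator, so filter those out.
    return [word for word in _NON_WORD.split(text) if word]


print(separate_words_regex("Hello, world! This is a test."))
# ['Hello', 'world', 'This', 'is', 'a', 'test']
```

Using \W+ rather than \W collapses runs of separators (such as ", ") into a single split point, which leaves fewer empty strings to filter.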

Answer 1 · 2017-08-16T08:42:56.000Z

@fabianvf bump

Answer 2 · 2017-08-24T07:28:25.000Z

@fabianvf thoughts on this?

Answer 3 · 2017-08-31T20:18:03.000Z

Honestly, I'm not sure. Does it make a difference? I wonder if there are any edge cases related to punctuation or something similar that will bite us.
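A quick way to probe the punctuation question is to run a few awkward inputs through a plain \W+ split. The helper name `split_words` and the sample strings below are made up for this sketch. Whether each result is acceptable depends on what the existing function is expected to do.

```python
import re


def split_words(text):
    """Hypothetical helper: split on runs of non-word characters."""
    return [word for word in re.split(r"\W+", text) if word]


# Apostrophes and hyphens are non-word characters, so they split tokens.
print(split_words("don't stop"))        # ['don', 't', 'stop']
print(split_words("state-of-the-art"))  # ['state', 'of', 'the', 'art']

# The underscore counts as a word character, so it does not split.
print(split_words("snake_case name"))   # ['snake_case', 'name']

# A decimal point splits a number into two tokens.
print(split_words("3.14 is pi"))        # ['3', '14', 'is', 'pi']
```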

Answer 4 · 2017-08-31T20:23:53.000Z

It'll be important for non-Western languages.
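One way to read this point: in Python 3, \W applied to a str pattern is Unicode-aware by default, so letters outside ASCII count as word characters. A small sketch, assuming Python 3 and made-up sample strings; the `re.ASCII` flag is used only to show what an ASCII-only character class would do with the same text.

```python
import re


def split_unicode(text):
    """Hypothetical helper: Unicode-aware split (Python 3 default for str)."""
    return [word for word in re.split(r"\W+", text) if word]


def split_ascii(text):
    """Hypothetical helper: same split, restricted to [a-zA-Z0-9_]."""
    return [word for word in re.split(r"\W+", text, flags=re.ASCII) if word]


accented = "na\u00efve caf\u00e9"  # "naïve café" with precomposed characters
print(split_unicode(accented))  # ['naïve', 'café']
print(split_ascii(accented))    # ['na', 've', 'caf']

cyrillic = "\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440"  # "привет мир"
print(split_unicode(cyrillic))  # ['привет', 'мир']
print(split_ascii(cyrillic))    # []
```

Note that this only helps for scripts that separate words with spaces or punctuation. Text written without word separators, as in Chinese or Japanese, would come back as one long token and would need a real segmenter.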

Answer 5 · 2017-09-11T22:19:36.000Z

@fabianvf Also, doing it that way is faster, easier to understand, and more maintainable/Pythonic.