Platform-specific split string?
Closed this issue · 4 comments
If I run the example from the readme I see different output:
Is this expected? I noticed you split by \r\n, which I think is Windows-specific. Did you test the package on non-Windows machines?
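To illustrate the concern, here is a minimal base R sketch. The input strings and the helper name split_lines are made up for this example and are not taken from the package. It shows that a literal "\r\n" split leaves Unix-style text unsplit, while a pattern with an optional "\r" handles both conventions.

```r
# Hypothetical input: the same three lines with Unix and Windows line endings.
unix_text    <- "first line\nsecond line\nthird line"
windows_text <- "first line\r\nsecond line\r\nthird line"

# Splitting on a literal "\r\n" only splits the Windows-style input;
# the Unix-style input comes back as a single unsplit string.
strsplit(unix_text, "\r\n", fixed = TRUE)[[1]]
strsplit(windows_text, "\r\n", fixed = TRUE)[[1]]

# A regular expression with an optional "\r" handles both conventions.
# split_lines is an illustrative helper, not part of any package.
split_lines <- function(x) strsplit(x, "\r?\n")[[1]]

split_lines(unix_text)
split_lines(windows_text)
```

With the optional "\r", both inputs produce the same three-element character vector.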
This is not expected, and it is not an issue I was aware of. I do not currently have a non-Windows machine to run local tests like this on.
When I wrote this portion of the code I was not aware of the tokenizers package. It would likely be more robust to use its tokenize_lines function to perform this action.
I will also brainstorm a unit test that would catch this behavior. I'm open to ideas for a good way to test this.
Yes, you should add a unit test for this. You can automatically run checks on Linux and macOS using Travis CI.
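One possible shape for such a test, sketched with testthat. The function split_into_lines is a hypothetical stand-in for whatever internal helper the package uses to split text into lines. The idea is to feed the same content with both line-ending conventions and require identical results, so the check does not depend on the operating system it runs on.

```r
library(testthat)

# Hypothetical stand-in for the package's internal line splitter.
split_into_lines <- function(x) strsplit(x, "\r?\n")[[1]]

test_that("line splitting does not depend on the line-ending convention", {
  expected <- c("alpha", "beta", "gamma")
  expect_equal(split_into_lines("alpha\nbeta\ngamma"), expected)      # Unix
  expect_equal(split_into_lines("alpha\r\nbeta\r\ngamma"), expected)  # Windows
})
```

Because the line endings are written explicitly in the test strings, a test like this could catch the problem even when run only on Windows.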
OK, thanks. BTW, you could reduce your dependency weight by calling stringi::stri_split_lines() and stringi::stri_split_boundaries(x, type = "word") directly rather than via tokenizers.
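A rough sketch of what calling stringi directly might look like. The sample text is made up, and skip_word_none = TRUE is an added assumption here (it drops whitespace and punctuation tokens), not something requested in the thread.

```r
library(stringi)

# Hypothetical sample: the same content with Unix and Windows line endings.
text <- c("one two\nthree four", "one two\r\nthree four")

# stri_split_lines() recognizes several newline conventions and returns
# one character vector of lines per input string.
lines <- stri_split_lines(text)
identical(lines[[1]], lines[[2]])

# Split each line into words at word boundaries.
# skip_word_none = TRUE is an assumption for this sketch.
words <- stri_split_boundaries(lines[[1]], type = "word", skip_word_none = TRUE)
words
```

Both functions return lists, so the result for a single string is reached with [[1]].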