Platform-specific split string?

Question

Closed this issue 6 years ago · 4 comments

If I run the example from the README, I see different output:

Is this expected? I noticed that you split on \r\n, which I think is Windows-specific. Did you test the package on non-Windows machines?
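To make the concern concrete, here is a minimal base R sketch of how a fixed "\r\n" separator can behave differently depending on line endings. The sample strings and the helper names split_crlf and split_portable are made up for illustration; this is not the package's actual code, and it is only one plausible explanation for the differing output.

```r
# Illustrative text with Unix (LF) and Windows (CRLF) line endings.
unix_text    <- "first line\nsecond line"
windows_text <- "first line\r\nsecond line"

# Hypothetical helper: splitting on a fixed "\r\n" only works
# when the text really contains CRLF sequences.
split_crlf <- function(x) strsplit(x, "\r\n", fixed = TRUE)[[1]]

# Hypothetical helper: the pattern "\r?\n" (optional CR, then LF)
# accepts both conventions.
split_portable <- function(x) strsplit(x, "\r?\n")[[1]]

crlf_on_unix        <- split_crlf(unix_text)         # one element, "\n" still inside
portable_on_unix    <- split_portable(unix_text)     # two elements
portable_on_windows <- split_portable(windows_text)  # two elements
```

With LF-only input, the fixed separator never matches, so the whole text comes back as a single string with the newline still embedded in it.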

Answer 1 · 2018-04-09T18:38:17.000Z

This is not expected, and it is not an issue I was aware of. I do not currently have a non-Windows machine to run local tests like this on.

When I wrote this portion of the code, I was not aware of the tokenizers package. It would likely be more robust to use their tokenize_lines function to split the text into lines.

I will also brainstorm a unit test that would catch this behavior. I'm open to ideas for a good way to test this.

Answer 2 · 2018-04-09T18:51:14.000Z

Yes, you should add a unit test for this. You can run checks automatically on Linux and OS X using Travis CI.

Answer 3 · 2018-04-09T19:51:36.000Z

e8d8e90 should fix the issue.

I added a unit test that checks for literal "\n" characters in the result text: test here. I'm open to other ways to test this behavior.
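One possible shape for such a check, sketched in base R: put both line-ending conventions directly into the input strings so the test does not depend on the operating system it runs on. The split_lines function below is a hypothetical stand-in for the package's splitter, not the code from the commit, and in a package test suite the assertions would normally sit inside the test framework already in use.

```r
# Hypothetical stand-in for the package's line splitter (an assumption,
# not the actual implementation).
split_lines <- function(x) strsplit(x, "\r?\n")[[1]]

# The same content under both line-ending conventions.
inputs <- c(lf = "alpha\nbeta\ngamma", crlf = "alpha\r\nbeta\r\ngamma")

results <- lapply(inputs, split_lines)

# No element of any result should still contain a line-break character.
has_linebreak <- vapply(
  results,
  function(lines) any(grepl("[\r\n]", lines)),
  logical(1)
)

# Both conventions should also yield identical lines.
same_output <- identical(results[["lf"]], results[["crlf"]])

stopifnot(!any(has_linebreak), same_output)
```

Because the CRLF and LF cases are both hard-coded, this kind of test would fail on a Windows-only machine as well if the splitter handled just one convention.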

Answer 4 · 2018-04-10T17:16:25.000Z

OK, thanks. By the way, you could reduce your dependency weight by calling stringi::stri_split_lines() and stringi::stri_split_boundaries(x, type = "word") directly rather than via tokenizers.
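A rough sketch of what calling stringi directly could look like, assuming the stringi package is installed. The sample text is invented, and skip_word_none = TRUE is an extra option not mentioned above; it is added here so that whitespace and punctuation are dropped from the word tokens.

```r
library(stringi)

# Made-up sample text mixing CRLF and LF line endings.
text <- "The quick brown fox\r\njumps over\nthe lazy dog"

# stri_split_lines() returns a list with one character vector per input string.
lines <- stri_split_lines(text)[[1]]

# Split each line at word boundaries; skip_word_none = TRUE drops
# the non-word pieces (spaces, punctuation) between words.
words <- stri_split_boundaries(lines, type = "word", skip_word_none = TRUE)
```

Here lines holds one string per line and words is a list with one character vector of words per line.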