lebebr01/pdfsearch

Platform specific split string?


If I run the example from the readme I see different output:

[Screenshot: 2018-04-09, 7:45:11 pm]

Is this expected? I noticed that you split on \r\n, which I think is Windows-specific. Did you test the package on non-Windows machines?
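A minimal base R sketch of the suspected cause. The sample string is made up for illustration and this is not the package's actual code; it only shows that a fixed `\r\n` split leaves Unix-style text in one piece, while a pattern with an optional carriage return handles both conventions.

```r
# Hypothetical text with Unix line endings, as produced on Linux or macOS.
unix_text <- "first line\nsecond line\nthird line"

# Splitting on the Windows sequence finds no match, so the text stays whole.
windows_split <- strsplit(unix_text, "\r\n", fixed = TRUE)[[1]]
length(windows_split)   # 1

# Making the carriage return optional matches both "\r\n" and "\n".
portable_split <- strsplit(unix_text, "\r?\n")[[1]]
length(portable_split)  # 3
```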

This is not expected, and it is not an issue I was aware of. I do not currently have a non-Windows machine to run local tests like this on.

When I wrote this portion of the code I was not aware of the tokenizers package. It would likely be more robust to use their tokenize_lines function to perform this action.

I will also brainstorm a unit test that would catch this behavior. I'm open to ideas for a good way to test this.

Yes, you should add a unit test for this. You can automatically run checks on Linux and OS X using Travis.

e8d8e90 should fix the issue.

I added a unit test that checks for literal "\n" characters in the result text: test here. I'm open to other ways to test this behavior.
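One possible shape for such a test, sketched with testthat. `split_lines()` is a hypothetical stand-in for the package's line-splitting step, not a function from pdfsearch, and the inputs are invented. Feeding both line-ending styles to the same check lets the test catch the problem on any platform.

```r
library(testthat)

# Hypothetical helper standing in for the package's line-splitting step.
split_lines <- function(x) {
  strsplit(x, "\r?\n")[[1]]
}

test_that("no line-break characters survive splitting", {
  inputs <- c(
    unix = "alpha\nbeta\ngamma",
    windows = "alpha\r\nbeta\r\ngamma"
  )
  for (input in inputs) {
    lines <- split_lines(input)
    # Each input should yield three lines, regardless of line-ending style.
    expect_length(lines, 3)
    # No carriage return or newline should remain inside any line.
    expect_false(any(grepl("[\r\n]", lines)))
  }
})
```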

OK, thanks. BTW, you could reduce your dependency weight by calling stringi::stri_split_lines() and stringi::stri_split_boundaries(x, type = "word") directly rather than via tokenizers.
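A rough sketch of what the direct stringi calls could look like. The sample strings are invented, and `skip_word_none = TRUE` is an extra option added here (not part of the suggestion above) so that whitespace and punctuation are not returned as tokens.

```r
library(stringi)

# Hypothetical text mixing Windows and Unix line endings.
mixed_text <- "first line\r\nsecond line\nthird line"

# stri_split_lines() returns a list with one character vector per input string.
lines <- stri_split_lines(mixed_text)[[1]]

# Word boundaries; skip_word_none drops whitespace and punctuation tokens.
words <- stri_split_boundaries(
  "Split this sentence.",
  type = "word",
  skip_word_none = TRUE
)[[1]]
```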