Encoding issues in sys.source()
Closed this issue · 16 comments
I get an error when I run testthat on Windows.
* checking for unstated dependencies in examples ... OK
* checking examples ... NONE
* checking for unstated dependencies in tests ... OK
* checking tests ...
Running 'test-all.R'
ERROR
Running the tests in 'tests/test-all.R' failed.
Last 13 lines of output:
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
library(testthat)
library(KoNLP)
Loading required package: rJava
Loading required package: bitops
test_package("KoNLP")
Error in parse(n = -1, file = file) :
invalid multibyte character in parser at line 5
Calls: test_package ... with_reporter -> force -> lapply -> FUN -> sys.source
-> parse
Execution halted
My test case sources are UTF-8 files that check Korean NLP functions. They run fine on Linux (UTF-8) and Mac, but not on Windows.
I think the calls to sys.source in testthat need to be sys.source(file(",,,,,", encoding=""), env).
Feel free to check with my package at "https://github.com/haven-jeon/KoNLP".
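As a rough illustration of the suggestion above, here is a minimal sketch of sourcing a file with a declared encoding into a chosen environment. The helper name `source_utf8` is hypothetical and is not part of testthat; it reads through a re-encoding connection, then parses and evaluates the text, so it assumes the session's native encoding (for example UTF-8 or CP949) can represent the characters in the file.

```r
# Hypothetical helper (not a testthat function): read `path` through a
# connection that converts from `encoding` to the native encoding, then
# parse and evaluate the code in `envir`.
source_utf8 <- function(path, envir, encoding = "UTF-8") {
  con <- file(path, open = "r", encoding = encoding)
  on.exit(close(con), add = TRUE)
  lines <- readLines(con, warn = FALSE)
  exprs <- parse(text = lines, keep.source = FALSE)
  eval(exprs, envir)   # evaluates each expression in turn
  invisible(envir)
}

# Demo, guarded so the sketch also runs in a plain C locale, where Korean
# text cannot be represented in the native encoding.
if (isTRUE(l10n_info()[["UTF-8"]])) {
  tmp <- tempfile(fileext = ".R")
  # Write a one-line script containing Korean text as UTF-8 bytes.
  writeLines(enc2utf8('greeting <- "\uD55C\uAE00"'), tmp, useBytes = TRUE)
  env <- new.env()
  source_utf8(tmp, env)
  print(env$greeting)
}
```

Whether passing a connection straight to sys.source works depends on the R version, which is why this sketch parses the text itself instead.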
Hmmm, I wonder if the encoding should be extracted from DESCRIPTION in the case of test_package?
Good idea... but many packages don't declare their file encoding in DESCRIPTION. In those cases, R falls back to the locale to guess the package encoding.
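The lookup being discussed could look something like the sketch below. The helper name `package_encoding` is hypothetical; it reads the Encoding field with `read.dcf()` and falls back to `"native.enc"` (the value `file()` and `source()` accept for the session's native encoding) when the field is missing.

```r
# Hypothetical helper: return the Encoding declared in a package's
# DESCRIPTION file, or "native.enc" when the field is absent.
package_encoding <- function(pkg_dir) {
  desc <- file.path(pkg_dir, "DESCRIPTION")
  enc <- read.dcf(desc, fields = "Encoding")[1, 1]
  if (is.na(enc)) "native.enc" else unname(enc)
}

# Demo with two throwaway package directories.
with_enc <- tempfile("pkg")
dir.create(with_enc)
writeLines(c("Package: demo", "Version: 0.1", "Encoding: UTF-8"),
           file.path(with_enc, "DESCRIPTION"))

without_enc <- tempfile("pkg")
dir.create(without_enc)
writeLines(c("Package: demo", "Version: 0.1"),
           file.path(without_enc, "DESCRIPTION"))

package_encoding(with_enc)     # "UTF-8"
package_encoding(without_enc)  # "native.enc"
```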
Don't you have to declare the correct encoding in order to pass R CMD check?
If I don't use non-ASCII characters in the package sources, R CMD check passes. But in my case I need to specify the encoding, because I use non-ASCII characters such as Korean in the sources.
I think the Encoding field in DESCRIPTION is not mandatory.
How about using source() instead of sys.source()?
I need to use sys.source because source doesn't give enough control over the environment in which the code is executed.
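For readers unfamiliar with the distinction, the following sketch shows the kind of environment control sys.source offers: the file's definitions land in an environment chosen by the caller. The script contents and variable names are made up for illustration. Note that sys.source has no `encoding` argument, which is what makes the encoding problem in this thread awkward.

```r
# Write a small throwaway script.
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "f <- function() x + 1"), script)

# Evaluate it in a dedicated environment instead of the global one.
test_env <- new.env(parent = globalenv())
sys.source(script, envir = test_env)

ls(test_env)   # the objects defined by the script live in test_env
test_env$f()   # f() finds x in test_env, where it was defined
```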
Are you still having this problem? I just tried to re-create it with your package and couldn't reproduce your error.
I still have problems with encodings.
haven@HAVEN-PC-HAVEN /d
$ R CMD check KoNLP_0.0-9.2.tar.gz
* using log directory 'd://KoNLP.Rcheck'
* using R version 2.14.1 (2011-12-22)
* using platform: i386-pc-mingw32 (32-bit)
* using session charset: CP949
* checking for file 'KoNLP/DESCRIPTION' ... OK
* this is package 'KoNLP' version '0.0-9.2'
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking whether package 'KoNLP' can be installed ... OK
* checking installed package size ... NOTE
installed size is 6.8Mb
sub-directories of 1Mb or more:
dics 4.7Mb
java 1.8Mb
* checking package directory ... OK
* checking for portable file names ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking for unstated dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking R/sysdata.rda ... OK
* checking examples ... NONE
* checking for unstated dependencies in tests ... OK
* checking tests ...
Running 'test-all.R'
ERROR
Running the tests in 'tests/test-all.R' failed.
Last 13 lines of output:
Type 'q()' to quit R.
library(testthat)
library(KoNLP)
Loading required package: rJava
Loading required package: bitops
Checking user defined dictionary!
test_package("KoNLP")
Error in parse(n = -1, file = file) :
invalid multibyte character in parser at line 5
Calls: test_package ... with_reporter -> force -> lapply -> FUN -> sys.source
-> parse
Execution halted
testthat still cannot recognize the KoNLP package encoding.
Weird - R CMD check worked for me with the latest version of KoNLP. But it may be because I ran it through devtools::check(), which tries to better emulate what CRAN does. Could you try that too?
Actually, I removed the test code before uploading to CRAN. You need to test with the GitHub version of KoNLP to run the unit tests.
I did use the github version.
You probably ran it in a "UTF-8" locale. In my case it also runs fine on Ubuntu and Mac OS (both "UTF-8").
The remaining OS is the Korean version of Windows 7, where R treats every string and file as "CP949" by default unless I set the encoding explicitly.
localeToCharset(locale = Sys.getlocale("LC_CTYPE"))
[1] "CP949"
The conflict arises because the KoNLP test source files are encoded in "UTF-8".
I also set "Encoding: UTF-8" in DESCRIPTION, but testthat does not recognize it.
If there were a way to set the test source file encoding in the testthat functions, I think this situation would be resolved.
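The mismatch described here can be detected with a few lines of base R. This is only a sketch: the helper name `needs_reencoding` is hypothetical, and the declared encoding is passed in by hand rather than read from DESCRIPTION.

```r
# Hypothetical helper: TRUE when the declared file encoding differs from
# the session charset, i.e. when files must be re-encoded before parsing.
needs_reencoding <- function(declared, session = localeToCharset()[1]) {
  !identical(toupper(declared), toupper(session))
}

needs_reencoding("UTF-8", session = "CP949")  # TRUE: the Windows case above
needs_reencoding("UTF-8", session = "utf-8")  # FALSE: Linux / Mac case

# Why the parser stumbles: one Korean syllable is three bytes in UTF-8,
# and that byte sequence is not valid text in a CP949 session.
charToRaw(enc2utf8("\uD55C"))                 # ed 95 9c
```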
I run R CMD check with the C locale, as does CRAN. I think you are better off doing that, rather than trying to mess around with non-standard encodings.
I think this can happen with any CJK (multibyte) characters. So test cases cannot be included when releasing CJK text mining packages.
Can I fork the repository to investigate or patch this? I can help with this package.
Sure, you'd be welcome to. Another useful contribution would be a single-file test case that I could easily incorporate into testthat.