r-lib/testthat

Encoding issues in sys.source()

Closed this issue · 16 comments

I get an error when I run testthat on Windows.

  • checking for unstated dependencies in examples ... OK

  • checking examples ... NONE

  • checking for unstated dependencies in tests ... OK

  • checking tests ...
    Running 'test-all.R'
    ERROR
    Running the tests in 'tests/test-all.R' failed.
    Last 13 lines of output:
    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    library(testthat)
    library(KoNLP)
    Loading required package: rJava
    Loading required package: bitops
    test_package("KoNLP")
    Error in parse(n = -1, file = file) :
    invalid multibyte character in parser at line 5
    Calls: test_package ... with_reporter -> force -> lapply -> FUN -> sys.source
    -> parse
    Execution halted

My test files are UTF-8 encoded so that they can check Korean NLP functions. They run fine on Linux (UTF-8) and Mac, but not on Windows.

I think the calls to sys.source in testthat need to become sys.source(file(",,,,,", encoding=""), env).

Feel free to check with my package at "https://github.com/haven-jeon/KoNLP".
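The proposal above could be sketched roughly as follows. This is only an illustration, not testthat's code: `source_with_encoding` is a hypothetical name, and the sketch assumes that opening the file with `file(path, encoding = ...)` and passing the connection to `parse()` is enough for the parser to receive correctly re-encoded text.

```r
# Hypothetical helper (not part of testthat): parse a file using a declared
# encoding, then evaluate each expression in the chosen environment.
source_with_encoding <- function(path, envir, encoding = "UTF-8") {
  # The connection re-encodes the input from `encoding` to the native encoding.
  con <- file(path, encoding = encoding)
  on.exit(close(con))
  exprs <- parse(con, keep.source = FALSE)
  for (e in exprs) eval(e, envir)
  invisible(envir)
}
```

Whether this resolves the CP949 case would still need to be checked on a Windows machine with that locale.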

Hmmm, I wonder if the encoding should be extracted from DESCRIPTION in the case of test_package?

Good idea, but many packages don't declare their file encoding in DESCRIPTION. In that case, R derives the package encoding from the locale.
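As a rough sketch of the idea discussed here, the Encoding field could be read with `read.dcf()`, falling back to the locale's charset when the field is missing. `description_encoding` is a hypothetical helper name, and the fallback rule is an assumption based on the comment above, not a description of what R or testthat does internally.

```r
# Hypothetical helper: return the Encoding field of a package's DESCRIPTION,
# or a fallback (by default the locale's charset) when the field is absent.
description_encoding <- function(pkg_dir, default = localeToCharset()[1]) {
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Encoding")
  enc <- desc[1, "Encoding"]  # NA when DESCRIPTION has no Encoding field
  if (is.na(enc)) default else unname(enc)
}
```

For a package directory whose DESCRIPTION contains `Encoding: UTF-8`, the helper returns that value; otherwise it returns `default`.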

Don't you have to declare the correct encoding in order to pass R CMD check?

If I don't use non-ASCII characters in the package sources, R CMD check passes. But in my case I need to specify the encoding, because my sources contain non-ASCII characters such as Korean.

I think the Encoding field in DESCRIPTION is not mandatory.

How about using source() instead of sys.source()?

I need to use sys.source because source doesn't give enough control over the environment in which the code is executed.
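For readers unfamiliar with the function, a minimal illustration of evaluating a file in a dedicated environment with `sys.source()` might look like this. `run_in_fresh_env` is a hypothetical name; how testthat actually sets up its test environments is not shown in this thread.

```r
# Hypothetical helper: evaluate an R file in a new environment so that the
# objects it defines do not leak into the caller's workspace.
run_in_fresh_env <- function(path, parent = globalenv()) {
  env <- new.env(parent = parent)
  sys.source(path, envir = env)
  env
}
```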

Are you still having this problem? I just tried to re-create it with your package and couldn't reproduce your error.

I still have problems with encodings.

haven@HAVEN-PC-HAVEN /d
$ R CMD check KoNLP_0.0-9.2.tar.gz

  • using log directory 'd://KoNLP.Rcheck'

  • using R version 2.14.1 (2011-12-22)

  • using platform: i386-pc-mingw32 (32-bit)

  • using session charset: CP949

  • checking for file 'KoNLP/DESCRIPTION' ... OK

  • this is package 'KoNLP' version '0.0-9.2'

  • checking package namespace information ... OK

  • checking package dependencies ... OK

  • checking if this is a source package ... OK

  • checking if there is a namespace ... OK

  • checking for executable files ... OK

  • checking whether package 'KoNLP' can be installed ... OK

  • checking installed package size ... NOTE
    installed size is 6.8Mb
    sub-directories of 1Mb or more:
    dics 4.7Mb
    java 1.8Mb

  • checking package directory ... OK

  • checking for portable file names ... OK

  • checking DESCRIPTION meta-information ... OK

  • checking top-level files ... OK

  • checking index information ... OK

  • checking package subdirectories ... OK

  • checking R files for non-ASCII characters ... OK

  • checking R files for syntax errors ... OK

  • checking whether the package can be loaded ... OK

  • checking whether the package can be loaded with stated dependencies ... OK

  • checking whether the package can be unloaded cleanly ... OK

  • checking whether the namespace can be loaded with stated dependencies ... OK

  • checking whether the namespace can be unloaded cleanly ... OK

  • checking for unstated dependencies in R code ... OK

  • checking S3 generic/method consistency ... OK

  • checking replacement functions ... OK

  • checking foreign function calls ... OK

  • checking R code for possible problems ... OK

  • checking Rd files ... OK

  • checking Rd metadata ... OK

  • checking Rd cross-references ... OK

  • checking for missing documentation entries ... OK

  • checking for code/documentation mismatches ... OK

  • checking Rd \usage sections ... OK

  • checking Rd contents ... OK

  • checking for unstated dependencies in examples ... OK

  • checking R/sysdata.rda ... OK

  • checking examples ... NONE

  • checking for unstated dependencies in tests ... OK

  • checking tests ...
    Running 'test-all.R'
    ERROR
    Running the tests in 'tests/test-all.R' failed.
    Last 13 lines of output:
    Type 'q()' to quit R.

    library(testthat)
    library(KoNLP)
    Loading required package: rJava
    Loading required package: bitops
    Checking user defined dictionary!

    test_package("KoNLP")
    Error in parse(n = -1, file = file) :
    invalid multibyte character in parser at line 5
    Calls: test_package ... with_reporter -> force -> lapply -> FUN -> sys.source
    -> parse
    Execution halted

testthat still cannot recognize the KoNLP package encoding.

Weird - R CMD check worked for me with the latest version of KoNLP. But it may be because I ran it through devtools::check() which tries to better emulate what CRAN does. Could you try that too?

Actually, I removed the test code before uploading to CRAN. You need to test with the GitHub version of KoNLP to run the unit tests.

I did use the github version.

You probably ran it in a "UTF-8" locale. In my case, it also runs fine on Ubuntu and Mac OS ("UTF-8").

The remaining OS is the Korean version of Windows 7, where R treats the default text encoding as "CP949" for every string and file unless I set the encoding explicitly.

localeToCharset(locale = Sys.getlocale("LC_CTYPE"))
[1] "CP949"

The conflict arises because the KoNLP test source files are encoded in "UTF-8".
I also set "Encoding: UTF-8" in DESCRIPTION, but test_that does not recognize it.

If there were a way to set the test source file encoding in the test_that functions, I think this would be resolved.

I run R CMD check with the C locale, as does CRAN. I think you are better off doing that, rather than trying to mess around with non-standard encodings.

I think this could happen with any CJK (multibyte) characters. As it stands, test cases cannot be included when releasing CJK text-mining packages.

Can I fork the repository to investigate or write a patch? I can help with this package.

Sure, you'd be welcome to. Another useful contribution would be a single-file test case that I could easily incorporate into testthat.
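One possible starting point for such a single-file case is a small generator that writes a UTF-8 encoded R file containing a Korean string literal. This is a sketch under assumptions: `write_utf8_case` is a hypothetical name, and the generating script uses `\u` escapes so that it stays pure ASCII regardless of the locale it runs in.

```r
# Hypothetical generator: write an R source file whose bytes are UTF-8 and
# which contains a Korean string literal ("\uD55C\uAE00").
write_utf8_case <- function(path = tempfile(fileext = ".R")) {
  code <- enc2utf8('x <- "\uD55C\uAE00"')
  con <- file(path, open = "wb")  # binary mode: no newline translation
  on.exit(close(con))
  writeLines(code, con, useBytes = TRUE)  # write the UTF-8 bytes unchanged
  path
}
```

Parsing the generated file in a CP949 session without declaring its encoding would be the situation described in this thread; that behaviour is not verified here.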