Performing CRAN CHECK
I've downloaded the source code of the master branch and ran CRAN CHECK, and it looks like there is a serious warning with the tests. Is this only an issue on my local machine (e.g., misconfigured Java), or has this package never passed a CRAN CHECK before? Here are the results of a CRAN CHECK on my local machine:
==> devtools::check(document = FALSE)
Setting env vars ---------------------------------------------------------------
CFLAGS : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building dplyr.spark.hive ------------------------------------------------------
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD \
build '/home/mkosinski/dplyr.spark.hive/pkg' --no-resave-data --no-manual
* checking for file ‘/home/mkosinski/dplyr.spark.hive/pkg/DESCRIPTION’ ... OK
* preparing ‘dplyr.spark.hive’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building ‘dplyr.spark.hive_0.5.0.tar.gz’
Setting env vars ---------------------------------------------------------------
_R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
_R_CHECK_CRAN_INCOMING_ : FALSE
_R_CHECK_FORCE_SUGGESTS_ : FALSE
Checking dplyr.spark.hive ------------------------------------------------------
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD \
check '/tmp/RtmpQr8pYx/dplyr.spark.hive_0.5.0.tar.gz' --as-cran --timings
* using log directory ‘/home/mkosinski/dplyr.spark.hive/dplyr.spark.hive.Rcheck’
* using R version 3.2.2 (2015-08-14)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* using option ‘--as-cran’
* checking for file ‘dplyr.spark.hive/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘dplyr.spark.hive’ version ‘0.5.0’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘dplyr.spark.hive’ can be installed ... WARNING
Found the following significant warnings:
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
Warning: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
See ‘/home/mkosinski/dplyr.spark.hive/dplyr.spark.hive.Rcheck/00install.out’ for details.
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
Unexported objects imported by ':::' calls:
‘dplyr:::auto_copy’ ‘dplyr:::build_query’ ‘dplyr:::collect.tbl_sql’
‘dplyr:::common_by’ ‘dplyr:::copy_to.src_sql’
‘dplyr:::db_save_query.DBIConnection’ ‘dplyr:::over’
‘dplyr:::partition_group’ ‘dplyr:::sql_vector’
‘dplyr:::update.tbl_sql’ ‘dplyr:::uses_window_fun’
See the note in ?`:::` about the use of this operator.
package 'methods' is used but not declared
* checking S3 generic/method consistency ... WARNING
Warning: declared S3 method 'intersect.tbl_HS2' not found
Warning: declared S3 method 'union.tbl_HS2' not found
See section ‘Generic functions and methods’ in the ‘Writing R
Extensions’ manual.
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... WARNING
Objects in \usage without \alias in documentation object 'load_to':
‘load_to.src_Hive’ ‘load_to.src_SparkSQL’
Objects in \usage without \alias in documentation object 'tbls':
‘tbls.src_sql’
Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
Running ‘databases.R’ ERROR
Running the tests in ‘tests/databases.R’ failed.
Last 13 lines of output:
Warning: changing locked binding for 'over' in 'dplyr' whilst loading 'dplyr.spark.hive'
Warning: changing locked binding for 'partial_eval' in 'dplyr' whilst loading 'dplyr.spark.hive'
Warning: changing locked binding for 'default_op' in 'dplyr' whilst loading 'dplyr.spark.hive'
Warning messages:
1: replacing previous import by 'purrr::order_by' when loading 'dplyr.spark.hive'
2: replacing previous import by 'purrr::%>%' when loading 'dplyr.spark.hive'
>
> copy_to_from_local = dplyr.spark.hive:::copy_to_from_local
>
> my_db = src_SparkSQL()
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Calls: src_SparkSQL -> src_HS2 -> JDBC -> is.jnull -> .jfindClass
Execution halted
* checking PDF version of manual ...
OK
* DONE
Status: 1 ERROR, 3 WARNINGs, 1 NOTE
See
‘/home/mkosinski/dplyr.spark.hive/dplyr.spark.hive.Rcheck/00check.log’
for details.
Error: Command failed (1)
Execution halted
Exited with status 1.
Session Info (package versions)
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=pl_PL.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8
[5] LC_MONETARY=pl_PL.UTF-8 LC_MESSAGES=pl_PL.UTF-8
[7] LC_PAPER=pl_PL.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] quickcheck_3.5.1 nycflights13_0.1 ggplot2_2.0.0
[4] Lahman_4.0-1 lazyeval_0.1.10 purrr_0.2.0
[7] dplyr_0.4.3 RJDBC_0.2-5 rJava_0.9-6
[10] DBI_0.3.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 magrittr_1.5 MASS_7.3-45
[4] munsell_0.4.2 colorspace_1.2-6 R6_2.1.1
[7] stringr_1.0.0 plyr_1.8.3 tools_3.2.2
[10] parallel_3.2.2 functional_0.6 grid_3.2.2
[13] gtable_0.1.2 htmltools_0.3 yaml_2.1.13
[16] assertthat_0.1 digest_0.6.8 crayon_1.3.1
[19] pryr_0.1.2 codetools_0.2-14 bitops_1.0-6
[22] testthat_0.11.0 memoise_0.2.1 rmarkdown_0.9
[25] stringi_1.0-1 scales_0.3.0
I don't use devtools, but most likely you have HADOOP_JAR unset. I run R CMD check and it passes.
The warnings are unfortunately a consequence of monkey patching; the alternatives are a fork of dplyr, or Hadley acting on the issues I submit, both unlikely events. It was not easy to make this backend work, and I had to employ some "advanced techniques", a.k.a. hacks.
The missing-documentation warnings reflect a choice on my part not to expose those methods to the end user. I think that's a bug in R CMD check, but it may be fixed in my next life, so I just live with the warnings.
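For reference, a minimal sketch of the setup I mean; the jar path below is only an example, point it at whatever your Hive install actually provides:
# Set HADOOP_JAR before loading the package so the JDBC driver jar
# can be found; this path is an example, not a requirement.
Sys.setenv(HADOOP_JAR = "/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar")
library(dplyr.spark.hive)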
What if I need to pass more than one .jar to HADOOP_JAR? I think I need to specify such .jars in the classPath argument of the JDBC function on this line, https://github.com/piccolbo/dplyr.spark.hive/blob/master/pkg/R/src-HS2.R#L38 :
JDBC(driverclass,
     classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
                   # "/opt/hive/lib/commons-configuration-1.6.jar",
                   "/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
                   "/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar"))
OK, then HADOOP_JAR should be assigned as
Sys.setenv(HADOOP_JAR = paste0(classPath, collapse = .Platform$path.sep))
since JDBC splits classPath like this:
classPath <- path.expand(unlist(strsplit(classPath, .Platform$path.sep)))
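To make the round trip concrete, here is a small sketch; the jar paths are hypothetical placeholders. Joining on .Platform$path.sep and splitting on it recovers the original vector, so each jar comes back out as a separate classPath entry:
# Hypothetical jar paths, for illustration only
classPath <- c("/opt/hive/lib/hive-jdbc-standalone.jar",
               "/usr/share/hadoop/common/hadoop-common.jar")
joined <- paste0(classPath, collapse = .Platform$path.sep)
Sys.setenv(HADOOP_JAR = joined)
# what JDBC() does internally with the joined string:
path.expand(unlist(strsplit(joined, .Platform$path.sep)))
#> [1] "/opt/hive/lib/hive-jdbc-standalone.jar"
#> [2] "/usr/share/hadoop/common/hadoop-common.jar"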
Is this a theory or did the package clear the checks with this setting?
R CMD CHECK can't be performed in my case, since I cannot pass user authentication in the current implementation of src_HS2, as described in #18.
R CMD check runs against a standalone Spark instance and doesn't require any authorization. From my review of your changes, the authentication params are optional, so I don't understand your explanation. If you are correct, we may need to change things a bit. By the way, dev failed on unrelated issues; better merge from master first. I will let you know shortly.
It took me a while to figure out what should be specified in HADOOP_JAR and how to pass host and port to create src_SparkSQL():
classPath <- c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
               "/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
               "/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar",
               "/opt/spark-1.5.2-bin-hadoop2.4/lib/spark-assembly-1.5.2-hadoop2.4.0.jar",
               "/usr/share/java/slf4j/log4j-over-slf4j.jar",
               "/opt/wpusers/r-wpitula/hadoop-conf/log4j.properties")
Sys.setenv(HADOOP_JAR = paste0(classPath, collapse = .Platform$path.sep))
For src_Hive() I use only the first 3 elements of the classPath vector. Then, after setting these environment variables:
Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
Sys.setenv(HIVE_SERVER2_THRIFT_PORT = "10000/loghost;auth=noSasl")
Sys.setenv(SPARK_HOME = "/opt/spark-1.5.2-bin-hadoop2.4/")
I've managed to create src_SparkSQL() and perform a simple select statement (and the same with src_Hive()).
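A minimal sketch of that, assuming the environment variables above are set; the flights table comes from the package's own test script, so use any table that actually exists in your metastore:
library(dplyr.spark.hive)
my_db <- src_SparkSQL()            # host and port appear to be read from the env vars above
flights <- tbl(my_db, "flights")   # assumes this table already exists
flights %>% select(year, month, day) %>% head()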
However, I am encountering an issue with the tests in the tests directory, for which CRAN CHECK throws an error on the first example:
==> devtools::check(document = FALSE)
Setting env vars ---------------------------------------------------------------
CFLAGS : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building dplyr.spark.hive ------------------------------------------------------
'/usr/lib64/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD \
build '/var/wpusers/mkosinski/dplyr.spark.hive/pkg' --no-resave-data \
--no-manual
* checking for file ‘/var/wpusers/mkosinski/dplyr.spark.hive/pkg/DESCRIPTION’ ... OK
* preparing ‘dplyr.spark.hive’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building ‘dplyr.spark.hive_0.5.0.tar.gz’
Setting env vars ---------------------------------------------------------------
_R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
_R_CHECK_CRAN_INCOMING_ : FALSE
_R_CHECK_FORCE_SUGGESTS_ : FALSE
Checking dplyr.spark.hive ------------------------------------------------------
'/usr/lib64/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD \
check '/tmp/RtmprIc8WK/dplyr.spark.hive_0.5.0.tar.gz' --as-cran --timings
* using log directory ‘/var/wpusers/mkosinski/dplyr.spark.hive/dplyr.spark.hive.Rcheck’
* using R version 3.1.3 (2015-03-09)
* using platform: x86_64-redhat-linux-gnu (64-bit)
* using session charset: UTF-8
* using option ‘--as-cran’
* checking for file ‘dplyr.spark.hive/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘dplyr.spark.hive’ version ‘0.5.0’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘dplyr.spark.hive’ can be installed ... WARNING
Found the following significant warnings:
Warning: class "JDBCConnection" is defined (with package slot ‘RJDBC’) but no metadata object found to revise subclass information---not exported? Making a copy in package ‘dplyr.spark.hive’
Warning: class "DBIConnection" is defined (with package slot ‘DBI’) but no metadata object found to revise subclass information---not exported? Making a copy in package ‘dplyr.spark.hive’
Warning: class "DBIObject" is defined (with package slot ‘DBI’) but no metadata object found to revise subclass information---not exported? Making a copy in package ‘dplyr.spark.hive’
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
Warning: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
See ‘/var/wpusers/mkosinski/dplyr.spark.hive/dplyr.spark.hive.Rcheck/00install.out’ for details.
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
Unexported objects imported by ':::' calls:
‘dplyr:::auto_copy’ ‘dplyr:::build_query’ ‘dplyr:::collect.tbl_sql’
‘dplyr:::common_by’ ‘dplyr:::copy_to.src_sql’
‘dplyr:::db_save_query.DBIConnection’ ‘dplyr:::over’
‘dplyr:::partition_group’ ‘dplyr:::sql_vector’
‘dplyr:::update.tbl_sql’ ‘dplyr:::uses_window_fun’
See the note in ?`:::` about the use of this operator.
package 'methods' is used but not declared
* checking S3 generic/method consistency ... WARNING
Warning: declared S3 method 'intersect.tbl_HS2' not found
Warning: declared S3 method 'union.tbl_HS2' not found
See section ‘Generic functions and methods’ in the ‘Writing R
Extensions’ manual.
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... WARNING
Objects in \usage without \alias in documentation object 'load_to':
‘load_to.src_Hive’ ‘load_to.src_SparkSQL’
Objects in \usage without \alias in documentation object 'tbls':
‘tbls.src_sql’
Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... OK
* checking for unstated dependencies in tests ... OK
* checking tests ...
Running ‘databases.R’ ERROR
Running the tests in ‘tests/databases.R’ failed.
Last 13 lines of output:
+ tbl(my_db, "flights")
+ else{
+ copy_to_from_local(my_db, flights, "flights")}}
> flights
Source: Spark at:tools-1.hadoop.srv:10000/loghost;auth=noSasl
From: flights [0 x 16]
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for CREATE TABLE `ebkxyszehg` AS SELECT `year`, `month`, `day`, `dep_time`, `dep_delay`, `arr_time`, `arr_delay`, `carrier`, `tailnum`, `flight`, `origin`, `dest`, `air_time`, `distance`, `hour`, `minute`
FROM `flights`
LIMIT 0 (The query did not generate a result set!)
Calls: print ... dbSendQuery -> dbSendQuery -> .local -> .verify.JDBC.result
Execution halted
[3s/45s]
Error: Command failed (1)
Execution halted
Exited with status 1.
But this might be caused by the fact that I performed the CRAN CHECK on the dev branch instead of master.
You are right, dev may be in some odd state sometimes, but it checks clean for me after I merged from dev. You may need to pull from dev once more.
To be continued on the rzilla fork.