r-rudra/tidycells

CRAN Issues Fix. CRAN archived.

The package got archived. The development cycle needs to be fixed urgently.

Make an intermediate silent release.
Target May 14 noon.
Most things can be kept as they are.

  • DT plot option to be kept silent (all shiny modules to use ggplot2 for this release)
  • exploration findings can be kept silent
  • if the initial deadline fails, try a full-scale release

I think at this point the only option is to perform a quick fix in the main repo.

Took a fork here for future reference.

Need to fix things in the main repo and release a patch.

I think the last error was coming from

d1 <- read_cells(dm$fn[dm$original == "xls"])

Just checked: it is mostly happening with dplyr 1.0.0
tidyverse/dplyr#5211

As suggested by Hadley

mydev::beta_lib_loc()
library(dplyr, warn.conflicts = F)
packageVersion("dplyr")
#> [1] '0.8.99.9002'
packageVersion("vctrs")
#> [1] '0.3.0'

dat <- iris %>% head()

# class(dat) <- c(class(dat),"test")
class(dat) <- c("test", class(dat))


iris %>% head() %>% select(Species, Sepal.Length)
#>   Species Sepal.Length
#> 1  setosa          5.1
#> 2  setosa          4.9
#> 3  setosa          4.7
#> 4  setosa          4.6
#> 5  setosa          5.0
#> 6  setosa          5.4
dat %>% select(Species, Sepal.Length)
#>   Species Sepal.Length
#> 1  setosa          5.1
#> 2  setosa          4.9
#> 3  setosa          4.7
#> 4  setosa          4.6
#> 5  setosa          5.0
#> 6  setosa          5.4

iris %>% vctrs::vec_assert()
dat %>% vctrs::vec_assert()

Created on 2020-05-11 by the reprex package (v0.3.0)

nacnudus/unpivotr#35

My bad, all the mails were filtered into the junk folder, god knows how.

as_cell_df is quite slow and is hopping a lot in S3 method dispatch. Need to fix it.

See #40

RHub Console Output 1
* using log directory 'C:/Users/USERCBPbIvzeHV/tidycells.Rcheck'
* using R Under development (unstable) (2020-04-22 r78281)
* using platform: x86_64-w64-mingw32 (64-bit)
* using session charset: ISO8859-1
* using option '--as-cran'
* checking for file 'tidycells/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'tidycells' version '0.2.2.9000'
* package encoding: UTF-8
* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Indranil Gayen <nil.gayen@gmail.com>'

New submission

Package was archived on CRAN

Version contains large components (0.2.2.9000)

Found the following (possibly) invalid URLs:
  URL: https://cran.r-project.org/web/checks/check_results_tidycells.html
    From: README.md
    Status: 404
    Message: Not Found
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking serialization versions ... OK
* checking whether package 'tidycells' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking for future file timestamps ... OK
* checking 'build' directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files from 'inst/doc' ... OK
* checking files in 'vignettes' ... OK
* checking examples ... ERROR
Running examples in 'tidycells-Ex.R' failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: as_cell_df
> ### Title: Transform data into Cell-DF Structure
> ### Aliases: as_cell_df
> 
> ### ** Examples
> 
> 
> as_cell_df(iris)
A Cell Data Frame
= To see cell stats, call summary()
= To see cell structure, call plot()
= Content:
Error in loadNamespace(name) : there is no package called 'utf8'
Calls: <Anonymous> ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
* checking for unstated dependencies in 'tests' ... OK
* checking tests ... ERROR
  Running 'testthat.R' [123s]
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
   43. base::asNamespace(ns)
   44. base::getNamespace(ns)
   45. base::loadNamespace(name)
   46. base::withRestarts(stop(cond), retry_loadNamespace = function() NULL)
   47. base:::withOneRestart(expr, restarts[[1L]])
   48. base:::doWithOneRestart(return(expr), restart)
  
  == testthat results  ===========================================================
  [ OK: 151 | SKIPPED: 4 | WARNINGS: 0 | FAILED: 3 ]
  1. Error: numeric_values_classifier works (@test-VA_classifier.R#19) 
  2. Error: read_cells for NULL works (@test-read_cells.R#4) 
  3. Error: read_cells for external packages works (@test-read_cells.R#22) 
  
  Error: testthat unit tests failed
  Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in 'inst/doc' ... OK
* checking re-building of vignette outputs ... WARNING
Error(s) in re-building vignettes:
  ...
--- re-building 'tidycells-intro.Rmd' using rmarkdown
Quitting from lines 298-314 (tidycells-intro.Rmd) 
Error: processing vignette 'tidycells-intro.Rmd' failed with diagnostics:
there is no package called 'utf8'
--- failed re-building 'tidycells-intro.Rmd'

SUMMARY: processing the following file failed:
  'tidycells-intro.Rmd'

Error: Vignette re-building failed.
Execution halted

* checking PDF version of manual ... OK
* checking for non-standard things in the check directory ... OK
* checking for detritus in the temp directory ... OK
* DONE
Status: 2 ERRORs, 1 WARNING, 1 NOTE
RHub Console Output 2
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(testthat)
> library(tidycells)
> 
> 
> test_check("tidycells")
── 1. Failure: read_cells: csv works (@test-read_cells.R#75)  ──────────────────
`dc0` not equal to `expct_d2`.
2/20 mismatches
x[1]: "12"
y[1]: "1.5"

x[2]: "1.5"
y[2]: "12"

Support present for following type of files: csv, xls, xlsx, docx, pdf, html
Note:
☰ LibreOffice may be required for doc files
☰ Support is enabled for content type (means it will work even if the extension is wrong)
☰ Support not present for following type of files: doc
Details: 
┌──────────────────────────────────────────────┐
│                                              │
│     type        package    present support   │
│   1 csv{utils}  utils        ✔      ✔        │
│   2 csv         readr        ✔      ✔        │
│   3 xls{readxl} readxl       ✔      ✔        │
│   4 xls         xlsx         ✔      ✔        │
│   5 xlsx        tidyxl       ✔      ✔        │
│   6 doc         docxtractr   ✔      ✖        │
│   7 docx        docxtractr   ✔      ✔        │
│   8 pdf         tabulizer    ✔      ✔        │
│   9 html        XML          ✔      ✔        │
│                                              │
└──────────────────────────────────────────────┘
Support present for following type of files: csv, xls, xlsx, docx, pdf, html
Note:
☰ LibreOffice may be required for doc files
☰ Support is enabled for content type (means it will work even if the extension is wrong)
☰ Support not present for following type of files: doc
Details: 
┌──────────────────────────────────────────────┐
│                                              │
│     type        package    present support   │
│   1 csv{utils}  utils        ✔      ✔        │
│   2 csv         readr        ✔      ✔        │
│   3 xls{readxl} readxl       ✔      ✔        │
│   4 xls         xlsx         ✔      ✔        │
│   5 xlsx        tidyxl       ✔      ✔        │
│   6 doc         docxtractr   ✔      ✖        │
│   7 docx        docxtractr   ✔      ✔        │
│   8 pdf         tabulizer    ✔      ✔        │
│   9 html        XML          ✔      ✔        │
│                                              │
└──────────────────────────────────────────────┘
══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 182 | SKIPPED: 2 | WARNINGS: 0 | FAILED: 1 ]
1. Failure: read_cells: csv works (@test-read_cells.R#75) 

Error: testthat unit tests failed
Execution halted
RHub Console Output 3
R Under development (unstable) (2020-05-07 r78381) -- "Unsuffered Consequences"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(testthat)
> library(tidycells)
> 
> 
> test_check("tidycells")
── 1. Error: etc works  ────────────────────────────────────────────────────────
Java Exception <no description because toString() failed>.jcall(cell, "Lorg/apache/poi/ss/usermodel/RichTextString;", "getRichStringCellValue")new("jobjRef", jobj = <pointer: 0x559c1b285030>, jclass = "java/lang/Throwable")
Backtrace:
  1. tidycells:::read_xls_from_xlsx(dex)
  6. tidycells:::read_xls_for_tidycells(fn)
 11. purrr::map(., for_a_sheet)
 16. tidycells:::.f(.x[[i]], ...)
 13. purrr::map(., xlsx::getCellValue)
 26. xlsx:::.f(.x[[i]], ...)
 28. rJava::.jcall(...)
 29. rJava::.jcheck(silent = FALSE)

── 2. Failure: optional_package dependency test (@test-optional_package.R#25)  ─
`d1` not equal to `d2`.
target is NULL, current is tbl_df

Support present for following type of files: csv, xls, xlsx, docx, html
Note:
☰ LibreOffice may be required for doc files
☰ Support is enabled for content type (means it will work even if the extension is wrong)
☰ Support not present for following type of files: doc, pdf
Note:
☰ These packages are required: tabulizer
Details: 
┌──────────────────────────────────────────────┐
│                                              │
│     type        package    present support   │
│   1 csv{utils}  utils        ✔      ✔        │
│   2 csv         readr        ✔      ✔        │
│   3 xls{readxl} readxl       ✔      ✔        │
│   4 xls         xlsx         ✔      ✔        │
│   5 xlsx        tidyxl       ✔      ✔        │
│   6 doc         docxtractr   ✔      ✖        │
│   7 docx        docxtractr   ✔      ✔        │
│   8 pdf         tabulizer    ✖     ✖         │
│   9 html        XML          ✔      ✔        │
│                                              │
└──────────────────────────────────────────────┘
Error in .jcall("java/lang/Class", "Ljava/lang/Class;", "forName", cl,  : 
  RcallMethod: cannot determine object class
══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 171 | SKIPPED: 3 | WARNINGS: 0 | FAILED: 2 ]
1. Error: etc works 
2. Failure: optional_package dependency test (@test-optional_package.R#25) 

Error: testthat unit tests failed
Execution halted
RHub Console Output 4
R Under development (unstable) (2020-05-07 r78381) -- "Unsuffered Consequences"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(testthat)
> library(tidycells)
> 
> 
> test_check("tidycells")
── 1. Error: etc works  ────────────────────────────────────────────────────────
Java Exception <no description because toString() failed>.jcall(cell, "Lorg/apache/poi/ss/usermodel/RichTextString;", "getRichStringCellValue")new("jobjRef", jobj = <pointer: 0x559c1b285030>, jclass = "java/lang/Throwable")
Backtrace:
  1. tidycells:::read_xls_from_xlsx(dex)
  6. tidycells:::read_xls_for_tidycells(fn)
 11. purrr::map(., for_a_sheet)
 16. tidycells:::.f(.x[[i]], ...)
 13. purrr::map(., xlsx::getCellValue)
 26. xlsx:::.f(.x[[i]], ...)
 28. rJava::.jcall(...)
 29. rJava::.jcheck(silent = FALSE)

── 2. Failure: optional_package dependency test (@test-optional_package.R#25)  ─
`d1` not equal to `d2`.
target is NULL, current is tbl_df

Support present for following type of files: csv, xls, xlsx, docx, html
Note:
☰ LibreOffice may be required for doc files
☰ Support is enabled for content type (means it will work even if the extension is wrong)
☰ Support not present for following type of files: doc, pdf
Note:
☰ These packages are required: tabulizer
Details: 
┌──────────────────────────────────────────────┐
│                                              │
│     type        package    present support   │
│   1 csv{utils}  utils        ✔      ✔        │
│   2 csv         readr        ✔      ✔        │
│   3 xls{readxl} readxl       ✔      ✔        │
│   4 xls         xlsx         ✔      ✔        │
│   5 xlsx        tidyxl       ✔      ✔        │
│   6 doc         docxtractr   ✔      ✖        │
│   7 docx        docxtractr   ✔      ✔        │
│   8 pdf         tabulizer    ✖     ✖         │
│   9 html        XML          ✔      ✔        │
│                                              │
└──────────────────────────────────────────────┘
Error in .jcall("java/lang/Class", "Ljava/lang/Class;", "forName", cl,  : 
  RcallMethod: cannot determine object class
══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 171 | SKIPPED: 3 | WARNINGS: 0 | FAILED: 2 ]
1. Error: etc works 
2. Failure: optional_package dependency test (@test-optional_package.R#25) 

Error: testthat unit tests failed
Execution halted 
AppVeyor Console Output
packages 'utf8', 'data.table', 'pingr' are available as source packages but not as binaries
Error: (converted from warning) packages 'utf8', 'data.table', 'pingr' are not available (as a binary package for R Under development)
Execution halted
Command exited with code 1
7z a failure.zip *.Rcheck\*

This is a CI issue that I think is possible to fix (see this)

Travis Console Output
* installing *source* package ‘pkgbuild’ ...
** package ‘pkgbuild’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** byte-compile and prepare package for lazy loading
Error: .onLoad failed in loadNamespace() for 'processx', details:
  call: loadNamespace(name)
  error: there is no package called ‘ps’
Execution halted
ERROR: lazy loading failed for package ‘pkgbuild’
* removing ‘/Users/travis/R/Library/pkgbuild’
Error in i.p(...) : 
  (converted from warning) installation of package ‘pkgbuild’ had non-zero exit status
Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p
Execution halted
The command "Rscript -e 'deps <- remotes::dev_package_deps(dependencies = NA);remotes::install_deps(dependencies = TRUE);if (!all(deps$package %in% installed.packages())) { message("missing: ", paste(setdiff(deps$package, installed.packages()), collapse=", ")); q(status = 1, save = "no")}'" failed and exited with 1 during .

This is a CI issue that I think is possible to fix (see this)

RHub Console Output (vignette) 1
--- re-building ‘tidycells-intro.Rmd’ using rmarkdown
Quitting from lines 446-457 (tidycells-intro.Rmd) 
Error: processing vignette 'tidycells-intro.Rmd' failed with diagnostics:
no applicable method for 'select_' applied to an object of class "NULL"
--- failed re-building ‘tidycells-intro.Rmd’

SUMMARY: processing the following file failed:
  ‘tidycells-intro.Rmd’

Error: Vignette re-building failed.
Execution halted

RHub Console Output (vignette) 2
--- re-building 'tidycells-intro.Rmd' using rmarkdown
Quitting from lines 298-314 (tidycells-intro.Rmd) 
Error: processing vignette 'tidycells-intro.Rmd' failed with diagnostics:
there is no package called 'utf8'
--- failed re-building 'tidycells-intro.Rmd'

SUMMARY: processing the following file failed:
  'tidycells-intro.Rmd'

Error: Vignette re-building failed.
Execution halted

All winbuilder builds are successful

Issue 'utf8'

#' as_cell_df(iris)

expect_output(read_cells(), "Support present for following type of files")

ext_pkgs %>% purrr::map(~ {

Vignette Issue

This is possibly caused by {utf8}

rc_part <- read_cells(fcsv, at_level = 2)

This may be avoided by a safe dependency framework

dcomps <- dm$fn %>% purrr::map(read_cells)

Tests

expect_equal(dc0, expct_d2)

This may be avoided by a safe dependency framework

if (rlang::is_installed("xlsx") & rlang::is_installed("readxl")) {

This may be avoided by a safe dependency framework

For {dplyr}, the S3 class ordering needs to be fixed
see #40
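
The ordering point can be illustrated with a minimal sketch in base R. It assumes a hypothetical subclass name `cell_df_sketch` (not the package's real class) and only shows the usual S3 convention of putting the subclass before the parent classes, so that dispatch tries the custom methods first and falls back to the data frame ones.

```r
# Hypothetical constructor: prepend the subclass, never append it.
new_cell_df_sketch <- function(x) {
  stopifnot(is.data.frame(x))
  # Drop any existing copy of the subclass, then put it first.
  class(x) <- c("cell_df_sketch", setdiff(class(x), "cell_df_sketch"))
  x
}

# Hypothetical print method that delegates to the next class in the chain.
print.cell_df_sketch <- function(x, ...) {
  cat("A cell_df_sketch with", nrow(x), "rows\n")
  NextMethod()
}

dat <- new_cell_df_sketch(head(iris))
class(dat)  # "cell_df_sketch" "data.frame"
```

Calling the constructor twice leaves the class vector unchanged, which keeps the ordering stable even if an object passes through it more than once.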

Tests 1st Case : Console Output
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(testthat)
> library(tidycells)
> 
> 
> test_check("tidycells")
── 1. Failure: read_cells: csv works (@test-read_cells.R#75)  ──────────────────
`dc0` not equal to `expct_d2`.
2/20 mismatches
x[1]: "12"
y[1]: "1.5"

x[2]: "1.5"
y[2]: "12"

Support present for following type of files: csv, xls, xlsx, docx, pdf, html
Note:
☰ LibreOffice may be required for doc files
☰ Support is enabled for content type (means it will work even if the extension is wrong)
☰ Support not present for following type of files: doc
Details: 
┌──────────────────────────────────────────────┐
│                                              │
│     type        package    present support   │
│   1 csv{utils}  utils        ✔      ✔        │
│   2 csv         readr        ✔      ✔        │
│   3 xls{readxl} readxl       ✔      ✔        │
│   4 xls         xlsx         ✔      ✔        │
│   5 xlsx        tidyxl       ✔      ✔        │
│   6 doc         docxtractr   ✔      ✖        │
│   7 docx        docxtractr   ✔      ✔        │
│   8 pdf         tabulizer    ✔      ✔        │
│   9 html        XML          ✔      ✔        │
│                                              │
└──────────────────────────────────────────────┘
Support present for following type of files: csv, xls, xlsx, docx, pdf, html
Note:
☰ LibreOffice may be required for doc files
☰ Support is enabled for content type (means it will work even if the extension is wrong)
☰ Support not present for following type of files: doc
Details: 
┌──────────────────────────────────────────────┐
│                                              │
│     type        package    present support   │
│   1 csv{utils}  utils        ✔      ✔        │
│   2 csv         readr        ✔      ✔        │
│   3 xls{readxl} readxl       ✔      ✔        │
│   4 xls         xlsx         ✔      ✔        │
│   5 xlsx        tidyxl       ✔      ✔        │
│   6 doc         docxtractr   ✔      ✖        │
│   7 docx        docxtractr   ✔      ✔        │
│   8 pdf         tabulizer    ✔      ✔        │
│   9 html        XML          ✔      ✔        │
│                                              │
└──────────────────────────────────────────────┘
══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 182 | SKIPPED: 2 | WARNINGS: 0 | FAILED: 1 ]
1. Failure: read_cells: csv works (@test-read_cells.R#75) 

Error: testthat unit tests failed
Execution halted

That can happen for

testthat::expect_equal(c("12","1.5"), c("1.5","12"))

I don't know how that happened. At most, base::sort can be added in
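
If the two vectors hold the same values and only their order differs between platforms, one option is to normalise the order on both sides before comparing. A minimal sketch in base R, using the values from the failure above; `method = "radix"` is chosen here because it sorts character vectors in the C locale and so should not depend on the platform collation:

```r
x <- c("12", "1.5")
y <- c("1.5", "12")

identical(x, y)  # FALSE: same values, different order

# Sort both sides the same way before comparing.
sorted_x <- sort(x, method = "radix")
sorted_y <- sort(y, method = "radix")
identical(sorted_x, sorted_y)  # TRUE
```

This only helps when the order itself is not part of what the test is meant to check.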

A safe dependency framework needs to be implemented

See #41
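
One possible shape for such a framework, sketched in base R under stated assumptions: the helper names `is_available_sketch()` and `with_optional_pkg_sketch()` are hypothetical and are not part of tidycells. The idea is to check for an optional package with `requireNamespace()` before touching it, and to fall back to a default instead of failing.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise.
is_available_sketch <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical helper: run `code` only if `pkg` is available.
# `code` is a zero-argument function; `otherwise` is the fallback value.
with_optional_pkg_sketch <- function(pkg, code, otherwise = NULL) {
  if (!is_available_sketch(pkg)) {
    return(otherwise)
  }
  # Also fall back if the optional package fails at run time.
  tryCatch(code(), error = function(e) otherwise)
}

res_present <- with_optional_pkg_sketch(
  "stats", function() stats::median(c(1, 5, 9))
)
res_missing <- with_optional_pkg_sketch(
  "notARealPackage123", function() 1, otherwise = NA
)
```

In tests, the same check pairs naturally with `testthat::skip_if_not_installed()`, so a missing optional package leads to a skip and not to a failure.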

I don't know how to solve Issue 'utf8'.
I may need to look again at what exactly is happening here (RHub)

dplyr 1.0.0 is quite strict, and that's good, I think.
But it makes coding in R less fun. I think R's biggest advantage is that it is really flexible. It can understand you.

  • ifelse has to be replaced with dplyr::if_else
  • NA has to be replaced with typed NA_ constants (e.g. NA_character_)
  • bind_rows is strict, and some function is changing the order of columns
  • why don't you use {strict}?
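
The first two points can be sketched as follows. `label_size_sketch()` is a hypothetical helper; the only facts relied on are that a bare `NA` is logical, that `NA_character_` is its character counterpart, and that `dplyr::if_else()` takes a `missing` argument. The call is guarded so the snippet still runs where {dplyr} is not installed.

```r
typeof(NA)             # "logical"
typeof(NA_character_)  # "character"

# Hypothetical helper: both branches and the missing value share one type.
label_size_sketch <- function(x) {
  dplyr::if_else(x > 2, "big", "small", missing = NA_character_)
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  print(label_size_sketch(c(1, NA, 3)))
}
```

Keeping every branch the same type means the result type does not depend on which branch happens to be taken, which is the kind of silent coercion `ifelse` allows.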

Because of ropensci/tabulapdf#106, Vignette Issue 2 is occurring
in

dcomps <- dm$fn %>% purrr::map(read_cells)

This is tracked using the cloud_picker framework in CITestR
Ref: this Circle CI build

A dependency framework for {tabulizer} is also required