ropensci/software-review

Submission: rio

chainsawriot opened this issue · 11 comments

Submitting Author Name: Chung-hong Chan
Submitting Author Github Handle: @chainsawriot
Other Package Authors Github handles: (comma separated, delete if none) @leeper
Repository: https://github.com/chainsawriot/rio
Version submitted: 0.5.30
Submission type: Standard
Editor: TBD
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: rio
Type: Package
Title: A Swiss-Army Knife for Data I/O
Version: 0.5.30
Authors@R: c(person("Jason", "Becker", role = "ctb", email = "jason@jbecker.co"),
             person("Chung-hong", "Chan", role = c("aut", "cre"), email = "chainsawtiney@gmail.com",
	     	     comment = c(ORCID = "0000-0002-6232-7530")),
             person("Geoffrey CH", "Chan", role = "ctb", email = "gefchchan@gmail.com"),
             person("Thomas J.", "Leeper",
                    role = "aut", 
                    email = "thosjleeper@gmail.com",
                    comment = c(ORCID = "0000-0003-4097-6326")),
             person("Christopher", "Gandrud", role = "ctb"),
             person("Andrew", "MacDonald", role = "ctb"),
             person("Ista", "Zahn", role = "ctb"),
             person("Stanislaus", "Stadlmann", role = "ctb"),
             person("Ruaridh", "Williamson", role = "ctb", email = "ruaridh.williamson@gmail.com"),
             person("Patrick", "Kennedy", role = "ctb"),
             person("Ryan", "Price", email = "ryapric@gmail.com", role = "ctb"),
             person("Trevor L", "Davis", email = "trevor.l.davis@gmail.com", role = "ctb"),
             person("Nathan", "Day", email = "nathancday@gmail.com", role = "ctb"),
             person("Bill", "Denney",
                    email="wdenney@humanpredictions.com",
                    role="ctb",
                    comment=c(ORCID="0000-0002-5759-428X")),
             person("Alex", "Bokov", email = "alex.bokov@gmail.com", role = "ctb",
                    comment=c(ORCID="0000-0002-0511-9815"))
             )
Description: Streamlined data import and export by making assumptions that
    the user is probably willing to make: 'import()' and 'export()' determine
    the data structure from the file extension, reasonable defaults are used for
    data import and export (e.g., 'stringsAsFactors=FALSE'), web-based import is
    natively supported (including from SSL/HTTPS), compressed files can be read
    directly without explicit decompression, and fast import packages are used where
    appropriate. An additional convenience function, 'convert()', provides a simple
    method for converting between file types.
URL: https://github.com/chainsawriot/rio
BugReports: https://github.com/chainsawriot/rio/issues
Depends:
    R (>= 3.6)
Imports:
    tools,
    stats,
    utils,
    foreign,
    haven (>= 1.1.2),
    curl (>= 0.6),
    data.table (>= 1.9.8),
    readxl (>= 0.1.1),
    openxlsx,
    tibble
Suggests:
    datasets,
    bit64,
    testthat,
    knitr,
    magrittr,
    arrow,
    clipr,
    feather,
    fst,
    hexView,
    jsonlite,
    pzfx,
    readODS (>= 1.6.4),
    readr,
    rmarkdown,
    rmatio,
    xml2 (>= 1.2.0),
    yaml
License: GPL-2
VignetteBuilder: knitr
Encoding: UTF-8
RoxygenNote: 7.2.3

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

This package is for loading and saving data from either files or URLs.
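As a rough illustration of that workflow (a minimal sketch, not taken from the submission itself), the snippet below round-trips a built-in dataset through `export()`, `import()`, and `convert()`. It assumes rio is installed; the temporary file names are hypothetical and chosen only for the example.

```r
# Sketch only: runs when rio is installed; file paths are throwaway temp files.
if (requireNamespace("rio", quietly = TRUE)) {
  tmp_csv <- tempfile(fileext = ".csv")
  tmp_tsv <- tempfile(fileext = ".tsv")

  rio::export(mtcars, tmp_csv)    # output format inferred from the ".csv" extension
  cars <- rio::import(tmp_csv)    # input format inferred the same way
  rio::convert(tmp_csv, tmp_tsv)  # import followed by export in a single call
}
```

The same `import()` call also accepts a URL in place of a local path, which is the "files or URLs" part of the description.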

  • Who is the target audience and what are scientific applications of this package?

Probably all scientific disciplines that involve dealing with data files.

As far as I know, four other packages do something similar: reader (not readr), io, ImportExport, and SchemaOnRead. rio is probably the most widely used of these.

Yes

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

No. I am sorry.

  • Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

🚀 Editor check started 👋

Checks for rio (v0.5.30)

git hash: daf6cd15

  • ✔️ Package is already on CRAN.
  • ✖️ does not have a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✖️ The following function has no documented return value: [characterize]
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✖️ These functions do not have examples: [arg_reconcile, .import, get_ext, install_formats].
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 87.4%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with 👀 may be optionally addressed.)

Package License: GPL-2


1. Package Dependencies

Details of Package Dependency Usage

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 494
internal rio 33
internal grDevices 4
internal graphics 3
internal methods 1
imports utils 23
imports haven 11
imports tools 8
imports openxlsx 5
imports stats 3
imports foreign 3
imports data.table 3
imports curl 2
imports readxl 1
imports tibble 1
suggests xml2 17
suggests clipr 3
suggests pzfx 3
suggests rmatio 3
suggests feather 2
suggests fst 2
suggests readODS 2
suggests readr 2
suggests arrow 1
suggests jsonlite 1
suggests datasets NA
suggests bit64 NA
suggests testthat NA
suggests knitr NA
suggests magrittr NA
suggests hexView NA
suggests rmarkdown NA
suggests yaml NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

file (148), which (46), list (30), c (29), lapply (16), names (14), for (11), attributes (10), do.call (9), paste0 (9), seq_along (9), args (7), row.names (7), invisible (6), length (6), unlist (6), drop (5), sapply (5), try (5), url (5), as.character (4), format (4), nchar (4), tempfile (4), basename (3), col (3), new.env (3), raw (3), regexpr (3), regmatches (3), rep (3), return (3), seq_len (3), strsplit (3), unclass (3), class (2), cumsum (2), formals (2), gettext (2), grep (2), gsub (2), levels (2), max (2), nrow (2), readLines (2), setdiff (2), sort (2), switch (2), table (2), tolower (2), unique (2), alist (1), as.environment (1), as.numeric (1), attr (1), by (1), cbind.data.frame (1), comment (1), dump (1), duplicated (1), getOption (1), getwd (1), if (1), is.na (1), labels (1), library (1), match.arg (1), match.call (1), ncol (1), paste (1), quote (1), read.dcf (1), readBin (1), rownames (1), sink (1), sprintf (1), structure (1), sub (1), substitute (1), system.file (1), T (1)

rio

import (4), twrap (3), arg_reconcile (2), doone (2), export (2), extract_html_row (2), find_compress (2), get_ext (2), uninstalled_formats (2), characterize (1), characterize.data.frame (1), characterize.default (1), compress_out (1), convert (1), convert_google_url (1), export_delim (1), factorize (1), factorize.data.frame (1), factorize.default (1), gather_attrs (1), standardize_attributes (1)

utils

data (5), unzip (5), untar (4), type.convert (2), zip (2), head (1), packageName (1), read.fortran (1), tar (1), write.table (1)

xml2

read_xml (4), xml_add_child (4), read_html (3), as_list (2), xml_find_all (2), xml_attrs (1), xml_children (1)

haven

write_sav (4), write_dta (2), write_sas (2), write_xpt (2), read_sas (1)

tools

file_ext (5), file_path_sans_ext (3)

openxlsx

addWorksheet (1), getSheetNames (1), loadWorkbook (1), saveWorkbook (1), write.xlsx (1)

grDevices

bmp (1), jpeg (1), png (1), tiff (1)

clipr

read_clip (1), read_clip_tbl (1), write_clip (1)

data.table

rbindlist (2), fwrite (1)

foreign

read.dta (1), read.systat (1), write.dbf (1)

graphics

title (2), text (1)

pzfx

read_pzfx (2), write_pzfx (1)

rmatio

write.mat (2), read.mat (1)

stats

setNames (3)

curl

curl_fetch_memory (1), parse_headers (1)

feather

read_feather (1), write_feather (1)

fst

read.fst (1), write.fst (1)

readODS

read_ods (1), write_ods (1)

readr

fwf_empty (1), read_fwf (1)

arrow

write_parquet (1)

jsonlite

fromJSON (1)

methods

is (1)

readxl

excel_sheets (1)

tibble

as_tibble (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties

The package has:

  • code in R (100% in 22 files) and
  • 2 authors
  • 1 vignette
  • no internal data file
  • 10 imported packages
  • 21 exported functions (median 15 lines of code)
  • 202 non-exported functions in R (median 4 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 22 | 83.6 | |
| files_vignettes | 1 | 68.4 | |
| files_tests | 54 | 99.3 | |
| loc_R | 1513 | 78.2 | |
| loc_vignettes | 182 | 46.0 | |
| loc_tests | 1196 | 89.2 | |
| num_vignettes | 1 | 64.8 | |
| n_fns_r | 223 | 91.2 | |
| n_fns_r_exported | 21 | 68.8 | |
| n_fns_r_not_exported | 202 | 93.3 | |
| n_fns_per_file_r | 5 | 71.4 | |
| num_params_per_fn | 2 | 11.9 | |
| loc_per_fn_r | 5 | 8.1 | |
| loc_per_fn_r_exp | 15 | 35.6 | |
| loc_per_fn_r_not_exp | 4 | 9.3 | |
| rel_whitespace_R | 8 | 57.4 | |
| rel_whitespace_vignettes | 38 | 50.9 | |
| rel_whitespace_tests | 19 | 86.4 | |
| doclines_per_fn_exp | 50 | 63.0 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 88 | 77.1 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks

3a. Continuous Integration Badges

(There do not appear to be any)

GitHub Workflow Results

| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 6010120891 | pages build and deployment | failure | 57d3a2 | 5 | 2023-08-29 |
| 6012061439 | R-CMD-check | success | fd7053 | 6 | 2023-08-29 |
| 6012061436 | test-coverage | success | fd7053 | 18 | 2023-08-29 |

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 87.35

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
import_list 31
import 24
arg_reconcile 20
import_delim 18
set_class 17

Static code analyses with lintr

lintr found the following 314 potential issues:

| message | number of times |
|---|---|
| Avoid 1:nrow(...) expressions, use seq_len. | 2 |
| Avoid changing the working directory, or restore it in on.exit | 3 |
| Avoid library() and require() calls in packages | 4 |
| Avoid using sapply, consider vapply instead, that's type safe | 8 |
| Lines should not be more than 80 characters. | 297 |
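
For readers unfamiliar with the first, second, and fourth lint messages, the base R sketch below shows the patterns lintr is suggesting. It is a generic illustration, not code from rio: the data frame, the extension vector, and the `list_in_dir()` helper are all hypothetical.

```r
# Hypothetical data used only for this illustration.
df <- data.frame(x = c(1, 4, 9))

# seq_len(nrow(df)) yields an empty sequence for a zero-row data frame,
# whereas 1:nrow(df) would yield c(1, 0) and run the loop body twice.
roots <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  roots[i] <- sqrt(df$x[i])
}

# vapply() declares the expected type and length of each result,
# so an unexpected return value raises an error instead of changing the output type.
n_chars <- vapply(c("csv", "xlsx"), nchar, integer(1))

# If a function must change the working directory, on.exit() restores it
# even when the function exits early because of an error.
list_in_dir <- function(dir) {
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  list.files()
}
```

Calling `list_in_dir(tempdir())` leaves the caller's working directory unchanged after it returns.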


4. Other Checks

Details of other checks

✖️ The following 4 function names are duplicated in other packages:

    • convert from AquaEnv, ascii, breakaway, cabootcrs, CHNOSZ, convertr, coreCT, DDIwR, equateIRT, hablar, khroma, nCov2019, phenopix, qtl, quanteda, rMIDAS, scan, StratigrapheR, tidygraph, tis, wavethresh
    • export from admisc, aLFQ, box, box, bruceR, campsismod, crestr, EviewsR, flux, fsbrain, gm, grainscape, inTextSummaryTable, job, kimisc, Momocs, Morpho, mpm, pitchRx, scan, seewave, soc.ca, strvalidator, tipsae, wpa
    • factorize from admisc, conf.design, elliptic, Epi, gmp, labdsv, lme4, mosaic, QCApro, RcmdrPlugin.KMggplot2, rminer, sfsmisc
    • import from act, aLFQ, ambiorix, backports, bruceR, EviewsR, fSRM, importar, isqg, MALDIquantForeign, NMproject, openair, reticulate, reticulate, rTorch, strvalidator, tensorflow


Package Versions

package version
pkgstats 0.1.3.7
pkgcheck 0.1.2.1


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

Hello @chainsawriot, here are the things you can ask me to do:


# Add an author's response info to the ROpenSci logs
@ropensci-review-bot submit response <AUTHOR_RESPONSE_URL>

# List all available commands
@ropensci-review-bot help

# Show our Code of Conduct
@ropensci-review-bot code of conduct

# Invite the author of a package to the corresponding rOpenSci team. This command should be issued by the author of the package.
@ropensci-review-bot invite me to ropensci/package-name

# Adds package's repo to the rOpenSci team. This command should be issued after approval and transfer of the package.
@ropensci-review-bot finalize transfer of package-name

# Various package checks
@ropensci-review-bot check package

# Checks srr documentation for stats packages
@ropensci-review-bot check srr

Thanks, about to send the query.

🚀 Editor check started 👋

@ropensci My local check with pkgcheck shows no remaining items marked with ✖️, apart from the optional point on duplicated function names. However, rio is a decade-old package, and renaming those generic functions (import, export, convert, and factorize) now would probably harm computational reproducibility.

httr::HEAD("https://badges.ropensci.org/605_status.svg")
#> Response [https://badges.ropensci.org/605_status.svg]
#>   Date: 2023-08-30 15:32
#>   Status: 404
#>   Content-Type: text/html; charset=utf-8
#> <EMPTY BODY>

Created on 2023-08-30 with reprex v2.0.2

Thank you for this submission @chainsawriot! I realize the last response from the bot is an error, as a badge should not be generated or checked for until after an editor has approved moving forward with the process.

I believe rio is out of scope for us. Per the package descriptions in our Aims and Scope, packages in the retrieval, extraction, or munging categories should be specific to "data sources / topics", "aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment", or "focus on tools for handling data in specific scientific formats generated from scientific workflows or exported from scientific instruments." The reason is that our reviews draw on relevant field expertise, and it is hard to review highly general, Swiss-army-knife tools objectively on that basis. Such tools are also more likely to have many users who provide feedback, so they need the review process less. I would recommend JOSS as a venue for reviewing and publishing rio.