ropensci/software-review

`fastMatMR`: Fast Matrix Market I/O

HaoZeke opened this issue Β· 60 comments

Date accepted: 2023-10-30
Submitting Author Name: Rohit Goswami
Submitting Author Github Handle: @HaoZeke
Repository: https://github.com/HaoZeke/fastMatMR
Version submitted:
Submission type: Standard
Editor: @maelle
Reviewers: @osorensen, @czeildi

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: fastMatMR
Title: "fastMatMR: High-Performance Matrix Market File Operations in R"
Version: 1.0.0.0
Authors@R:
    person("Rohit", "Goswami", , "rgoswami@ieee.org", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-2393-8056"))
Description: "fastMatMR is an R package offering high-performance read and write operations for Matrix Market files. It acts as a thin wrapper around the 'fast_matrix_market' C++ library, offering speed and extended support for Matrix Market formats. Unlike other R packages, fastMatMR supports not just sparse matrices but also dense vectors and matrices. This makes it a versatile choice for dealing with .mtx files in R."
License: MIT + file LICENSE
SystemRequirements: C++17
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
LinkingTo:
    cpp11
Suggests:
    ggplot2,
    knitr,
    Matrix,
    microbenchmark,
    rmarkdown,
    testthat (>= 3.0.0)
URL: https://github.com/HaoZeke/fastMatMR
BugReports: https://github.com/HaoZeke/fastMatMR/issues
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The matrix market exchange formats are crucial to much of the ecosystem. The fastMatMR package focuses on high-performance read and write operations for Matrix Market files, serving as a key tool for data extraction in computational and data science pipelines.

  • Who is the target audience and what are scientific applications of this package?

Data scientists who might want to load and test the NIST matrices which include:

comparative studies of algorithms for numerical linear algebra, featuring nearly 500 sparse matrices from a variety of applications, as well as matrix generation tools and services.

Additionally, this makes its simpler to interface to scipy and the rest of the data science ecosystem. This also includes working with the Tensor Algebra Compiler (TACO).

The Matrix package in R can perform similar operations but only for sparse matrices. The fastMatMR package not only provides enhanced performance but also extends support to dense matrices and vectors in base R, thus offering a more versatile solution.

We have both read and write performance vignettes backing up the claims made.

The package passes pkcheck: ropensci/fastMatMR#18, though the review bot disagrees :)

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

πŸš€

Editor check started

πŸ‘‹

Also, not part of the issue template, but the package passes pkcheck: ropensci/fastMatMR#18

Checks for fastMatMR (v1.0.0.0)

git hash: a910f6be

  • βœ”οΈ Package name is available
  • βœ”οΈ has a 'codemeta.json' file.
  • βœ”οΈ has a 'contributing' file.
  • βœ”οΈ uses 'roxygen2'.
  • βœ”οΈ 'DESCRIPTION' has a URL field.
  • βœ”οΈ 'DESCRIPTION' has a BugReports field.
  • βœ”οΈ Package has at least one HTML vignette
  • βœ”οΈ All functions have examples.
  • βœ”οΈ Package has continuous integration checks.
  • βœ–οΈ Package coverage is 68.1% (should be at least 75%).
  • βœ”οΈ R CMD check found no errors.
  • βœ”οΈ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal fastMatMR 6
imports NA NA
suggests ggplot2 NA
suggests knitr NA
suggests Matrix NA
suggests microbenchmark NA
suggests rmarkdown NA
suggests testthat NA
linking_to cpp11 NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

fastMatMR

fmm_to_mat (1), fmm_to_vec (1), mat_to_fmm (1), release_questions (1), sparse_to_fmm (1), vec_to_fmm (1)

NOTE: No imported packages appear to have associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in C++ (7% in 3 files), C/C++ Header (91% in 15 files) and R (2% in 3 files)
  • 1 authors
  • 3 vignettes
  • no internal data file
  • 1 imported package
  • 7 exported functions (median 3 lines of code)
  • 8 non-exported functions in R (median 3 lines of code)
  • 217 C/C++ functions (median 5 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 3 21.5
files_src 3 84.3
files_inst 15 99.6
files_vignettes 3 92.4
files_tests 3 75.2
loc_R 45 4.3 TRUE
loc_src 202 27.6
loc_inst 2545 81.7
loc_vignettes 276 60.9
loc_tests 68 31.5
num_vignettes 3 94.2
n_fns_r 15 21.2
n_fns_r_exported 7 34.0
n_fns_r_not_exported 8 18.0
n_fns_src 217 89.5
n_fns_per_file_r 3 46.2
n_fns_per_file_src 8 68.9
num_params_per_fn 2 11.9
loc_per_fn_r 3 1.1 TRUE
loc_per_fn_r_exp 3 1.5 TRUE
loc_per_fn_r_not_exp 3 1.5 TRUE
loc_per_fn_src 5 5.0 TRUE
rel_whitespace_R 29 11.6
rel_whitespace_src 20 31.5
rel_whitespace_inst 22 82.8
rel_whitespace_vignettes 41 68.6
rel_whitespace_tests 24 30.6
doclines_per_fn_exp 18 10.4
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 480 95.1 TRUE

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml
pkgcheck

GitHub Workflow Results

id name conclusion sha run_number date
6056018810 pages build and deployment success c3523b 8 2023-09-02
6056006940 pkgcheck success fe999d 21 2023-09-02
6056006937 pkgdown success fe999d 23 2023-09-02
6056006936 pre-commit success fe999d 49 2023-09-02
6056006938 R-CMD-check success fe999d 39 2023-09-02

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking installed package size ... NOTE
    installed size is 11.1Mb
    sub-directories of 1Mb or more:
    libs 10.2Mb

R CMD check generated the following check_fail:

  1. rcmdcheck_reasonable_installed_size

Test coverage with covr

Package coverage: 68.1

The following files are not completely covered by tests:

file coverage
inst/include/fast_matrix_market/chunking.hpp 63.83%
inst/include/fast_matrix_market/fast_matrix_market.hpp 18.92%
inst/include/fast_matrix_market/header.hpp 59.69%
inst/include/fast_matrix_market/read_body_threads.hpp 67.44%
inst/include/fast_matrix_market/read_body.hpp 42.29%
inst/include/fast_matrix_market/write_body.hpp 50%
R/misc.R 0%

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 1 potential issues:

message number of times
Avoid library() and require() calls in packages 1


Package Versions

package version
pkgstats 0.1.3.8
pkgcheck 0.1.2.2


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with βœ–οΈ have been resolved.

FWIW there's .covignore which is not being picked up by the bot here (works on the repo, ropensci/fastMatMR#18). Package coverage is:

> covr::package_coverage()
fastMatMR Coverage: 94.69%
src/from_file.cpp: 93.88%
src/to_file.cpp: 94.23%
R/fastMatMR-package.R: 100.00%
mpadge commented

@HaoZeke is indeed right there - the failing coverage check is a bug on our side, and not this submission.

@ropensci-review-bot assign @maelle as editor

Assigned! @maelle is now the editor

maelle commented

πŸ‘‹ @HaoZeke! Nice to see you as an author this time around, thank you for your submission! πŸ˜€

A few comments/questions before we proceed to the reviewer search:

  • Why add the bundled files to .covrignore, if they contribute to the functionality of the package?
  • I see you use conventional commits, nice. How is the changelog generated? I see ropensci/fastMatMR#17 Disclaimer: I contribute to https://github.com/cynkra/fledge/
  • In the example where you use -> I'd suggest using <- instead since the former is more rare.
  • In the README you write "writing and reading from other R objects". Could you please elaborate, adding a few sentences about this to the README?
  • To me the second part of the README, "Development", belongs to the contributing guide, so you could move the content and instead point contributors to the contributing guide.
  • In the contributing guide could you please add a link to pixi? I for instance do not know what it is.
  • In the issue tracker there are a few enhancement issues. What's the timeline for them, are they "nice to have" or do you intend to tackle them soon? If so should the review wait?

Thank you!

πŸ‘‹ @HaoZeke! Nice to see you as an author this time around, thank you for your submission! πŸ˜€

^_^

A few comments/questions before we proceed to the reviewer search:

* Why add the bundled files to .covrignore, if they contribute to the functionality of the package?

This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.

The user-facing C++ functions are in src/ and covered by tests.

* I see you use conventional commits, nice. How is the changelog generated? I see [DOC: Fix news HaoZeke/fastMatMR#17](https://github.com/HaoZeke/fastMatMR/issues/17) Disclaimer: I contribute to https://github.com/cynkra/fledge/

That's a very cool library, I am currently using towncrier and tbump which are fairly generic tools.

* In the example where you use `->` I'd suggest using `<-` instead since the former is more rare.

* In the README you write "writing and reading from other R objects". Could you please elaborate, adding a few sentences about this to the README?

* To me the second part of the README, "Development", belongs to the contributing guide, so you could move the content and instead point contributors to the contributing guide.

* In the contributing guide could you please add a link to pixi? I for instance do not know what it is.

Yup these are great suggestions, updated the package with them.

* In the issue tracker there are a few enhancement issues. What's the timeline for them, are they "nice to have" or do you intend to tackle them soon? If so should the review wait?

These are more version 2 issues and shouldn't block the review I think. It would take a while for me to get to them, the current functionality covers R matrices, vectors and sparse matrices from Matrix which cover most of the use-cases. Other sparse matrix formats are on the issue list but they're very optional "nice to have" issues since users can always convert to Matrix formats / dense types before using the library.

Thank you!

You're welcome :)

maelle commented

This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.

To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?

We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.

This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.

To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?

Mostly maintainence. I have actually stripped the (already small) library of header files which are not used at all (Eigen / Armadillo / other C++ library compatibility files).

In general, for bundled libraries it is a good idea to not tamper with the upstream source as much as possible (it makes it easier to update to new versions).

We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.

Yup, I understand. Policy wise I'd be even stricter, and say that bundled libraries must not be touched at all (other than whole-file removals if they are not relevant and self contained).

This is standard practice in bundling libraries in C++ too, where projects might have different linting rules and external files are ignored instead of refactored in-place within the project. The idea is that this way upstream developers can weigh in more easily on the code.

It also allows people to write bindings as long as they understand the public API of the code. Often times users who want bindings are not fully aware / have the expertise to make decisions about code removal from (possibly) complicated upstream C++ code.

maelle commented

Thank you for your thorough answer! I'll now look for reviewers.

maelle commented

@ropensci-review-bot seeking reviewers

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/606_status.svg)](https://github.com/ropensci/software-review/issues/606)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

@osorensen added to the reviewers list. Review due date is 2023-10-06. Thanks @osorensen for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@osorensen: If you haven't done so, please fill this form for us to update our reviewers records.

@czeildi added to the reviewers list. Review due date is 2023-10-07. Thanks @czeildi for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@czeildi: If you haven't done so, please fill this form for us to update our reviewers records.

@HaoZeke and @maelle, here is my initial review. My comments at the bottom explain a bit more about the boxes I haven't checked.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • β˜’ As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • β˜’ A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • β˜’ Installation instructions: for the development version of package and any non-standard dependencies in README
  • β˜’ Vignette(s): demonstrating major functionality that runs successfully locally
  • β˜’ Function Documentation: for all exported functions
  • β˜’ Examples: (that run successfully locally) for all exported functions
  • β˜’ Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • β˜’ Installation: Installation succeeds as documented.
  • β˜’ Functionality: Any functional claims of the software been confirmed.
  • β˜’ Performance: Any performance claims of the software been confirmed.
  • β˜’ Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • β˜’ Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 2 hours

  • β˜’ Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (β€œrev” role) in the package DESCRIPTION file.
    I have not done anything of importance, so I should not be listed.

Review comments

Dense matrices with the matrix package

In README, you state that the Matrix package only handles sparse objects. This is not correct.

library(Matrix)

The command below lists all dense matrix types supported by the Matrix package.

showClass("denseMatrix")

## Virtual Class "denseMatrix" [package "Matrix"]
## 
## Slots:
##                         
## Name:       Dim Dimnames
## Class:  integer     list
## 
## Extends: 
## Class "Matrix", directly
## Class "mMatrix", by class "Matrix", distance 2
## Class "replValueSp", by class "Matrix", distance 2
## 
## Known Subclasses: 
## Class "unpackedMatrix", directly
## Class "packedMatrix", directly
## Class "ndenseMatrix", directly
## Class "ldenseMatrix", directly
## Class "ddenseMatrix", directly
## Class "ngeMatrix", by class "unpackedMatrix", distance 2
## Class "ntrMatrix", by class "unpackedMatrix", distance 2
## Class "nsyMatrix", by class "unpackedMatrix", distance 2
## Class "ntpMatrix", by class "packedMatrix", distance 2
## Class "nspMatrix", by class "packedMatrix", distance 2
## Class "lgeMatrix", by class "unpackedMatrix", distance 2
## Class "ltrMatrix", by class "unpackedMatrix", distance 2
## Class "lsyMatrix", by class "unpackedMatrix", distance 2
## Class "ltpMatrix", by class "packedMatrix", distance 2
## Class "lspMatrix", by class "packedMatrix", distance 2
## Class "dgeMatrix", by class "unpackedMatrix", distance 2
## Class "dtrMatrix", by class "unpackedMatrix", distance 2
## Class "dsyMatrix", by class "unpackedMatrix", distance 2
## Class "dpoMatrix", by class "dsyMatrix", distance 3
## Class "corMatrix", by class "dpoMatrix", distance 4
## Class "dtpMatrix", by class "packedMatrix", distance 2
## Class "dspMatrix", by class "packedMatrix", distance 2
## Class "dppMatrix", by class "dspMatrix", distance 3
## Class "pcorMatrix", by class "dppMatrix", distance 4

Below is an example of a dense matrix from the Matrix package:

(dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))

## 5 x 2 Matrix of class "dgeMatrix"
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

str(dense_matrix)

## Formal class 'dgeMatrix' [package "Matrix"] with 4 slots
##   ..@ Dim     : int [1:2] 5 2
##   ..@ Dimnames:List of 2
##   .. ..$ : NULL
##   .. ..$ : NULL
##   ..@ x       : num [1:10] 1 2 3 4 5 6 7 8 9 10
##   ..@ factors : list()

Statement of need

The README file contains a section titled β€œWhy?”, but it does not state explicitly what problems the software is designed to solved nor what the target audience is.

Vignettes

There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.

Examples

I understand that the fmm_to_* functions need an mtx file to work, and I guess it’s therefore the examples with these functions are marked with \dontrun{}. However, wouldn’t it possible to define a character vector in R that contains the required input, and then feed this to fmm_to_* in order to have an example that can actually be run?

Just a quick clarification, @osorensen, the readme wording is a bit unclear but it states in the previous section that the goal is the reading and writing of Matrix Market files. For that, it is true that Matrix does not support dense matrices:

> library("Matrix")
> (dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))
> writeMM(dense_matrix, "mat.mtx")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function β€˜writeMM’ for signature β€˜"dgeMatrix"’
> ?writeMM
...
     Read matrices stored in the Harwell-Boeing or MatrixMarket formats
     or write β€˜sparseMatrix’ objects to one of these formats.
...

The longer explanation is part of the ROpenSci issue (intended audience, use case) but I will add it to the readme soon.

P.S. Thanks for accepting the review and getting started with this :)

maelle commented

Thank you @osorensen!!

Logged review for osorensen (hours: 2)

πŸ“† @czeildi you have 2 days left before the due date for your review (2023-10-07).

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 5

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

All in all, I could easily install the package, understand its functionality
and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).

I have a couple of comments, detailed below. Most of them are non-blocking and
might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)

Functionality

  • Performance: I did not re-run the benchmark vignettes myself but tried the package
    out with some moderately large matrices and it felt reasonably fast. πŸ‘πŸΌ
  • it seems the lib cannot handle paths with ~ (?) If this cannot be made to work easily, some error message would be very helpful, I think it would be needed
    to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read file returns an error message: 'cannot open file' which is more helpful.
    (I am guessing the issue is related to ~ in path because write_fmm(vec, "~/vector.mtx") returned false for me while write_fmm(vec, "/home/ildi_home/vector.mtx") worked)
  • I really appreciated the precise error message in
vec_to_fmm(1L, "vector.mtx")
#> Error: Invalid input type, expected 'double' actual 'integer'

However, I was wondering if it could actually accept integers as well?

  • improve error message? It is a matrix, just sparse, not plain
mat_to_fmm(sparse_mat, "sparse_matrix.mtx")
#> Error in mat_to_fmm(sparse_mat, "sparse_matrix.mtx") : object is not a matrix
  • improve error message?
mat <- matrix(c(1, 2, 3, 4), nrow = 2)
sparse_to_fmm(mat, "matrix.mtx")
#> Error: Invalid input type, expected 'double' actual 'NULL'
  • future suggestion: support reading .mtx.gz files as I can download from for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html ?
  • consider increased consistency of function names? e.g fmm_to_sparseMatrix vs sparse_to_fmm. sparse_to_fmm was a bit confusing to me in that if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector is abbreviated to spare a few characters, while sparse_matrix is not. A vague suggestion: possibly follow the pattern of fmm_write_vector, fmm_read_to_vector for all formats?

Documentation

  • I did not tick function documentation because although they exist, argument names
    (fname vs filename) are not always consistent between the documentation and the code
    itself
  • I did not tick Examples, because although the examples run, but only in the right order, i.e. if I first run the examples of the write
    functions then the examples of the read functions run correctly but they are not self contained
  • readme: "Unlike the Matrix package" : maybe rephrase? it is a bit off-putting for me to start the whole readme with a negative comparison, although
    I understand that a core part of the motivation for this package is that Matrix does not support everything and not fast enough. Even changing the order of the two parts of the sentence would feel better for me but this is really personal
  • In the readme: "we support / the package supports". Consider unifying these two wording styles
  • rename basic_usage to getting started? The vignette title is getting started but you can see basic_usage in the url
  • clarify what the addendum is about in the getting started readme? If I understand correctly, it demonstrates the incorrect handling of NA by Matrix package?
  • Maybe add a bit more context on why would I want to read back in Python? Possibly rather focus on being a language agnostic format?
  • consider moving the vignette summary to the beginning/remove it? It felt to me like repeating basically the same information as the beginning of the vignette
  • Did you try out logarithmic scale for the benchmark plots? Would they possibly be more readable?
  • In several function documentation you write "This function has no return value." This is technically not true, all functions return NULL if they have no other return value
    but rather these functions are not called for their return value, but for their side effect? Maybe see how other packages handle write functions?
  • write_fmm example: sparse_mat variable is overwritten before it is written to disk so it is not easy to actually run both of the two variants included
  • readme sentence: "Similarly, we support writing and reading from other R objects (e.g. standard R vectors and matrices), as seen in the basic use vignette."
    Consider rephrasing a bit, e.g. refer to 'getting started' vignette instead of basic use, possibly as described in the vignette / as you can see in the vignette instead of as seen?
  • I am not sure how, but I think there is a relatively easy way to have the getting started vignette show up as a separate menu item in the packagedown website and not just among articles, that could improve its discoverability

Code

  • I see that R CMD check seem to give a note about pixi files, it seems pixi.* might not
    be the correct syntax in .Rbuildignore? (I do not know what the correct syntax would be,
    but the check gives a note both in github actions and locally it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works)
  • package version in DESCRIPTION does not match package version in news, is this intended?
  • Ideally, tests would not write to the project root in my opinion even if those files are added
    to .gitignore, but rather write to a temp directory. https://testthat.r-lib.org/articles/test-fixtures.html might help find a more robust solution.
  • the readability of read unit tests could be improved if all tests are self contained and independent of each other: e.g. you can run any test_that block in itself and it is a working test (no setup code before test_that blocks ideally)
  • I think there is no need to add library(testthat) to test files
  • consistency of indentation styles could be a bit improved, consider running styler::style_pkg, see for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34
  • this is really overly pedantic, but you could consider specifying which rule to ignore in the #nolint comments instead of ignoring every rule (it is considered generally a best practice, but does not have real impact here). This is not needed if you would need to ignore several rules
  • pre commit github workflow name is a bit misleading to me, I understand you could add the same functionality as a pre commit hook, but a github action is a post commit/push thing, maybe title it after what it does, e.g. style checking?
  • where is this included from? I only see cpp11.cpp https://github.com/HaoZeke/fastMatMR/blob/main/src/from_file.cpp#L10
    My IDE underlined this with red, but it might be fine, I am not familiar with cpp bindings in R packages
  • commented out code should be removed (https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L20)
  • what happens if cpp lib call fails for some reason? Will the file handle be closed properly? See https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L30
  • possibly move rebuild-benchmarks to a tools sub directory or similar? https://r-pkgs.org/misc.html#sec-misc-tools.
maelle commented

Thanks a lot @czeildi!!

Logged review for czeildi (hours: 5)

@HaoZeke: please post your response with @ropensci-review-bot submit response <url to issue comment> if you haven't done so already (this is an automatic reminder).

Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html

Sorry for the delay. Thank you reviewers and editors. I've made changes, in accordance to the suggestions and comments, and made a new release.

Response to @osorensen

  • Dense matrices with the matrix package

Response in #606 (comment).

  • Statement of need

The README file contains a section titled β€œWhy?”, but it does not state explicitly what problems the software is designed to solved nor what the target audience is

I have rewritten the readme to explain the context.

  • Vignettes

There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.

The functionality of the package is to read and write a particular file format, commonly used in data-science across many languages including R. Apart from the dense matrix and vector support, the vignettes show performance improvements over the implementation in Matrix.

  • Examples

I understand that the fmm_to_* functions need an mtx file to work, and I guess it’s therefore the examples with these functions are marked with \dontrun{}. However, wouldn’t it possible to define a character vector in R that contains the required input, and then feed this to fmm_to_* in order to have an example that can actually be run?

The write_fmm functions can be run locally, for saving R objects as shown, and then those can be used subsequently for loading. The speed increases are largely by using C++ to load data in an optimal manner so this isn't feasible. ropensci/fastMatMR#42

Response to @czeildi

All in all, I could easily install the package, understand its functionality
and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).

Glad to hear it :)

I have a couple of comments, detailed below. Most of them are non-blocking and
might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)

These were all excellent, and I addressed each of them in the linked issues, I will briefly link to them again.

Performance: I did not re-run the benchmark vignettes myself but tried the package
out with some moderately large matrices and it felt reasonably fast. πŸ‘πŸΌ

Glad to hear that. The benchmark vignettes are very large and not really meant to be run often (they can take upto 30-40 minutes).

it seems the lib cannot handle paths with ~ (?) If this cannot be made to work easily, some error message would be very helpful, I think it would be needed
to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read file returns an error message: 'cannot open file' which is more helpful.
(I am guessing the issue is related to ~ in path because write_fmm(vec, "~/vector.mtx") returned false for me while write_fmm(vec, "/home/ildi_home/vector.mtx") worked)

This was a pretty involved feature, but I managed to get it in ropensci/fastMatMR#24.

I really appreciated the precise error message in

vec_to_fmm(1L, "vector.mtx")
#> Error: Invalid input type, expected 'double' actual 'integer'

However, I was wondering if it could actually accept integers as well?

Implemented in ropensci/fastMatMR#25.

improve error message? It is a matrix, just sparse, not plain

mat_to_fmm(sparse_mat, "sparse_matrix.mtx")
#> Error in mat_to_fmm(sparse_mat, "sparse_matrix.mtx") : object is not a matrix

As noted in ropensci/fastMatMR#26, users should be calling write_fmm instead of the lower level functions.

improve error message?

mat <- matrix(c(1, 2, 3, 4), nrow = 2)
sparse_to_fmm(mat, "matrix.mtx")
#> Error: Invalid input type, expected 'double' actual 'NULL'

Same as above (ropensci/fastMatMR#27), write_fmm has the right user-facing semantics.

future suggestion: support reading .mtx.gz files as I can download from for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html ?

This one is tracked, but not for this release. It would need quite a bit more design and I'm not sure it will be in scope. Tracked here: ropensci/fastMatMR#28

consider increased consistency of function names? e.g fmm_to_sparseMatrix vs sparse_to_fmm. sparse_to_fmm was a bit confusing to me in that if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector is abbreviated to spare a few characters, while sparse_matrix is not. A vague suggestion: possibly follow the pattern of fmm_write_vector, fmm_read_to_vector for all formats?

As noted in ropensci/fastMatMR#29,
sparse_to_fmm has been converted to sparse_Matrix_to_fmm for consistency.

Documentation

I did not tick function documentation because although they exist, argument names
(fname vs filename) are not always consistent between the documentation and the code
itself

I have fixed this in ropensci/fastMatMR#30.

I did not tick Examples, because although the examples run, but only in the right order, i.e. if I first run the examples of the write
functions then the examples of the read functions run correctly but they are not self contained

Completed in ropensci/fastMatMR#42 and ropensci/fastMatMR#31. Essentially, while the write_fmm can be self-contained, fmm_to_* need to have matrix market files present.

readme: "Unlike the Matrix package" : maybe rephrase? it is a bit off-putting for me to start the whole readme with a negative comparison, although
I understand that a core part of the motivation for this package is that Matrix does not support everything and not fast enough. Even changing the order of the two parts of the sentence would feel better for me but this is really personal
In the readme: "we support / the package supports". Consider unifying these two wording styles

Completed in ropensci/fastMatMR#32 and ropensci/fastMatMR#33. Along with the suggestions from the other reviewer, the readme has been more or less completely re-written, and I hope it is clearer now.

rename basic_usage to getting started? The vignette title is getting started but you can see basic_usage in the url

Completed in ropensci/fastMatMR#34.

clarify what the addendum is about in the getting started readme? If I understand correctly, it demonstrates the incorrect handling of NA by Matrix package?

Completed in ropensci/fastMatMR#35.

Maybe add a bit more context on why would I want to read back in Python? Possibly rather focus on being a language agnostic format?

Completed in ropensci/fastMatMR#36.

consider moving the vignette summary to the beginning/remove it? It felt to me like repeating basically the same information as the beginning of the vignette

Completed in ropensci/fastMatMR#37.

Did you try out logarithmic scale for the benchmark plots? Would they possibly be more readable?

Yes. All scales were considered, the plots use a mix of log-log and semi-log to show the trends, and all of them are interpreted in the text as well. Completed in ropensci/fastMatMR#38.

In several function documentation you write "This function has no return value." This is technically not true, all functions return NULL if they have no other return value
but rather these functions are not called for their return value, but for their side effect? Maybe see how other packages handle write functions?

Completed in ropensci/fastMatMR#39.

write_fmm example: sparse_mat variable is overwritten before it is written to disk so it is not easy to actually run both of the two variants included

Completed in ropensci/fastMatMR#40.

readme sentence: "Similarly, we support writing and reading from other R objects (e.g. standard R vectors and matrices), as seen in the basic use vignette." Consider rephrasing a bit, e.g. refer to 'getting started' vignette instead of basic use, possibly as described in the vignette / as you can see in the vignette instead of as seen?

Rewritten slightly for the new readme.

I am not sure how, but I think there is a relatively easy way to have the getting started vignette show up as a separate menu item in the packagedown website and not just among articles, that could improve its discoverability

Completed in ropensci/fastMatMR#41.

Code

I see that R CMD check seem to give a note about pixi files, it seems pixi.* might not
be the correct syntax in .Rbuildignore? (I do not know what the correct syntax would be,
but the check gives a note both in github actions and locally it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works)
package version in DESCRIPTION does not match package version in news, is this intended?

Completed in ropensci/fastMatMR#43.

Ideally, tests would not write to the project root in my opinion even if those files are added
to .gitignore, but rather write to a temp directory. https://testthat.r-lib.org/articles/test-fixtures.html might help find a more robust solution.
the readability of read unit tests could be improved if all tests are self contained and independent of each other: e.g. you can run any test_that block in itself and it is a working test (no setup code before test_that blocks ideally)

Completed in ropensci/fastMatMR#44.

I think there is no need to add library(testthat) to test files

Completed in ropensci/fastMatMR#45.

consistency of indentation styles could be a bit improved, consider running styler::style_pkg, see for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34

These are run already, but the tests were refactored so I think it should be OK now. Completed in ropensci/fastMatMR#45.

this is really overly pedantic, but you could consider specifying which rule to ignore in the #nolint comments instead of ignoring every rule (it is considered generally a best practice, but does not have real impact here). This is not needed if you would need to ignore several rules

Since these are C++ functions, the linting is basically for everything. Completed in ropensci/fastMatMR#51.

pre commit github workflow name is a bit misleading to me, I understand you could add the same functionality as a pre commit hook, but a github action is a post commit/push thing, maybe title it after what it does, e.g. style checking?

Completed in ropensci/fastMatMR#47.

where is this included from? I only see cpp11.cpp https://github.com/HaoZeke/fastMatMR/blob/main/src/from_file.cpp#L10
My IDE underlined this with red, but it might be fine, I am not familiar with cpp bindings in R packages

This is loaded in as part of the cpp11 package. It is found by R by using the following directive in the DESCRIPTION (as suggested by the documentation):

LinkingTo:
    cpp11

There really isn't a good way to get an IDE to be aware of the R installed package headers. cpp11.cpp is actually auto-generated :)

Completed in ropensci/fastMatMR#46.

commented out code should be removed (https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L20)

Done in ropensci/fastMatMR#48.

what happens if cpp lib call fails for some reason? Will the file handle be closed properly? See https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L30

If os.open() fails the function will return FALSE, and nothing else needs to be done, completed in ropensci/fastMatMR#49.

possibly move rebuild-benchmarks to a tools sub directory or similar? https://r-pkgs.org/misc.html#sec-misc-tools.

Completed in ropensci/fastMatMR#50.

============================

Thank you all for the great comments!

Logged author response!

@maelle sorry for the delay, I have responded and logged it via the bot :)

@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)

I still have two small comments (neither blocking approval):

As noted in ropensci/fastMatMR#26, users should be calling write_fmm instead of the lower level functions.

If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.

@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)

Wonderful, thanks so much ^_^

I still have two small comments (neither blocking approval):

Merging soon :)

As noted in HaoZeke/fastMatMR#26, users should be calling write_fmm instead of the lower level functions.

If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.

I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via :::

I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via :::

Makes sense, thank you for the response!

Thanks @HaoZeke, I have completed my checklist.

@maelle since the reviews are satisfactorily completed (thanks all) I was wondering whta the next steps are :)

maelle commented

πŸ‘‹ @HaoZeke! Thanks for your work!

Thanks @czeildi @osorensen! Could please use the reviewer approval template for approval? Thanks!

Reviewer Response

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3

Reviewer Response

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6

maelle commented

@ropensci-review-bot approve fastMatMR

Approved! Thanks @HaoZeke for submitting and @osorensen, @czeildi for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You will need to enable two-factor authentication for your GitHub account.
    This invitation will expire after one week. If it happens write a comment @ropensci-review-bot invite me to ropensci/<package-name> which will re-send an invitation.
  • After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
  • Skim the docs of the pkgdown automatic deployment, in particular if your website needs MathJax.
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
  • You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

Logged review for osorensen (hours: 3)

Logged review for czeildi (hours: 6)

@ropensci-review-bot finalize transfer of fastMatMR

Transfer completed.
The fastMatMR team is now owner of the repository and the author has been invited to the team

Sorry to bring this up here, but, I can't seem to get the centralized documentation and r-universe builds to run, @maelle would you have any suggestions?

P.S. Thanks all for the help, this was a really useful process :)

Also perhaps @mpadge would be better suited to offer advice on #606 (comment)?

Also @ropensci/blog-editors, I think a blog post showing the enhanced inter-operability offered by this package might be of interest, if so, I'd be happy to discuss submitting a blog post :)

mpadge commented

@HaoZeke r-universe only builds at most every few hours, so my advice would be just wait and build should appear

maelle commented

https://docs.ropensci.org/fastMatMR/ is up πŸŽ‰

The dev version of the dev guide has a package maintainer cheatsheet: https://devdevguide.netlify.app/maintenance_cheatsheet

Thanks so much @HaoZeke for your participation in the process!

Now on CRAN!

image

maelle commented

wow that was fast, congrats!!