`fastMatMR`: Fast Matrix Market I/O
HaoZeke opened this issue Β· 60 comments
Date accepted: 2023-10-30
Submitting Author Name: Rohit Goswami
Submitting Author Github Handle: @HaoZeke
Repository: https://github.com/HaoZeke/fastMatMR
Version submitted:
Submission type: Standard
Editor: @maelle
Reviewers: @osorensen, @czeildi
Archive: TBD
Version accepted: TBD
Language: en
- Paste the full DESCRIPTION file inside a code block below:
Package: fastMatMR
Title: "fastMatMR: High-Performance Matrix Market File Operations in R"
Version: 1.0.0.0
Authors@R:
person("Rohit", "Goswami", , "rgoswami@ieee.org", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-2393-8056"))
Description: "fastMatMR is an R package offering high-performance read and write operations for Matrix Market files. It acts as a thin wrapper around the 'fast_matrix_market' C++ library, offering speed and extended support for Matrix Market formats. Unlike other R packages, fastMatMR supports not just sparse matrices but also dense vectors and matrices. This makes it a versatile choice for dealing with .mtx files in R."
License: MIT + file LICENSE
SystemRequirements: C++17
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
LinkingTo:
cpp11
Suggests:
ggplot2,
knitr,
Matrix,
microbenchmark,
rmarkdown,
testthat (>= 3.0.0)
URL: https://github.com/HaoZeke/fastMatMR
BugReports: https://github.com/HaoZeke/fastMatMR/issues
Config/testthat/edition: 3
VignetteBuilder: knitr
Scope
-
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- data retrieval
- data extraction
- data munging
- data deposition
- data validation and testing
- workflow automation
- version control
- citation management and bibliometrics
- scientific software wrappers
- field and lab reproducibility tools
- database software bindings
- geospatial data
- text analysis
-
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The matrix market exchange formats are crucial to much of the ecosystem. The fastMatMR package focuses on high-performance read and write operations for Matrix Market files, serving as a key tool for data extraction in computational and data science pipelines.
- Who is the target audience and what are scientific applications of this package?
Data scientists who might want to load and test the NIST matrices which include:
comparative studies of algorithms for numerical linear algebra, featuring nearly 500 sparse matrices from a variety of applications, as well as matrix generation tools and services.
Additionally, this makes its simpler to interface to scipy
and the rest of the data science ecosystem. This also includes working with the Tensor Algebra Compiler (TACO).
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
The Matrix
package in R
can perform similar operations but only for sparse matrices. The fastMatMR
package not only provides enhanced performance but also extends support to dense matrices and vectors in base R
, thus offering a more versatile solution.
We have both read and write performance vignettes backing up the claims made.
-
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
-
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
-
Explain reasons for any
pkgcheck
items which your package is unable to pass.
The package passes pkcheck
: ropensci/fastMatMR#18, though the review bot disagrees :)
Technical checks
Confirm each of the following by checking the box.
- I have read the rOpenSci packaging guide.
- I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.
This package:
- does not violate the Terms of Service of any service it interacts with.
- has a CRAN and OSI accepted license.
- contains a README with instructions for installing the development version.
- includes documentation with examples for all functions, created with roxygen2.
- contains a vignette with examples of its essential functions and uses.
- has a test suite.
- has continuous integration, including reporting of test coverage.
Publication options
-
Do you intend for this package to go on CRAN?
-
Do you intend for this package to go on Bioconductor?
-
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Code of conduct
- I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
π
Editor check started
π
Also, not part of the issue template, but the package passes pkcheck
: ropensci/fastMatMR#18
Checks for fastMatMR (v1.0.0.0)
git hash: a910f6be
- βοΈ Package name is available
- βοΈ has a 'codemeta.json' file.
- βοΈ has a 'contributing' file.
- βοΈ uses 'roxygen2'.
- βοΈ 'DESCRIPTION' has a URL field.
- βοΈ 'DESCRIPTION' has a BugReports field.
- βοΈ Package has at least one HTML vignette
- βοΈ All functions have examples.
- βοΈ Package has continuous integration checks.
- βοΈ Package coverage is 68.1% (should be at least 75%).
- βοΈ R CMD check found no errors.
- βοΈ R CMD check found no warnings.
Important: All failing checks above must be addressed prior to proceeding
Package License: MIT + file LICENSE
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
type | package | ncalls |
---|---|---|
internal | fastMatMR | 6 |
imports | NA | NA |
suggests | ggplot2 | NA |
suggests | knitr | NA |
suggests | Matrix | NA |
suggests | microbenchmark | NA |
suggests | rmarkdown | NA |
suggests | testthat | NA |
linking_to | cpp11 | NA |
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
fastMatMR
fmm_to_mat (1), fmm_to_vec (1), mat_to_fmm (1), release_questions (1), sparse_to_fmm (1), vec_to_fmm (1)
NOTE: No imported packages appear to have associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in C++ (7% in 3 files), C/C++ Header (91% in 15 files) and R (2% in 3 files)
- 1 authors
- 3 vignettes
- no internal data file
- 1 imported package
- 7 exported functions (median 3 lines of code)
- 8 non-exported functions in R (median 3 lines of code)
- 217 C/C++ functions (median 5 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:
loc
= "Lines of Code"fn
= "function"exp
/not_exp
= exported / not exported
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown()
function
The final measure (fn_call_network_size
) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
measure | value | percentile | noteworthy |
---|---|---|---|
files_R | 3 | 21.5 | |
files_src | 3 | 84.3 | |
files_inst | 15 | 99.6 | |
files_vignettes | 3 | 92.4 | |
files_tests | 3 | 75.2 | |
loc_R | 45 | 4.3 | TRUE |
loc_src | 202 | 27.6 | |
loc_inst | 2545 | 81.7 | |
loc_vignettes | 276 | 60.9 | |
loc_tests | 68 | 31.5 | |
num_vignettes | 3 | 94.2 | |
n_fns_r | 15 | 21.2 | |
n_fns_r_exported | 7 | 34.0 | |
n_fns_r_not_exported | 8 | 18.0 | |
n_fns_src | 217 | 89.5 | |
n_fns_per_file_r | 3 | 46.2 | |
n_fns_per_file_src | 8 | 68.9 | |
num_params_per_fn | 2 | 11.9 | |
loc_per_fn_r | 3 | 1.1 | TRUE |
loc_per_fn_r_exp | 3 | 1.5 | TRUE |
loc_per_fn_r_not_exp | 3 | 1.5 | TRUE |
loc_per_fn_src | 5 | 5.0 | TRUE |
rel_whitespace_R | 29 | 11.6 | |
rel_whitespace_src | 20 | 31.5 | |
rel_whitespace_inst | 22 | 82.8 | |
rel_whitespace_vignettes | 41 | 68.6 | |
rel_whitespace_tests | 24 | 30.6 | |
doclines_per_fn_exp | 18 | 10.4 | |
doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
fn_call_network_size | 480 | 95.1 | TRUE |
2a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
3. goodpractice
and other checks
Details of goodpractice checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
6056018810 | pages build and deployment | success | c3523b | 8 | 2023-09-02 |
6056006940 | pkgcheck | success | fe999d | 21 | 2023-09-02 |
6056006937 | pkgdown | success | fe999d | 23 | 2023-09-02 |
6056006936 | pre-commit | success | fe999d | 49 | 2023-09-02 |
6056006938 | R-CMD-check | success | fe999d | 39 | 2023-09-02 |
3b. goodpractice
results
R CMD check
with rcmdcheck
R CMD check generated the following note:
- checking installed package size ... NOTE
installed size is 11.1Mb
sub-directories of 1Mb or more:
libs 10.2Mb
R CMD check generated the following check_fail:
- rcmdcheck_reasonable_installed_size
Test coverage with covr
Package coverage: 68.1
The following files are not completely covered by tests:
file | coverage |
---|---|
inst/include/fast_matrix_market/chunking.hpp | 63.83% |
inst/include/fast_matrix_market/fast_matrix_market.hpp | 18.92% |
inst/include/fast_matrix_market/header.hpp | 59.69% |
inst/include/fast_matrix_market/read_body_threads.hpp | 67.44% |
inst/include/fast_matrix_market/read_body.hpp | 42.29% |
inst/include/fast_matrix_market/write_body.hpp | 50% |
R/misc.R | 0% |
Cyclocomplexity with cyclocomp
No functions have cyclocomplexity >= 15
Static code analyses with lintr
lintr found the following 1 potential issues:
message | number of times |
---|---|
Avoid library() and require() calls in packages | 1 |
Package Versions
package | version |
---|---|
pkgstats | 0.1.3.8 |
pkgcheck | 0.1.2.2 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with βοΈ have been resolved.
FWIW there's .covignore
which is not being picked up by the bot here (works on the repo, ropensci/fastMatMR#18). Package coverage is:
> covr::package_coverage()
fastMatMR Coverage: 94.69%
src/from_file.cpp: 93.88%
src/to_file.cpp: 94.23%
R/fastMatMR-package.R: 100.00%
@HaoZeke is indeed right there - the failing coverage check is a bug on our side, and not this submission.
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
π @HaoZeke! Nice to see you as an author this time around, thank you for your submission! π
A few comments/questions before we proceed to the reviewer search:
- Why add the bundled files to .covrignore, if they contribute to the functionality of the package?
- I see you use conventional commits, nice. How is the changelog generated? I see ropensci/fastMatMR#17 Disclaimer: I contribute to https://github.com/cynkra/fledge/
- In the example where you use
->
I'd suggest using<-
instead since the former is more rare. - In the README you write "writing and reading from other R objects". Could you please elaborate, adding a few sentences about this to the README?
- To me the second part of the README, "Development", belongs to the contributing guide, so you could move the content and instead point contributors to the contributing guide.
- In the contributing guide could you please add a link to pixi? I for instance do not know what it is.
- In the issue tracker there are a few enhancement issues. What's the timeline for them, are they "nice to have" or do you intend to tackle them soon? If so should the review wait?
Thank you!
π @HaoZeke! Nice to see you as an author this time around, thank you for your submission! π
^_^
A few comments/questions before we proceed to the reviewer search:
* Why add the bundled files to .covrignore, if they contribute to the functionality of the package?
This is because there is no need to expose bindings of the inner workings of the fast_matrix_market
C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.
The user-facing C++ functions are in src/
and covered by tests.
* I see you use conventional commits, nice. How is the changelog generated? I see [DOC: Fix news HaoZeke/fastMatMR#17](https://github.com/HaoZeke/fastMatMR/issues/17) Disclaimer: I contribute to https://github.com/cynkra/fledge/
That's a very cool library, I am currently using towncrier and tbump which are fairly generic tools.
* In the example where you use `->` I'd suggest using `<-` instead since the former is more rare. * In the README you write "writing and reading from other R objects". Could you please elaborate, adding a few sentences about this to the README? * To me the second part of the README, "Development", belongs to the contributing guide, so you could move the content and instead point contributors to the contributing guide. * In the contributing guide could you please add a link to pixi? I for instance do not know what it is.
Yup these are great suggestions, updated the package with them.
* In the issue tracker there are a few enhancement issues. What's the timeline for them, are they "nice to have" or do you intend to tackle them soon? If so should the review wait?
These are more version 2 issues and shouldn't block the review I think. It would take a while for me to get to them, the current functionality covers R matrices, vectors and sparse matrices from Matrix
which cover most of the use-cases. Other sparse matrix formats are on the issue list but they're very optional "nice to have" issues since users can always convert to Matrix
formats / dense types before using the library.
Thank you!
You're welcome :)
This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.
To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?
We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.
This is because there is no need to expose bindings of the inner workings of the fast_matrix_market C++ library to R users. This approach is also taken by other libraries which bundle C++ libraries, like RCppEigen.
To clarify, the question was about tests specifically. Part of the code in inst/ is covered by tests anyway, so why not include them (and remove the unused code)?
Mostly maintainence. I have actually stripped the (already small) library of header files which are not used at all (Eigen / Armadillo / other C++ library compatibility files).
In general, for bundled libraries it is a good idea to not tamper with the upstream source as much as possible (it makes it easier to update to new versions).
We do not have strict requirements on testing coverage for bundled code yet, but we can improve our guidance based on cases like this submission, so this discussion is important.
Yup, I understand. Policy wise I'd be even stricter, and say that bundled libraries must not be touched at all (other than whole-file removals if they are not relevant and self contained).
This is standard practice in bundling libraries in C++ too, where projects might have different linting rules and external
files are ignored instead of refactored in-place within the project. The idea is that this way upstream developers can weigh in more easily on the code.
It also allows people to write bindings as long as they understand the public API of the code. Often times users who want bindings are not fully aware / have the expertise to make decisions about code removal from (possibly) complicated upstream C++ code.
Thank you for your thorough answer! I'll now look for reviewers.
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/606_status.svg)](https://github.com/ropensci/software-review/issues/606)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot add @osorensen to reviewers
@osorensen added to the reviewers list. Review due date is 2023-10-06. Thanks @osorensen for accepting to review! Please refer to our reviewer guide.
rOpenSciβs community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@osorensen: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot add @czeildi to reviewers
@czeildi added to the reviewers list. Review due date is 2023-10-07. Thanks @czeildi for accepting to review! Please refer to our reviewer guide.
rOpenSciβs community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@HaoZeke and @maelle, here is my initial review. My comments at the bottom explain a bit more about the boxes I haven't checked.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- β As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- β A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- β Installation instructions: for the development version of package and any non-standard dependencies in README
- β Vignette(s): demonstrating major functionality that runs successfully locally
- β Function Documentation: for all exported functions
- β Examples: (that run successfully locally) for all exported functions
- β Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).
Functionality
- β Installation: Installation succeeds as documented.
- β Functionality: Any functional claims of the software been confirmed.
- β Performance: Any performance claims of the software been confirmed.
- β Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- β Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing: 2 hours
- β Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (βrevβ role) in the package DESCRIPTION file.
I have not done anything of importance, so I should not be listed.
Review comments
Dense matrices with the matrix package
In README, you state that the Matrix
package only handles sparse objects. This is not correct.
library(Matrix)
The command below lists all dense matrix types supported by the Matrix
package.
showClass("denseMatrix")
## Virtual Class "denseMatrix" [package "Matrix"]
##
## Slots:
##
## Name: Dim Dimnames
## Class: integer list
##
## Extends:
## Class "Matrix", directly
## Class "mMatrix", by class "Matrix", distance 2
## Class "replValueSp", by class "Matrix", distance 2
##
## Known Subclasses:
## Class "unpackedMatrix", directly
## Class "packedMatrix", directly
## Class "ndenseMatrix", directly
## Class "ldenseMatrix", directly
## Class "ddenseMatrix", directly
## Class "ngeMatrix", by class "unpackedMatrix", distance 2
## Class "ntrMatrix", by class "unpackedMatrix", distance 2
## Class "nsyMatrix", by class "unpackedMatrix", distance 2
## Class "ntpMatrix", by class "packedMatrix", distance 2
## Class "nspMatrix", by class "packedMatrix", distance 2
## Class "lgeMatrix", by class "unpackedMatrix", distance 2
## Class "ltrMatrix", by class "unpackedMatrix", distance 2
## Class "lsyMatrix", by class "unpackedMatrix", distance 2
## Class "ltpMatrix", by class "packedMatrix", distance 2
## Class "lspMatrix", by class "packedMatrix", distance 2
## Class "dgeMatrix", by class "unpackedMatrix", distance 2
## Class "dtrMatrix", by class "unpackedMatrix", distance 2
## Class "dsyMatrix", by class "unpackedMatrix", distance 2
## Class "dpoMatrix", by class "dsyMatrix", distance 3
## Class "corMatrix", by class "dpoMatrix", distance 4
## Class "dtpMatrix", by class "packedMatrix", distance 2
## Class "dspMatrix", by class "packedMatrix", distance 2
## Class "dppMatrix", by class "dspMatrix", distance 3
## Class "pcorMatrix", by class "dppMatrix", distance 4
Below is an example of a dense matrix from the Matrix
package:
(dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))
## 5 x 2 Matrix of class "dgeMatrix"
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
str(dense_matrix)
## Formal class 'dgeMatrix' [package "Matrix"] with 4 slots
## ..@ Dim : int [1:2] 5 2
## ..@ Dimnames:List of 2
## .. ..$ : NULL
## .. ..$ : NULL
## ..@ x : num [1:10] 1 2 3 4 5 6 7 8 9 10
## ..@ factors : list()
Statement of need
The README file contains a section titled βWhy?β, but it does not state explicitly what problems the software is designed to solved nor what the target audience is.
Vignettes
There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.
Examples
I understand that the fmm_to_*
functions need an mtx file to work, and I guess itβs therefore the examples with these functions are marked with \dontrun{}
. However, wouldnβt it possible to define a character vector in R that contains the required input, and then feed this to fmm_to_*
in order to have an example that can actually be run?
Just a quick clarification, @osorensen, the readme
wording is a bit unclear but it states in the previous section that the goal is the reading and writing of Matrix Market
files. For that, it is true that Matrix
does not support dense matrices:
> library("Matrix")
> (dense_matrix <- as(matrix(1:10, ncol = 2), "dMatrix"))
> writeMM(dense_matrix, "mat.mtx")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function βwriteMMβ for signature β"dgeMatrix"β
> ?writeMM
...
Read matrices stored in the Harwell-Boeing or MatrixMarket formats
or write βsparseMatrixβ objects to one of these formats.
...
The longer explanation is part of the ROpenSci issue (intended audience, use case) but I will add it to the readme soon.
P.S. Thanks for accepting the review and getting started with this :)
Thank you @osorensen!!
@ropensci-review-bot submit review #606 (comment) time 2
Logged review for osorensen (hours: 2)
π @czeildi you have 2 days left before the due date for your review (2023-10-07).
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s): demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions
- Examples: (that run successfully locally) for all exported functions
- Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software been confirmed.
- Performance: Any performance claims of the software been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing: 5
- Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
All in all, I could easily install the package, understand its functionality
and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).
I have a couple of comments, detailed below. Most of them are non-blocking and
might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)
Functionality
- Performance: I did not re-run the benchmark vignettes myself but tried the package
out with some moderately large matrices and it felt reasonably fast. ππΌ - it seems the lib cannot handle paths with
~
(?) If this cannot be made to work easily, some error message would be very helpful, I think it would be needed
to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read file returns an error message: 'cannot open file' which is more helpful.
(I am guessing the issue is related to~
in path becausewrite_fmm(vec, "~/vector.mtx")
returned false for me whilewrite_fmm(vec, "/home/ildi_home/vector.mtx")
worked) - I really appreciated the precise error message in
vec_to_fmm(1L, "vector.mtx")
#> Error: Invalid input type, expected 'double' actual 'integer'
However, I was wondering if it could actually accept integers as well?
- improve error message? It is a matrix, just sparse, not plain
mat_to_fmm(sparse_mat, "sparse_matrix.mtx")
#> Error in mat_to_fmm(sparse_mat, "sparse_matrix.mtx") : object is not a matrix
- improve error message?
mat <- matrix(c(1, 2, 3, 4), nrow = 2)
sparse_to_fmm(mat, "matrix.mtx")
#> Error: Invalid input type, expected 'double' actual 'NULL'
- future suggestion: support reading
.mtx.gz
files as I can download from for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html ? - consider increased consistency of function names? e.g
fmm_to_sparseMatrix
vssparse_to_fmm
.sparse_to_fmm
was a bit confusing to me in that if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector is abbreviated to spare a few characters, whilesparse_matrix
is not. A vague suggestion: possibly follow the pattern offmm_write_vector
,fmm_read_to_vector
for all formats?
Documentation
- I did not tick function documentation because although they exist, argument names
(fname
vsfilename
) are not always consistent between the documentation and the code
itself - I did not tick Examples, because although the examples run, but only in the right order, i.e. if I first run the examples of the write
functions then the examples of the read functions run correctly but they are not self contained - readme: "Unlike the Matrix package" : maybe rephrase? it is a bit off-putting for me to start the whole readme with a negative comparison, although
I understand that a core part of the motivation for this package is that Matrix does not support everything and not fast enough. Even changing the order of the two parts of the sentence would feel better for me but this is really personal - In the readme: "we support / the package supports". Consider unifying these two wording styles
- rename
basic_usage
to getting started? The vignette title is getting started but you can seebasic_usage
in the url - clarify what the
addendum
is about in the getting started readme? If I understand correctly, it demonstrates the incorrect handling ofNA
byMatrix
package? - Maybe add a bit more context on why would I want to read back in
Python
? Possibly rather focus on being a language agnostic format? - consider moving the vignette summary to the beginning/remove it? It felt to me like repeating basically the same information as the beginning of the vignette
- Did you try out logarithmic scale for the benchmark plots? Would they possibly be more readable?
- In several function documentation you write "This function has no return value." This is technically not true, all functions return
NULL
if they have no other return value
but rather these functions are not called for their return value, but for their side effect? Maybe see how other packages handle write functions? write_fmm
example:sparse_mat
variable is overwritten before it is written to disk so it is not easy to actually run both of the two variants included- readme sentence: "Similarly, we support writing and reading from other R objects (e.g. standard R vectors and matrices), as seen in the basic use vignette."
Consider rephrasing a bit, e.g. refer to 'getting started' vignette instead of basic use, possibly as described in the vignette / as you can see in the vignette instead of as seen? - I am not sure how, but I think there is a relatively easy way to have the getting started vignette show up as a separate menu item in the packagedown website and not just among articles, that could improve its discoverability
Code
- I see that R CMD check seem to give a note about pixi files, it seems
pixi.*
might not
be the correct syntax in.Rbuildignore
? (I do not know what the correct syntax would be,
but the check gives a note both in github actions and locally it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works) - package version in
DESCRIPTION
does not match package version innews
, is this intended? - Ideally, tests would not write to the project root in my opinion even if those files are added
to.gitignore
, but rather write to a temp directory. https://testthat.r-lib.org/articles/test-fixtures.html might help find a more robust solution. - the readability of read unit tests could be improved if all tests are self contained and independent of each other: e.g. you can run any
test_that
block in itself and it is a working test (no setup code beforetest_that
blocks ideally) - I think there is no need to add
library(testthat)
to test files - consistency of indentation styles could be a bit improved, consider running
styler::style_pkg
, see for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34 - this is really overly pedantic, but you could consider specifying which rule to ignore in the
#nolint
comments instead of ignoring every rule (it is considered generally a best practice, but does not have real impact here). This is not needed if you would need to ignore several rules - pre commit github workflow name is a bit misleading to me, I understand you could add the same functionality as a pre commit hook, but a github action is a post commit/push thing, maybe title it after what it does, e.g. style checking?
- where is this included from? I only see
cpp11.cpp
https://github.com/HaoZeke/fastMatMR/blob/main/src/from_file.cpp#L10
My IDE underlined this with red, but it might be fine, I am not familiar with cpp bindings in R packages - commented out code should be removed (https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L20)
- what happens if cpp lib call fails for some reason? Will the file handle be closed properly? See https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L30
- possibly move rebuild-benchmarks to a tools sub directory or similar? https://r-pkgs.org/misc.html#sec-misc-tools.
@ropensci-review-bot submit review #606 (comment) time 5
Logged review for czeildi (hours: 5)
@HaoZeke Now both reviews are in, as a reminder see https://devdevguide.netlify.app/softwarereview_author#the-review-process
@HaoZeke: please post your response with @ropensci-review-bot submit response <url to issue comment>
if you haven't done so already (this is an automatic reminder).
Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html
Sorry for the delay. Thank you reviewers and editors. I've made changes, in accordance to the suggestions and comments, and made a new release.
Response to @osorensen
- Dense matrices with the matrix package
Response in #606 (comment).
- Statement of need
The README file contains a section titled βWhy?β, but it does not state explicitly what problems the software is designed to solved nor what the target audience is
I have rewritten the readme to explain the context.
- Vignettes
There are vignettes, but they are very brief. Please provide some more extended examples, with more text explaining what is being done.
The functionality of the package is to read and write a particular file format, commonly used in data-science across many languages including R
. Apart from the dense matrix and vector support, the vignettes show performance improvements over the implementation in Matrix
.
- Examples
I understand that the fmm_to_* functions need an mtx file to work, and I guess itβs therefore the examples with these functions are marked with \dontrun{}. However, wouldnβt it possible to define a character vector in R that contains the required input, and then feed this to fmm_to_* in order to have an example that can actually be run?
The write_fmm
functions can be run locally, for saving R objects as shown, and then those can be used subsequently for loading. The speed increases are largely by using C++ to load data in an optimal manner so this isn't feasible. ropensci/fastMatMR#42
Response to @czeildi
All in all, I could easily install the package, understand its functionality
and use it in some of my (simple) own examples without having previous knowledge of matrix market. I could navigate the code and mostly understand it (I am not experienced with cpp11 code).
Glad to hear it :)
I have a couple of comments, detailed below. Most of them are non-blocking and
might be somewhat opinionated so feel free to apply them as you see fit (besides the ones where I did not tick the corresponding box)
These were all excellent, and I addressed each of them in the linked issues, I will briefly link to them again.
Performance: I did not re-run the benchmark vignettes myself but tried the package
out with some moderately large matrices and it felt reasonably fast. ππΌ
Glad to hear that. The benchmark vignettes are very large and not really meant to be run often (they can take upto 30-40 minutes).
it seems the lib cannot handle paths with ~ (?) If this cannot be made to work easily, some error message would be very helpful, I think it would be needed
to be added here: https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L26 The corresponding read file returns an error message: 'cannot open file' which is more helpful.
(I am guessing the issue is related to ~ in path because write_fmm(vec, "~/vector.mtx") returned false for me while write_fmm(vec, "/home/ildi_home/vector.mtx") worked)
This was a pretty involved feature, but I managed to get it in ropensci/fastMatMR#24.
I really appreciated the precise error message in
vec_to_fmm(1L, "vector.mtx")
#> Error: Invalid input type, expected 'double' actual 'integer'
However, I was wondering if it could actually accept integers as well?
Implemented in ropensci/fastMatMR#25.
improve error message? It is a matrix, just sparse, not plain
mat_to_fmm(sparse_mat, "sparse_matrix.mtx")
#> Error in mat_to_fmm(sparse_mat, "sparse_matrix.mtx") : object is not a matrix
As noted in ropensci/fastMatMR#26, users should be calling write_fmm
instead of the lower level functions.
improve error message?
mat <- matrix(c(1, 2, 3, 4), nrow = 2)
sparse_to_fmm(mat, "matrix.mtx")
#> Error: Invalid input type, expected 'double' actual 'NULL'
Same as above (ropensci/fastMatMR#27), write_fmm
has the right user-facing semantics.
future suggestion: support reading .mtx.gz files as I can download from for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html ?
This one is tracked, but not for this release. It would need quite a bit more design and I'm not sure it will be in scope. Tracked here: ropensci/fastMatMR#28
consider increased consistency of function names? e.g fmm_to_sparseMatrix vs sparse_to_fmm. sparse_to_fmm was a bit confusing to me in that if I understand correctly, the package is 'fast', not the target format itself (?). matrix and vector is abbreviated to spare a few characters, while sparse_matrix is not. A vague suggestion: possibly follow the pattern of fmm_write_vector, fmm_read_to_vector for all formats?
As noted in ropensci/fastMatMR#29,
sparse_to_fmm has been converted to sparse_Matrix_to_fmm for consistency.
Documentation
I did not tick function documentation because although they exist, argument names
(fname vs filename) are not always consistent between the documentation and the code
itself
I have fixed this in ropensci/fastMatMR#30.
I did not tick Examples, because although the examples run, but only in the right order, i.e. if I first run the examples of the write
functions then the examples of the read functions run correctly but they are not self contained
Completed in ropensci/fastMatMR#42 and ropensci/fastMatMR#31. Essentially, while the write_fmm
can be self-contained, fmm_to_*
need to have matrix market files present.
readme: "Unlike the Matrix package" : maybe rephrase? it is a bit off-putting for me to start the whole readme with a negative comparison, although
I understand that a core part of the motivation for this package is that Matrix does not support everything and not fast enough. Even changing the order of the two parts of the sentence would feel better for me but this is really personal
In the readme: "we support / the package supports". Consider unifying these two wording styles
Completed in ropensci/fastMatMR#32 and ropensci/fastMatMR#33. Along with the suggestions from the other reviewer, the readme
has been more or less completely re-written, and I hope it is clearer now.
rename basic_usage to getting started? The vignette title is getting started but you can see basic_usage in the url
Completed in ropensci/fastMatMR#34.
clarify what the addendum is about in the getting started readme? If I understand correctly, it demonstrates the incorrect handling of NA by Matrix package?
Completed in ropensci/fastMatMR#35.
Maybe add a bit more context on why would I want to read back in Python? Possibly rather focus on being a language agnostic format?
Completed in ropensci/fastMatMR#36.
consider moving the vignette summary to the beginning/remove it? It felt to me like repeating basically the same information as the beginning of the vignette
Completed in ropensci/fastMatMR#37.
Did you try out logarithmic scale for the benchmark plots? Would they possibly be more readable?
Yes. All scales were considered, the plots use a mix of log-log and semi-log to show the trends, and all of them are interpreted in the text as well. Completed in ropensci/fastMatMR#38.
In several function documentation you write "This function has no return value." This is technically not true, all functions return NULL if they have no other return value
but rather these functions are not called for their return value, but for their side effect? Maybe see how other packages handle write functions?
Completed in ropensci/fastMatMR#39.
write_fmm example: sparse_mat variable is overwritten before it is written to disk so it is not easy to actually run both of the two variants included
Completed in ropensci/fastMatMR#40.
readme sentence: "Similarly, we support writing and reading from other R objects (e.g. standard R vectors and matrices), as seen in the basic use vignette." Consider rephrasing a bit, e.g. refer to 'getting started' vignette instead of basic use, possibly as described in the vignette / as you can see in the vignette instead of as seen?
Rewritten slightly for the new readme.
I am not sure how, but I think there is a relatively easy way to have the getting started vignette show up as a separate menu item in the packagedown website and not just among articles, that could improve its discoverability
Completed in ropensci/fastMatMR#41.
Code
I see that R CMD check seem to give a note about pixi files, it seems pixi.* might not
be the correct syntax in .Rbuildignore? (I do not know what the correct syntax would be,
but the check gives a note both in github actions and locally it seems. If all else fails, adding the files one by one instead of a wildcard syntax definitely works)
package version in DESCRIPTION does not match package version in news, is this intended?
Completed in ropensci/fastMatMR#43.
Ideally, tests would not write to the project root in my opinion even if those files are added
to .gitignore, but rather write to a temp directory. https://testthat.r-lib.org/articles/test-fixtures.html might help find a more robust solution.
the readability of read unit tests could be improved if all tests are self contained and independent of each other: e.g. you can run any test_that block in itself and it is a working test (no setup code before test_that blocks ideally)
Completed in ropensci/fastMatMR#44.
I think there is no need to add library(testthat) to test files
Completed in ropensci/fastMatMR#45.
consistency of indentation styles could be a bit improved, consider running styler::style_pkg, see for example https://github.com/HaoZeke/fastMatMR/blob/main/tests/testthat/test-write_fmm.R#L34
These are run already, but the tests were refactored so I think it should be OK now. Completed in ropensci/fastMatMR#45.
this is really overly pedantic, but you could consider specifying which rule to ignore in the #nolint comments instead of ignoring every rule (it is considered generally a best practice, but does not have real impact here). This is not needed if you would need to ignore several rules
Since these are C++ functions, the linting is basically for everything. Completed in ropensci/fastMatMR#51.
pre commit github workflow name is a bit misleading to me, I understand you could add the same functionality as a pre commit hook, but a github action is a post commit/push thing, maybe title it after what it does, e.g. style checking?
Completed in ropensci/fastMatMR#47.
where is this included from? I only see cpp11.cpp https://github.com/HaoZeke/fastMatMR/blob/main/src/from_file.cpp#L10
My IDE underlined this with red, but it might be fine, I am not familiar with cpp bindings in R packages
This is loaded in as part of the cpp11
package. It is found by R
by using the following directive in the DESCRIPTION
(as suggested by the documentation):
LinkingTo:
cpp11
There really isn't a good way to get an IDE to be aware of the R installed package headers. cpp11.cpp
is actually auto-generated :)
Completed in ropensci/fastMatMR#46.
commented out code should be removed (https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L20)
Done in ropensci/fastMatMR#48.
what happens if cpp lib call fails for some reason? Will the file handle be closed properly? See https://github.com/HaoZeke/fastMatMR/blob/main/src/to_file.cpp#L30
If os.open()
fails the function will return FALSE
, and nothing else needs to be done, completed in ropensci/fastMatMR#49.
possibly move rebuild-benchmarks to a tools sub directory or similar? https://r-pkgs.org/misc.html#sec-misc-tools.
Completed in ropensci/fastMatMR#50.
============================
Thank you all for the great comments!
@ropensci-review-bot submit response #606 (comment)
Logged author response!
@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)
I still have two small comments (neither blocking approval):
- I think there is a typo in .Rbuildignore, I submitted a PR to fix: ropensci/fastMatMR#63
As noted in ropensci/fastMatMR#26, users should be calling write_fmm instead of the lower level functions.
If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.
@HaoZeke great work and thank you for your detailed response and making your changes easy to track. I am now happy to recommend approving this package :)
Wonderful, thanks so much ^_^
I still have two small comments (neither blocking approval):
- I think there is a typo in .Rbuildignore, I submitted a PR to fix: Fix typo in .Rbuildignore HaoZeke/fastMatMR#63
Merging soon :)
As noted in HaoZeke/fastMatMR#26, users should be calling write_fmm instead of the lower level functions.
If the users are not supposed to be calling the lower level functions, do they need to be exported, and thus being part of the public user interface of the package? It could be less confusing possibly if only those functions are exported which are meant to be called directly by users.
I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via :::
I thought about this, but decided not to mainly because downstream package users (including me for other dependent packages) would want to call these functions where it makes sense (e.g. if my downstream package will only write matrix data or sparse matrix data) and CRAN disallows / warns calling unexported functions via
:::
Makes sense, thank you for the response!
@maelle since the reviews are satisfactorily completed (thanks all) I was wondering whta the next steps are :)
π @HaoZeke! Thanks for your work!
Thanks @czeildi @osorensen! Could please use the reviewer approval template for approval? Thanks!
Reviewer Response
Final approval (post-review)
- The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 3
Reviewer Response
Final approval (post-review)
- The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 6
@ropensci-review-bot approve fastMatMR
Approved! Thanks @HaoZeke for submitting and @osorensen, @czeildi for your reviews! π
To-dos:
- Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You will need to enable two-factor authentication for your GitHub account.
This invitation will expire after one week. If it happens write a comment@ropensci-review-bot invite me to ropensci/<package-name>
which will re-send an invitation. - After transfer write a comment
@ropensci-review-bot finalize transfer of <package-name>
where<package-name>
is the repo/package name. This will give you admin access back. - Fix all links to the GitHub repo to point to the repo under the ropensci organization.
- Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
- If you already had a
pkgdown
website and are ok relying only on rOpenSci central docs building and branding,- deactivate the automatic deployment you might have set up
- remove styling tweaks from your pkgdown config but keep that config file
- replace the whole current
pkgdown
website with a redirecting page - replace your package docs URL with
https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the
URL
field alongside the link to the GitHub repository, e.g.:URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
- Skim the docs of the pkgdown automatic deployment, in particular if your website needs MathJax.
- Fix any links in badges for CI and coverage to point to the new repository URL.
- Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
- We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running
codemetar::write_codemeta()
in the root of your package. - You can add this installation method to your package README
install.packages("<package-name>", repos = "https://ropensci.r-universe.dev")
thanks to R-universe.
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"
-type contributors in the Authors@R
field (with their consent).
Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.
We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.
Last but not least, you can volunteer as a reviewer via filling a short form.
@ropensci-review-bot submit review #606 (comment) time 3
Logged review for osorensen (hours: 3)
@ropensci-review-bot submit review #606 (comment) time 6
Logged review for czeildi (hours: 6)
@ropensci-review-bot finalize transfer of fastMatMR
Transfer completed.
The fastMatMR
team is now owner of the repository and the author has been invited to the team
Sorry to bring this up here, but, I can't seem to get the centralized documentation and r-universe builds to run, @maelle would you have any suggestions?
- I moved the repo
- Got back admin access
- Pushed a few commits changing links
- I thought at this stage a build would run on r-universe based on the documentation
- It doesn't seem to have a job defined (https://dev.ropensci.org/fastMatMR)
- Nor is it listed in the r-universe channel (not sure if that's the right terminology): https://ropensci.r-universe.dev/builds
- Removed my customized
pkgdown
stuff- This also didn't seem to trigger a build from the centralized documentation server, and the link which should go to the package hits a 404 (https://docs.ropensci.org/fastMatMR)
P.S. Thanks all for the help, this was a really useful process :)
Also perhaps @mpadge would be better suited to offer advice on #606 (comment)?
Also @ropensci/blog-editors, I think a blog post showing the enhanced inter-operability offered by this package might be of interest, if so, I'd be happy to discuss submitting a blog post :)
@HaoZeke r-universe only builds at most every few hours, so my advice would be just wait and build should appear
https://docs.ropensci.org/fastMatMR/ is up π
The dev version of the dev guide has a package maintainer cheatsheet: https://devdevguide.netlify.app/maintenance_cheatsheet
Thanks so much @HaoZeke for your participation in the process!
wow that was fast, congrats!!