EDIutils

Question

EDIutils

clnsmth opened this issue 3 years ago · 60 comments

Date accepted: 2022-03-29
Reviewers:

Submitting Author Name: Colin Smith
Submitting Author Github Handle: @clnsmth
Repository: https://github.com/EDIorg/EDIutils
Version submitted:
Submission type: Standard
Editor: @adamhsparks
Reviewers: @bozaah, @laijasmine

Due date for @bozaah: 2022-02-25

Due date for @laijasmine: 2022-02-25
Archive: TBD
Version accepted: TBD
Language: en

Package: EDIutils
Title: An API Client for the Environmental Data Initiative Repository
Version: 0.0.0.9000
Authors@R: 
    c(person("Colin", "Smith", email = "colin.smith@wisc.edu", role = c("aut", "cre"), comment = "https://orcid.org/0000-0003-2261-9931"),
    person("Corinna", "Gries", role = "ctb", comment = "0000-0002-9091-6543"))
Description: A client for the Environmental Data Initiative repository REST API. The EDI data repository <https://portal.edirepository.org/nis/home.jsp> is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It is built upon the PASTA+ software stack <https://pastaplus-core.readthedocs.io/en/latest/index.html#> and was developed in collaboration with the US LTER Network <https://lternet.edu/>. EDIutils includes functions to search and access existing data, evaluate and upload new data, and assist other data management tasks common to repository users.
Imports: 
    curl,
    httr,
    xml2
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Suggests: 
    knitr,
    readr,
    rmarkdown,
    roxygen2,
    testthat
URL: https://github.com/EDIorg/EDIutils, https://ediorg.github.io/EDIutils/
BugReports: https://github.com/EDIorg/EDIutils/issues
VignetteBuilder: knitr
Language: en-US

Scope

- [x ] data retrieval
- [ ] data extraction
- [ ] data munging
- [ x] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis

Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package is an API client for the Environmental Data Initiative data repository, which serves the ecological and environmental science and monitoring domains.

Who is the target audience and what are scientific applications of this package?

The target audiences are:
1.) Data authors who publish to the repository.
2.) Data users who retrieve data from the repository.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

There is no overlap.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Yes

If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.

All checks pass

Technical checks

Confirm each of the following by checking the box.

[x ] I have read the guide for authors and rOpenSci packaging guide.

This package:

[ x] does not violate the Terms of Service of any service it interacts with.
[ x] has a CRAN and OSI accepted license.
[ x] contains a README with instructions for installing the development version.
[ x] includes documentation with examples for all functions, created with roxygen2.
[ x] contains a vignette with examples of its essential functions and uses.
[ x] has a test suite.
[ x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[ x] Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

The package is novel and will be of interest to the broad readership of the journal.
The manuscript describing the package is no longer than 3000 words.
You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
(Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
(Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
(Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

[ x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Answer 1 · 2022-01-11T21:47:16.000Z

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

Answer 2 · 2022-01-11T21:47:19.000Z

🚀

Editor check started

👋

Answer 3 · 2022-01-11T21:50:34.000Z

Checks for EDIutils (v0.0.0.9000)

git hash: 70932fd3

✔️ Package name is available
✔️ has a 'CITATION' file.
✔️ has a 'codemeta.json' file.
✔️ has a 'contributing' file.
✔️ uses 'roxygen2'.
✔️ 'DESCRIPTION' has a URL field.
✔️ 'DESCRIPTION' has a BugReports field.
✔️ Package has at least one HTML vignette
✔️ All functions have examples.
✔️ Package has continuous integration checks.
✖️ Package coverage is 55.6% (should be at least 75%).
✔️ R CMD check found no errors.
✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

code in R (100% in 72 files) and
1 authors
5 vignettes
no internal data file
3 imported packages
71 exported functions (median 9 lines of code)
100 non-exported functions in R (median 11 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

loc = "Lines of Code"
fn = "function"
exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure	value	percentile	noteworthy
files_R	72	98.1
files_vignettes	5	96.9
files_tests	72	99.6
loc_R	1178	72.3
loc_vignettes	383	71.2
loc_tests	659	80.2
num_vignettes	5	97.9	TRUE
n_fns_r	171	87.3
n_fns_r_exported	71	92.3
n_fns_r_not_exported	100	83.8
n_fns_per_file_r	1	19.0
num_params_per_fn	2	11.9
loc_per_fn_r	10	28.4
loc_per_fn_r_exp	9	19.2
loc_per_fn_r_not_exp	11	35.4
rel_whitespace_R	9	53.4
rel_whitespace_vignettes	38	76.3
rel_whitespace_tests	13	68.9
doclines_per_fn_exp	38	47.0
doclines_per_fn_not_exp	0	0.0	TRUE
fn_call_network_size	230	90.0

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

GitHub Workflow Results

name	conclusion	sha	date
pages build and deployment	success	70932f	2022-01-11
pkgdown	success	70932f	2022-01-11
R-CMD-check		70932f	2022-01-11
test-coverage	success	70932f	2022-01-11

3b. `goodpractice` results

`R CMD check` with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 55.63

The following files are not completely covered by tests:

file	coverage
R/check_status_create.R	0%
R/check_status_evaluate.R	0%
R/check_status_update.R	0%
R/create_data_package.R	0%
R/create_event_subscription.R	0%
R/create_journal_citation.R	0%
R/create_reservation.R	0%
R/delete_event_subscription.R	0%
R/delete_journal_citation.R	0%
R/delete_reservation.R	0%
R/evaluate_data_package.R	0%
R/execute_event_subscription.R	0%
R/get_audit_count.R	0%
R/get_audit_record.R	0%
R/get_audit_report.R	0%
R/get_event_subscription.R	0%
R/is_authorized.R	0%
R/login.R	44%
R/logout.R	0%
R/query_event_subscriptions.R	0%
R/read_data_package_error.R	0%
R/read_evaluate_report_summary.R	0%
R/read_evaluate_report.R	0%
R/update_data_package.R	0%

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 26 potential issues:

message	number of times
Lines should not be more than 80 characters.	26

Package Versions

package	version
pkgstats	0.0.3.82
pkgcheck	0.0.2.205

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

Answer 4 · 2022-01-14T13:07:52.000Z

Hi @clnsmth ! Thanks for the submission. The scope of the package looks like a good fit for rOpenSci. I'm going to assign an editor.

As for the test coverage, have a look here:
https://books.ropensci.org/http-testing/
And see if there are some techniques you could use to bump up the coverage.

Thanks for the submission!

@ropensci-review-bot assign @adamhsparks as editor

Answer 5 · 2022-01-14T13:08:13.000Z

@ropensci-review-bot assign @adamhsparks as editor

Answer 6 · 2022-01-14T13:08:15.000Z

Assigned! @adamhsparks is now the editor

Answer 7 · 2022-01-14T14:30:11.000Z

Hi @ldecicco-USGS and @adamhsparks ! Thanks for the http-testing resource. I'll send an update once the requested changes are implemented.

Answer 8 · 2022-01-17T05:50:58.000Z

Thanks, @clnsmth, just ping me when you're ready and I'll start my editor checks and look for reviewers when everything looks good here

Answer 9 · 2022-01-21T15:05:13.000Z

Hi @adamhsparks @ldecicco-USGS,

I've implement rOpenSci http testing recommendations and coverage is now > 75%.

EDIutils:

Uses vcr to mock http requests
Runs tests with real http requests via GitHub Actions on a monthly schedule and on push and pull requests to the main branch
Precomputes all examples and vignettes

Thanks again for this helpful advice and please let me know if I've missed anything.

Answer 10 · 2022-01-22T02:15:09.000Z

@ropensci-review-bot check package

Answer 11 · 2022-01-22T02:15:10.000Z

Thanks, about to send the query.

Answer 12 · 2022-01-22T02:15:12.000Z

🚀

Editor check started

👋

Answer 13 · 2022-01-22T02:18:51.000Z

Checks for EDIutils (v0.0.0.9001)

git hash: d80e7bbb

✔️ Package name is available
✔️ has a 'CITATION' file.
✔️ has a 'codemeta.json' file.
✔️ has a 'contributing' file.
✔️ uses 'roxygen2'.
✔️ 'DESCRIPTION' has a URL field.
✔️ 'DESCRIPTION' has a BugReports field.
✔️ Package has at least one HTML vignette
✔️ All functions have examples.
✔️ Package has continuous integration checks.
✔️ Package coverage is 77.5%.
✔️ R CMD check found no errors.
✔️ R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

code in R (100% in 72 files) and
1 authors
5 vignettes
no internal data file
3 imported packages
71 exported functions (median 9 lines of code)
100 non-exported functions in R (median 11 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

loc = "Lines of Code"
fn = "function"
exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure	value	percentile	noteworthy
files_R	72	98.1
files_vignettes	5	96.9
files_tests	146	99.9
loc_R	1179	72.4
loc_vignettes	383	71.2
loc_tests	5333	98.4	TRUE
num_vignettes	5	97.9	TRUE
n_fns_r	171	87.3
n_fns_r_exported	71	92.3
n_fns_r_not_exported	100	83.8
n_fns_per_file_r	1	19.0
num_params_per_fn	2	11.9
loc_per_fn_r	10	28.4
loc_per_fn_r_exp	9	19.2
loc_per_fn_r_not_exp	11	35.4
rel_whitespace_R	9	53.6
rel_whitespace_vignettes	38	76.3
rel_whitespace_tests	2	73.5
doclines_per_fn_exp	47	59.4
doclines_per_fn_not_exp	0	0.0	TRUE
fn_call_network_size	230	90.0

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

GitHub Workflow Results

name	conclusion	sha	date
pages build and deployment	success	ea61c2	2022-01-21
pkgdown	success	d80e7b	2022-01-21
R-CMD-check	success	971e12	2022-01-21
R-CMD-check-real-requests	success	d80e7b	2022-01-21
test-coverage	cancelled	6114a2	2022-01-21

3b. `goodpractice` results

`R CMD check` with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 77.51

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 3 potential issues:

message	number of times
Lines should not be more than 80 characters.	3

Package Versions

package	version
pkgstats	0.0.3.86
pkgcheck	0.0.2.207

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

Answer 14 · 2022-01-22T02:21:06.000Z

Editor checks:

Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
- Is the case for the package well made?
- Is the reference index page clear (grouped by topic if necessary)?
- Are vignettes readable, sufficiently detailed and not just perfunctory?
Fit: The package meets criteria for fit and overlap.
Installation instructions: Are installation instructions clear enough for human users?
Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
License: The package has a CRAN or OSI accepted license.
Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

Thanks for addressing the issues that were flagged by @ldecicco-USGS. This looks great, thanks @clnsmth. I'll start looking for reviewers now.

Answer 15 · 2022-01-22T05:18:58.000Z

@ropensci-review-bot seeking reviewers

Answer 16 · 2022-01-22T05:18:59.000Z

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/498_status.svg)](https://github.com/ropensci/software-review/issues/498)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

Answer 17 · 2022-02-04T03:10:54.000Z

@ropensci-review-bot add @bozaah to reviewers

Answer 18 · 2022-02-04T03:10:56.000Z

@bozaah added to the reviewers list. Review due date is 2022-02-25. Thanks @bozaah for accepting to review! Please refer to our reviewer guide.

Answer 19 · 2022-02-04T03:10:58.000Z

@bozaah: If you haven't done so, please fill this form for us to update our reviewers records.

Answer 20 · 2022-02-04T03:12:40.000Z

@ropensci-review-bot add @laijasmine to reviewers

Answer 21 · 2022-02-04T03:12:41.000Z

@laijasmine added to the reviewers list. Review due date is 2022-02-25. Thanks @laijasmine for accepting to review! Please refer to our reviewer guide.

Answer 22 · 2022-02-04T03:12:42.000Z

@laijasmine: If you haven't done so, please fill this form for us to update our reviewers records.

Answer 23 · 2022-02-14T06:41:12.000Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need: clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s): demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions
Examples: (that run successfully locally) for all exported functions
Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 8

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Thanks for the opportunity to be involved in this! It was interesting to see how other data repositories deal with data upload. I also wanted to preface this by saying my experience is heavily influenced by data repository access and submission package, dataone and working on editing EML using the EML package. The workflow there is very similar but different in some elements so some of the things I might not be applicable to your situation!

Vignettes

Search and Access

Many users are probably unfamiliar with xml and how to navigate it and there is a learning curve involved so a vignette on the files would be helpful. They might be more comfortable with working with dataframes instead. We don't need to go into detail but it might be helpful to point them to packages like XML to help convert the content to a preferred format. Maybe adding a couple lines or point them to retrieve downloads will be helpful.

Evaluate and upload

Upload a data package
I struggled the longest here. Maybe because I was just trying to write the least amount of information possible to just upload the file. I had a hard time understanding what the errors were in the results of read_evaluate_report. I went back and forth between the ezEML form and R trying to figure out what I needed. There seems to be more requirements to upload using EDIutils compared to the Check Metadata in ezEML. For example I tried to avoid uploading a data table because it wasn't marked as required in ezEML but it was required to upload using EDIutils.

Report
I also found the output from read_evaluate_reportquite long and hard to read if I'm viewing it as a message in the console. We might want to suggest users to use writeLines(report, "report.txt") instead so they can cmd + f the file and keep track of their fixes. Users might prefer if there is a way to only show the warnings and errors so there is less to look through when trying to improve their metadata.

Update a data package
Following the example in the vignette I wasn't able to update my package because I keep getting the error:

Error: Attempting to update a data package to revision '1' but an equal or higher revision ('1') already exists in PASTA: edi.757.1

I realized the packageId slot inside the eml file also needs to be updated. We might need to be more explicit in the vignette that the identifier needs to be updated. I used the following EML package here:

my_eml <- EML::read_eml("edi.757.1.xml")
my_eml$dataset$title <- "test 2 update the title :)"
my_eml$packageId <- "edi.757.2"
EML::write_eml(my_eml, "edi.757.2.xml")

Documentation

list_principal_owner_citations

in the documentation it mentions the principalOwner as being:

(character) Principal owner in the format returned by construct_dn()

I can't seem to find that function. I think this should say create_dn()?

search_data_package

It might be helpful for the documentation to have the formats for some of the fields such as singledate, begindate and pubdate so it can be easily referenced.

create_journal_citation

using the example (and delete_journal_citation example) and got the following error:

journalCitationId <- create_journal_citation(
  packageId = "edi.17.1",
  articleDoi = "https://doi.org/10.1890/11-1026.1",
  articleTitle = "Corridors promote fire via connectivity and edge effects",
  journalTitle = "Ecological Applications",
  relationType = "IsCitedBy",
  env = "staging"
)
# Error in create_journal_citation(packageId = "edi.17.1", articleDoi = "https://doi.org/10.1890/11-1026.1",  : 
#   Bad Request (HTTP 400). Failed to DOI https://doi.org/10.1890/11-1026.1 should be in "prefix/suffix" format (e.g., 10.6073/d4f26a6c).

I think the articleDOI in the example should be "10.1890/11-1026.1".

Answer 24 · 2022-02-14T10:25:40.000Z

@laijasmine, thank you for your comprehensive review.

@clnsmth, you can choose to start working on the items that were raised in this review or wait for @bozaah’s review as well before starting.

@bozaah, there is no rush here. The review isn’t due until 2022/02/25, please feel free to take your time.

Answer 25 · 2022-02-14T16:44:49.000Z

Many thanks for this helpful review @laijasmine!

@adamhsparks, @bozaah, I'll wait for the second review before beginning revisions.

Answer 26 · 2022-02-23T07:59:09.000Z

Package Review

As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

No current or past work/personal relationship to report.

Documentation

The package includes all the following forms of documentation:

A statement of need: clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s): demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions
Examples: (that run successfully locally) for all exported functions
Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: ~9h

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

General Comments

Thank you for the opportunity to review the EDIutils package. This package provides an API client to access the Environmental Data Initiative Repository, in addition to the web browser portal (EDI). It consists of a suite of functions to browse/list, access, evaluate and upload data packages as well as manage citations entries and usage metrics associated with the data packages.

I am reviewing the package using R version 4.1.2 (2021-11-01) and platform x86_64-apple-darwin17.0 (64-bit). No issues reported when running devtools::check(). When running goodpractice::gp() no errors/issues however unit testing warns that 77% of code lines are covered by test cases.

Vignettes

Overall, vignettes are well structured, complete and cover the general use and applicability of the package functions.

Search and Access

No issues with the search and access vignettes. I also mirror comments from @laijasmine regarding XML which, I am not too familiar with and, it is likely the reality of many potential users of the package. There is already a note on Retrieve Download Metrics section which mention that several functions of the EDI API return XML and that xml2 is a reasonable approach to locally manipulate (eg extract and save) EML files.

Several functions do not work until the user has completed the authentication step with login() which requires a EDI account. The vignette Tests Requiring Authentication has information about it but perhaps a vignette on authentication only could be a reasonable addition. If so, I believe it should placed as the first vignette or together with a Get started vignette.

Lastly, mirroring comments of @laijasmine regarding the doi formats, the function read_data_package_doi() returns only the doi identifier eg doi:10.6073/pasta/5d2c5ac4d6cb1293a47cfa6ef5c55a01 but it could perhaps return the complete URL ie https://doi.org/10.6073/pasta/5d2c5ac4d6cb1293a47cfa6ef5c55a01

Evaluate and Upload

No issues here but I also agree with comments about the output of the read_evalute_report() function. However, the function description/example does present @laijasmine's suggestion to use writeLines() and also mention the possibility of alternative output formats with the frmt argument (defaults to xml). I also noted that the read_evalute_report_summary() offers a compact way to visualise the report output, which is quite neat.

Retrieve Download and Citation metrics

These vignettes are well structure and include reasonable examples and potential user cases. Functions and arguments are descriptive and detailed. No errors or issues to report for those vignettes, with exception of the citation doi error in the examples which were already mentioned by @laijasmine.

Documentation and examples

Documentation is automatically generated with roxygen2 and devtools::document(). Functionally is descriptive and complete. R code follows community guidelines (ie tidyverse style guide and rOpenSci package guide). The author also included contribution guidelines separately from the README. Examples run without errors.

Final Remarks

Access to data repositories with accurate and structured/complete metadata is essential for research reproducibility and general data reuse. The ability to obtain usage and citation metrics for the data packages in the EDI repository is outstading. EDIutils offers a great tool for the Open Source community and it will be a welcomed addition to the rOpenSci packages.

Answer 27 · 2022-02-24T07:29:27.000Z

@bozaah, thank you for your review.

Answer 28 · 2022-02-24T07:29:30.000Z

@ropensci-review-bot submit #498 (comment) time 8

Answer 29 · 2022-02-24T07:29:31.000Z

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@ropensci-review-bot help

Answer 30 · 2022-02-24T07:30:01.000Z

@ropensci-review-bot submit review #498 (comment) time 8

Answer 31 · 2022-02-24T07:30:06.000Z

Logged review for laijasmine (hours: 8)

Answer 32 · 2022-02-24T07:30:23.000Z

@ropensci-review-bot submit review #498 (comment) time 9

Answer 33 · 2022-02-24T07:30:25.000Z

Logged review for bozaah (hours: 9)

Answer 34 · 2022-02-24T15:38:20.000Z

Many thanks for these comments @bozaah!

@adamhsparks, I'll begin responding to reviews next week.

Answer 35 · 2022-03-04T18:49:43.000Z

Thanks again for these helpful reviews @laijasmine @bozaah! I agree with all of your recommendations and will begin implementing them in the development branch of the EDIutils GitHub. The responses and links to corresponding commits will be listed here in this issue thread, and the changes should be completed by the end of Tuesday.

Answer 36 · 2022-03-09T09:11:26.000Z

Many thanks @laijasmine and @bozaah for your comments and suggested improvments to EDIutils! I've implemented your recommendations and responded to your comments below. Additional comments and suggestions are much appreciated.

Changes to EDIutils since your review:

Added support for the EDI "cite" service. This provides a way for users to get a citation of a data package in the EDI repository in a community recommended format. See EDIorg/EDIutils@25e1da7.
Update audit report and record examples. The create and get audit record methods now return the userAgent in results. They were previously returning NULL values. This was a change in the underlying EDI Data Repository Software.

Response to @laijasmine

You have been added to the EDIutils package DESCRIPTION as a reviewer (see EDIorg/EDIutils@9f3ace8). Please check the accuracy of this information and let me know if anything should be added/changed.

Vignettes

Search and Access

Many users are probably unfamiliar with xml and how to navigate it and there is a learning curve involved so a vignette on the files would be helpful. They might be more comfortable with working with dataframes instead. We don't need to go into detail but it might be helpful to point them to packages like XML to help convert the content to a preferred format. Maybe adding a couple lines or point them to retrieve downloads will be helpful.

I totally agree. The learning curve involved with XML can be off-putting and an unnecessary barrier.

Functions that returned a simple XML structure have a new as parameter through which the return type can be controlled with data.frame as the default (see EDIorg/EDIutils@82f38bc and EDIorg/EDIutils@0441ba2). Unfortunately these simplifications can't be made for more complex returns (e.g. EML) so in these cases a note has been added to the associated function metadata and vignettes pointing users to other resources for more on parsing (see EDIorg/EDIutils@26f888a).

The recommendation for a vignette on working with more complex XML is definitely well recieved, and I would be happy to create it if given a little more time. Until then, I've opened an issue in the project GitHub for this recommendation (see https://github.com/EDIorg/EDIutils/issues/34).

Evaluate and upload

Upload a data package

I struggled the longest here. Maybe because I was just trying to write the least amount of information possible to just upload the file. I had a hard time understanding what the errors were in the results of read_evaluate_report. I went back and forth between the ezEML form and R trying to figure out what I needed. There seems to be more requirements to upload using EDIutils compared to the Check Metadata in ezEML. For example I tried to avoid uploading a data table because it wasn't marked as required in ezEML but it was required to upload using EDIutils.

My apologies for the troubles. This is clearly a short comming in the documentation and possibly the report itself.

A new section on "Interpreting the Evaluation Report" has been added to the Evaluate and Upload Data vignette (see EDIorg/EDIutils@f4a0ab9). This provides more information on the report structure and content. Ultimately the report itself should be updated, it's a bit cryptic in places, but that is a task that will have to be addressed at the repository level.

Report

I also found the output from read_evaluate_report quite long and hard to read if I'm viewing it as a message in the console. We might want to suggest users to use writeLines(report, "report.txt") instead so they can cmd + f the file and keep track of their fixes. Users might prefer if there is a way to only show the warnings and errors so there is less to look through when trying to improve their metadata.

The suggestion for writing the report to file (for ease of viewing) was listed in the original examples of read_evaluate_report() but admittingly burried too deep to be noticed. Ooops. I've reordered the examples with the preferred approaches listed first (see EDIorg/EDIutils@6aaa552).

A suggestion to only return the warnings and errors would make a nice enhancement. I'll implement this if given a little more time. I've opened an issue in the project GitHub so this isn't lost (see https://github.com/EDIorg/EDIutils/issues/35).

Update a data package

Following the example in the vignette I wasn't able to update my package because I keep getting the error:
Error: Attempting to update a data package to revision '1' but an equal or higher revision ('1') already exists in PASTA: edi.757.1
I realized the packageId slot inside the eml file also needs to be updated. We might need to be more explicit in the vignette that the identifier needs to be updated. I used the following EML package here:
my_eml <- EML::read_eml("edi.757.1.xml")
my_eml$dataset$title <- "test 2 update the title :)"
my_eml$packageId <- "edi.757.2"
EML::write_eml(my_eml, "edi.757.2.xml")

Yes, definitely a nuance worth highlighting. I've added a note to the "Update" section of the "Evaluate and Upload" vignette. Additionally, I've elevated the "Update" section level to make visible in the vignette table of contents. See EDIorg/EDIutils@c1647e0.

Documentation

list_principal_owner_citations

in the documentation it mentions the principalOwner as being:
(character) Principal owner in the format returned by construct_dn()
I can't seem to find that function. I think this should say create_dn()?

Thanks for catching this. Fixed in EDIorg/EDIutils@082b2ab

search_data_package

It might be helpful for the documentation to have the formats for some of the fields such as singledate, begindate and pubdate so it can be easily referenced.

I completely agree and have added formats to the fields requiring one (see EDIorg/EDIutils@ef4b691). Relatedly, I've opened an issue requesting more examples demonstrating common search patterns and Solr syntax (see https://github.com/EDIorg/EDIutils/issues/36).

create_journal_citation

using the example (and delete_journal_citation example) and got the following error:

journalCitationId <- create_journal_citation(
 packageId = "edi.17.1",
 articleDoi = "https://doi.org/10.1890/11-1026.1",
 articleTitle = "Corridors promote fire via connectivity and edge effects",
 journalTitle = "Ecological Applications",
 relationType = "IsCitedBy",
 env = "staging"
)
# Error in create_journal_citation(packageId = "edi.17.1", articleDoi = "https://doi.org/10.1890/11-1026.1",  : 
#   Bad Request (HTTP 400). Failed to DOI https://doi.org/10.1890/11-1026.1 should be in "prefix/suffix" format (e.g., 10.6073/d4f26a6c).

Thanks for finding this. Fixed in EDIorg/EDIutils@743785c

Response to @bozaah

You have been added to the EDIutils package DESCRIPTION as a reviewer (see EDIorg/EDIutils@9f3ace8). Please check the accuracy of this information and let me know if anything should be added/changed.

General Comments

I am reviewing the package using R version 4.1.2 (2021-11-01) and platform x86_64-apple-darwin17.0 (64-bit). No issues reported when running devtools::check(). When running goodpractice::gp() no errors/issues however unit testing warns that 77% of code lines are covered by test cases.

Yes, this catches my eye too. 77% isn't terribly impressive, but it bumps up to 84% when authenticated, and higher when testing data package evalate, upload, and update operations). I'll stay on the look out for ways of improving the test suite.

Vignettes

Search and Access

No issues with the search and access vignettes. I also mirror comments from @laijasmine regarding XML which, I am not too familiar with and, it is likely the reality of many potential users of the package. There is already a note on Retrieve Download Metrics section which mention that several functions of the EDI API return XML and that xml2 is a reasonable approach to locally manipulate (eg extract and save) EML files.

Agreed. XML isn't much fun for infrequent users. Functions now have the option to return a data.frame instead of xml (in most cases). For details see my response to @laijasmine above.

Several functions do not work until the user has completed the authentication step with login() which requires a EDI account. The vignette Tests Requiring Authentication has information about it but perhaps a vignette on authentication only could be a reasonable addition. If so, I believe it should placed as the first vignette or together with a Get started vignette.

Others have requested help on this topic as well. It must need it. : )

A general statement about authentication and which types of functions require it, is now high up in the README (see EDIorg/EDIutils@bd49970). I'd be happy to write up a vignette on authentication if you think more is needed.

Lastly, mirroring comments of @laijasmine regarding the doi formats, the function read_data_package_doi() returns only the doi identifier eg doi:10.6073/pasta/5d2c5ac4d6cb1293a47cfa6ef5c55a01 but it could perhaps return the complete URL ie https://doi.org/10.6073/pasta/5d2c5ac4d6cb1293a47cfa6ef5c55a01

Nice! This function now returns the complete URL through the as_url parameter (see EDIorg/EDIutils@d34018c).

Evaluate and Upload

No issues here but I also agree with comments about the output of the read_evalute_report() function. However, the function description/example does present @laijasmine's suggestion to use writeLines() and also mention the possibility of alternative output formats with the frmt argument (defaults to xml). I also noted that the read_evalute_report_summary() offers a compact way to visualise the report output, which is quite neat.

A new section on "Interpreting the Evaluation Report" has been added to the Evaluate and Upload Data vignette (see EDIorg/EDIutils@f4a0ab9). This provides more information on the report structure and content.

Retrieve Download and Citation metrics

These vignettes are well structure and include reasonable examples and potential user cases. Functions and arguments are descriptive and detailed. No errors or issues to report for those vignettes, with exception of the citation doi error in the examples which were already mentioned by @laijasmine.

Thanks! The fix for the citation doi error is at EDIorg/EDIutils@743785c.

Answer 37 · 2022-03-10T04:30:31.000Z

Thank you for the detailed log of the changes that you've made @clnsmth. @laijasmine and @bozaah, when you have time could you please review the response and changes made and let me know if you're satisfied with the changes?

Answer 38 · 2022-03-10T07:32:20.000Z

Thank you @clnsmth for answering all my questions in detail! I'll be able to probably take a look in the next day or two.

Answer 39 · 2022-03-11T05:31:19.000Z

The description info is correct for me!

A couple minor revisions before I can approve. Now that some of the functions use the argument as we need to update the vignettes and Readme.

For this vignette, the default for get_audit_report has now changed to "data.frame" so we will need to update the code to specify as to be xml. Same goes forlist_data_package_citations in this vignette.
In both the readme and the vignette for read_evaluate_report the argument needs to be changed from frmt to as.

I think it is reasonable to leave the bigger changes (https://github.com/EDIorg/EDIutils/issues/34 and https://github.com/EDIorg/EDIutils/issues/35) as issues to be worked on in the future.

All the other changes looks good to me. Thank you for getting all those changes in!

Answer 40 · 2022-03-18T17:39:23.000Z

Thanks @laijasmine. I'll get these documents updated and link you to the changes once implemented.

Answer 41 · 2022-03-21T04:23:41.000Z

@bozaah, did you have any further comments or are you happy with the changes and suggest accepting the package as it's currently written?

Answer 42 · 2022-03-22T03:19:33.000Z

Thanks for following up Adam. I am satisfied with the replies and edits to my points.

Thank you for addressing our proposed changes @clnsmth. Also, I agree with @laijasmine, the bigger changes can be worked out in the future as issues.

Answer 43 · 2022-03-22T12:14:33.000Z

Thanks @bozaah.

@laijasmine I'll implement your requested changes shortly and post an update when completed.

Answer 44 · 2022-03-24T15:16:38.000Z

@adamhsparks @laijasmine @bozaah A bug was discovered when rebuilding the two vignettes "Retrieve Download Metrics" and "Retrieve Citation Metrics" (https://github.com/EDIorg/EDIutils/issues/37). I'll send an update once this issue is fixed.

Answer 45 · 2022-03-27T22:02:43.000Z

@adamhsparks, @laijasmine, @bozaah

The bug is fixed (see EDIorg/EDIutils#37)
@laijasmine 's requested changes (#498 (comment)) are are implemented
version 0.0.0.9003 has been tested on R-hub with results listed in cran-comments.md
All above changes are in the main branch of EDIutils

Answer 46 · 2022-03-29T01:11:19.000Z

Looks great! Thank you @clnsmth for all of your work. @adamhsparks I approve this package.

Answer 47 · 2022-03-29T02:08:04.000Z

@ropensci-review-bot approve EDIutils

Answer 48 · 2022-03-29T02:08:06.000Z

Approved! Thanks @clnsmth for submitting and @bozaah, @laijasmine for your reviews! 😁

To-dos:

Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.
After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
Fix all links to the GitHub repo to point to the repo under the ropensci organization.
Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
- deactivate the automatic deployment you might have set up
- remove styling tweaks from your pkgdown config but keep that config file
- replace the whole current pkgdown website with a redirecting page
- replace your package docs URL with https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
Fix any links in badges for CI and coverage to point to the new repository URL.
Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

Answer 49 · 2022-03-29T14:27:28.000Z

Many thanks again to @adamhsparks, @laijasmine, and @bozaah!

I'll get to work on the to-dos.

Answer 50 · 2022-04-20T15:38:24.000Z

My apologies for the inconvenience @adamhsparks, but can you resend the rOpenSci invitation so I can transfer the repo to the "ropensci" GitHub organization. Thanks!

Answer 51 · 2022-04-21T11:06:56.000Z

@adamhsparks @clnsmth I've just noticed the invitation expired, I've retried it. 😸

Answer 52 · 2022-04-21T11:08:00.000Z

Note that we now require 2 factor authentication for organization members, see https://docs.github.com/en/authentication/securing-your-account-with-two-factor-authentication-2fa/configuring-two-factor-authentication so you'll have to enable it before joining the organization (in case you haven't enabled it yet).

Answer 53 · 2022-04-21T14:36:40.000Z

Many thanks @maelle!

Answer 54 · 2022-04-21T15:18:44.000Z

For the record we're working on soon adding a bot command that authors could use to get a new invitation if they were not able to accept the first one in time: ropensci-org/buffy#67 😸

Answer 55 · 2022-04-23T15:41:32.000Z

@ropensci-review-bot finalize transfer of ropensci/EDIutils

Answer 56 · 2022-04-23T15:41:33.000Z

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@ropensci-review-bot help

Answer 57 · 2022-04-23T15:43:01.000Z

@ropensci-review-bot finalize transfer of EDIorg/EDIutils

Answer 58 · 2022-04-23T15:43:02.000Z

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@ropensci-review-bot help

Answer 59 · 2022-04-23T15:44:54.000Z

@ropensci-review-bot finalize transfer of EDIutils

Answer 60 · 2022-04-23T15:44:56.000Z

Transfer completed. The EDIutils team is now owner of the repository

Scope

Technical checks

Publication options

Code of conduct

Checks for EDIutils (v0.0.0.9000)

1. Statistical Properties

1a. Network visualisation

2. goodpractice and other checks

3a. Continuous Integration Badges

3b. goodpractice results

R CMD check with rcmdcheck

Test coverage with covr

Cyclocomplexity with cyclocomp

Static code analyses with lintr

Editor-in-Chief Instructions:

Checks for EDIutils (v0.0.0.9001)

1. Statistical Properties

1a. Network visualisation

2. goodpractice and other checks

3a. Continuous Integration Badges

3b. goodpractice results

R CMD check with rcmdcheck

Test coverage with covr

Cyclocomplexity with cyclocomp

Static code analyses with lintr

Editor-in-Chief Instructions:

Editor checks:

Editor comments

Package Review

Documentation

Functionality

Review Comments

Vignettes

Search and Access

Evaluate and upload

Documentation

list_principal_owner_citations

search_data_package

create_journal_citation

Package Review

Documentation

Functionality

Review Comments

General Comments

Vignettes

Search and Access

Evaluate and Upload

Retrieve Download and Citation metrics

Documentation and examples

Final Remarks

Response to @laijasmine

Vignettes

Search and Access

Evaluate and upload

Upload a data package

Report

Update a data package

Documentation

list_principal_owner_citations

search_data_package

create_journal_citation

Response to @bozaah

General Comments

Vignettes

Search and Access

Evaluate and Upload

Retrieve Download and Citation metrics

2. `goodpractice` and other checks

3b. `goodpractice` results

`R CMD check` with rcmdcheck

2. `goodpractice` and other checks

3b. `goodpractice` results

`R CMD check` with rcmdcheck