ropensci/software-review

bikedata

mpadge opened this issue ยท 37 comments

Summary

  • What does this package do? (explain in 50 words or less):

Download and aggregate data from all public hire bicycle systems which provide open data, currently including Santander Cycles in London, U.K., and from the U.S.A., citibike in New York City NY, Divvy in Chicago IL, Capital Bikeshare in Washington DC, Hubway in Boston MA, and Metro in Los Angeles LA.

  • Paste the full DESCRIPTION file inside a code block below:
Package: bikedata
Title: Download and Aggregate Data from Public Hire Bicycle Systems
Version: 0.0.1
Authors@R: c(
    person("Mark", "Padgham", email = "mark.padgham@email.com", role = c("aut", "cre")),
    person("Richard", "Ellison", role = "aut"),
    person(family = "SQLite Consortium", role = "ctb", 
        comment = "Authors of included SQLite code"))
Description: Download and aggregate data from all public hire bicycle systems
    which provide open data, currently including Santander Cycles in London,
    U.K., and from the U.S.A., citibike in New York City NY, Divvy in Chicago
    IL, Capital Bikeshare in Washington DC, Hubway in Boston MA, and Metro in
    Los Angeles LA.
License: GPL-3
Depends:
    R (>= 3.0)
Imports:
    dplyr,
    httr,
    lubridate,
    methods,
    Rcpp,
    RSQLite,
    reshape2,
    tibble,
    xml2
Suggests:
    knitr,
    rmarkdown,
    roxygen2,
    testthat,
    covr
LinkingTo: 
    BH,
    Rcpp
VignetteBuilder: knitr
SystemRequirements: C++11
NeedsCompilation: yes
URL: https://github.com/mpadge/bikedata
BugReports: https://github.com/mpadge/bikedata/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Remotes: jimhester/covr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/mpadge/bikedata

  • Please indicate which category or categories from our package fit policies this package falls under and why? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

Data retrieval

  • Who is the target audience?

All those interested in analysing urban mobility - urban and general transport modellers and planners, urban scientists. Anyone wanting to make cool data visualisations of movement through some of the euro-american world's biggest cities (like this prominent example).

  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

No current R packages enable importing of public hire bicycle data. @ramnathv has his visualising bike share repo, but that's not an R package. I've heard rumours of a package in development for live station feeds along these lines, but even if and when that appears, it would focus on real-time data (primarily bicycle availability), whereas bikedata focusses exclusively on archiving and aggregating historical data.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coeveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

Package will be archived as soon as the ropensci review process has been completed, in accordance with JOSS policies.

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@ramnathv because of his prominent prior work on analysing public hire bike systems


Note that goodpractice flags only:

  1. Test coverage, which will be extended; and
  2. Large installed size, for explanation of which see cran-comments.md

@mpadge short question even before the real editor checks, why do you use the development version of covr?

That's not part of the CRAN submission (obviously), and is used because of Jim Hester's help here - see his extended commit message at top. It's temporary and will be resolved once covr is updated on CRAN.

Nice, thanks!

Thanks for your submission @mpadge! I'm currently looking for reviewers.

Editor checks:

  • [x ] Fit: The package meets criteria for fit and overlap
  • [ x] Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • [ x] License: The package has a CRAN or OSI accepted license
  • [ x] Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

  • The CHECK problem was solved thanks to using the --no-multiarch option. ๐Ÿ˜Œ I've cleaned this issue a bit after that. ๐Ÿ‘ผ

  • I found a typo, "demographc" in bike_demographic_data.Rd:10


Reviewers: @eamcvey @chucheria
Due date: 2017-06-25

Thanks @maelle, the warnings are indeed my fault - sorry about that, and hopefully fixed already. I've no idea why sqlite3/sqlite3.o would not be recognised because your machine built it in the first place. First let's clarify that the warnings are gone, and hope that that fixes the .o issue...

Thanks @eamcvey @chucheria for accepting to review this package! ๐Ÿ˜บ As a reminder here are the review template and the reviewing guidelines. Your reviews are due on the 2017-06-25.

@mpadge can you confirm that/when you've changed the cleanup process? (related to what @jeroen said "The cleanup script should usually only clean files generated by configure. Files generated by make should be cleaned via a clean section in Makevars" / or use cleanup.win as mentioned by Jim Hester)

Thanks @maelle, and thanks in advance @eamcvey and @chucheria for agreeing to review - exciting to hear what you might think of it! I've done the Jim Hester cleanup.win option. Jeroen is nevertheless correct that cleanup should only relate to configure-generated files, and I will definitely transition to a single Makevars soon. The whole build is nevertheless clean from here on for all OS's.

@chucheria @eamcvey friendly reminder that your reviews are due in 10 days, on the 2017-06-25. ๐Ÿ˜‰

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing:

4h


Review Comments

It is a great package overall, it removes the necessity of have knowledge of many bike data APIs/file system of open data, which is a great time and effort saver. It will make more sense as more bike open data is available inside the package.

Vignette has example uses and is very complete.

I played with the functions a bit, it is a simple package and it allows you to use API information locally.

General code comments

I found some errors executing the code.

I downloaded NY data from 2016-01 to 2016-05 once I changed to Los Angeles city, it told me that all data already exist.

dl_bikedata (city = 'New York City USA', dates = 201601:201605)
    Downloading 201605-citibike-tripdata.zip
    Downloading JC-201601-citibike-tripdata.csv.zip
    Downloading JC-201602-citibike-tripdata.csv.zip
    Downloading JC-201603-citibike-tripdata.csv.zip
    Downloading JC-201604-citibike-tripdata.csv.zip
    Downloading JC-201605-citibike-tripdata.csv.zip

dl_bikedata(city = 'la', dates = 201601:201605)
    All data files already exist

Once I downloaded NY data, I wanted to store it with store_bikedata:

ny <- store_bikedata (city = 'nyc', bikedb = 'bikedb', dates = 201601:201605)
    Downloading data for nyc
    All data files already exist
    Adding data to sqlite3 database
    Unzipping raw data files ...
     Show Traceback

     Rerun with Debug
     Error in unzip(f, list = TRUE) :
      zip file '/var/folders/zc/yccgnjm528j4x_665t_rz0km0000gn/T//RtmpU6AJg6/201603-citibike-tripdata.zip' cannot be opened

If I do the store_bikedata from the start everything runs great.

Installation:

macOS Sierra:

Installation in macOS was smooth. A warning popped up in installation and in every test:
Warning: Installed Rcpp (0.12.11) different from Rcpp used to build dplyr (0.12.10). Please reinstall dplyr to avoid random crashes or undefined behavior. I reinstalled dplyr and the warning dissapeared.

Documentation:

  • There's no CONTRIBUTION file nor community guidelines to contribute, consider writing one to welcome people to contribute.
  • There's no CODE_OF_CONDUCT file.
  • There's no NEWS file with latest changes.
  • LICENSE is also missing, which is also important, especially for you!
  • ๐Ÿ‘ All functions are explained with examples in the documentation. Documentation and the vignette look good.
  • ๐Ÿ‘ I ran the tests and everything looks like it is in order.
  • ๐Ÿ‘ Messages and warnings look good.

Scripts

Scripts are named with a hyphen instead of snake_case, something that rOpenSci recommends and packages seem to follow that pattern. Functions and variables are well named and explained.

Please, tell me if I didn't explained well any of the comments or errors.

Many thanks for your review @chucheria ! ๐Ÿ˜ธ

@eamcvey your review was due a few weeks back, will you be able to get it done soon?

A general FYI: I had to upload a new version to CRAN because changes in the LA data were causing failures. I also addressed the first issue that @chucheria encountered so the package now gives the more informative message:

> dl_bikedata(city = 'la', dates = 201601:201605)
There are no la files for those dates

I was unable to repeat the second error, but note that tempdir() seems to have a double forward slash - .../T//RtmpU6AJg6/ instead of presumably .../T/RtmpU6AJg6/. If that were the problem, I would have no idea of either ultimate cause or solution?

thanks for the update @mpadge !

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5


Review Comments

This is a nice package that made it easy to access the data as promised. The linked article showing analysis of this type of data is helpful to make clear what can be done with it. I was able to work through the vignette successfully and look at some additional things on my own.

Some specific issues I noticed:

  • bike_db_totals and bike_demographic_data are lacking examples in the documentation
  • There are spaces before parentheses for functions used in the vignette - I'm not sure what to make of this but it disturbs my sense of R code...
  • This may be lack of understanding on my part, but vignette() could not find the package vignette locally
  • I see the authors and contact info in the DESCRIPTION, but not in the README or CONTRIBUTING files - which sounds like a requirement per the reviewing guidelines
  • When I first used this package store_bikedata worked fine, but more recently it throws Error: The dbplyr package is required to communicate with database backends. I assume this is due to other package updates, but something to take into consideration.

Thanks a lot for your review @eamcvey ! ๐Ÿ˜ธ

Thanks both @chucheria and @eamcvey for your reviews! ๐Ÿ˜ˆ I should have addressed all issues raised sometime next week, and will respond accordingly.

Package changes in response to reviews

  1. As already mentioned, the previously mis-leading message that "All data files already exist" has been replaced with "There are no [city] files for that date."
  2. Repo now includes both CONTRIBUTING.md and CONDUCT.md - thanks to both reviewers for pointing out this absence on my part.
  3. Both reviewers noted dplyr (or dbplyr) issues. These arose in response to the upgrade to dplyr_0.7, which led to interesting situations for many package developers. bikedata has resolved all such issues by dropping the dplyr dependency.
  4. Extensive example now provided for bike_db_totals() - thanks! An example was previously included in bike_demographic_data(), but just one single line that was presunably easy to overlook. The reviewer's comment nevertheless inspired me to provide a much more extensive example, during which I also relealised that there were some formatting issues for demographic data from some cities as stored in the database. These have all been rectified.

Changes not yet implemented

1. I was unable to reproduce the problem encountered by @chucheria, but have already noted that I have previously encountered strange behaviour when running store_bikedata() multiple times. I suspect that the main cause of such problems is index creation, which may only be performed once for a database, and any attempt to create indices that already exist will throw an error. Current
bikedata issue#38 should largely resolve these issues.
- now done!

Responses to other comments not related to package changes

  1. There was always NEWS.md file which I presume the first reviewer simply overlooked.
  2. The package uses GPL-3 which is a copyright license and therefore requires no LICENSE file. (LICENSE files are also rejected by CRAN for GPL-3 packages: they only apply to non-copyright licenses such as MIT).
  3. Yeah, i know file names are hyphenated rather than underscored, but this is done because there are systems which always interpret underscores as non-alpha-numeric characters and therefore produce inconsistent highlighting of function names; underscores avoid this problem. (That comment only applies to those who use vim, emacs, or other non-RStudio interfaces.)
  4. Yes, I do use function (x) rather than function(x), but now explain this in CONTRIBUTING.md: "Words of text are separated by whitespace, and so code words should be too." Let's just say I'm pre-empting physchological analyses of R-coding style akin to what has been done for R colour palettes. Spaces will one day reign ...
  5. Regarding authorship, the ropensci authorship guide merely describes the requirements for DESCRIPTION, to which this package accords, so I think current status is okay in that regard. I'll happily modify if required or desired.
  6. Finally, note that vignette() will not automatically load the vignette when the package is psuedo-loaded with devtools::load_all(). I nevertheless checked with a "proper" locall load (R CMD install <tarball>) and was then able to open vignette() as expected.

Thanks again @eamcvey and @chucheria for helpful insight into improving the package!

Awesome work @mpadge! When do you think you'll be able to solve the issue mentioned in "Changes not yet implemented"?

I agree reg. the authors list.

@eamcvey and @chucheria are you happy with the changes?

Thanks @maelle! That change is issue#38, which I will definitely resolve within the coming week. I'll let you know as soon as it's done. ๐Ÿšฒ ๐Ÿšดโ€โ™‚๏ธ ๐Ÿšดโ€โ™€๏ธ ๐Ÿšฒ

@mpadge For info you can already add the rOpenSci badge to the README of bikedata, for now it'll show "under review2" and after approval "peer-reviewed".

[![](http://badges.ropensci.org/116_status.svg)](https://github.com/ropensci/onboarding/issues/116)

@maelle done!

Thanks for your comments @mpadge. I'm sorry I overlooked the NEWS file! So embarrasing... I didn't know you don't need a license, anyway if CRAN doesn't admit it better this way, I guess I'm used to do the LICENSE and gitignore first thing.

About the error you commented you couldn't reproduce, I'll try to do the same thing as soon as I have some time in case it appears again an open an issue with more info in case it appears, does this sound good?

Thank you both @mpadge and @maelle! This has been fun!

@maelle issue#38 now done. This should resolve previous errors. Note that these are system dependent because of the different ways the bundled SQLite3 code is compiled, so it is unfortunately not possible to systematically reproduce the errors. Nevertheless, I suspect most of them arose through somehow trying to create indexes on a database that already had indexes in it. The latest modifications will robustly prevent this ever happening, and so I am pretty confident will resolve these kinds of errors. I have taken the liberty of striking through the relevant bit of my previously responses ("Changes not yet implemented")

@mpadge ok, thanks a lot! The package looks good to me but I'll wait for @chucheria to have had time to look at the error again before approving. :-)

absolutely - it's really important to ensure it really has been solved!

No error, the package seems to be solid. Thanks for the patience!

@mpadge sorry for the delay! The package is now approved, great work, and thanks @eamcvey + @chucheria for your reviews!

To-dos:

  • Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.

  • OPTIONAL You can add the reviewers in DESCRIPTION as "rev" if you want to (and if they agree). See e.g. this example. If you do this you'll need to build the package on R-devel for CRAN submission (because "rev" is now accepted in R-devel)

  • Add the (brand new!) "Peer-reviewed" badge to your README (in the future people will add this on submittal; it updates through the stages of review):

[![](https://badges.ropensci.org/116_status.svg)](https://github.com/ropensci/onboarding/issues/116)
  • Fix any links in badges for CI and coverage to point to the ropensci URL. (we'll turn on the services on our end.)
  • Activate Zenodo watching the repo
  • Tag and create a release so as to create a Zenodo version and DOI
  • Submit to JOSS using the Zenodo DOI. We will tag it for expedited review.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). Let me know if you are interested.

Thanks @maelle for all of the help and guidance getting the package through to this exciting last stage!
You list of TODOs:

  • Repo transferred
  • both @chucheria and @eamcvey added to DESCRIPTION of new release (0.0.4) with role = "rev"
  • ropensci badge added
  • all other badges re-directed (see #45)
  • Zenodo now watching repo
  • First release on zenodo here with doi:10.5281/zenodo.844466
  • Submit to JOSS - I thought that happened automatically? (It did for osmdata) Let me know if i need to do anything there

Finally, re: blog post: absolutely interested! Yes please!

You mean you didn't have to submit it? ๐Ÿค” Having @karthik as an editor of the R package surely helped. ๐Ÿ˜‰

In my experience you need to submit it, choose Karthik as JOSS editor.

Reg. the blog post let me tag @stefaniebutland who'll get in touch with you.

okay, i've submitted it now, and put Karthik as the editor:

  • Submit to JOSS

ping @richardellison, so you know where we're at here. new publication pending!

I think we can close this now (I told Karthik about the JOSS submission, I check there regularly, and the blog post discussion doesn't need this to be open). Your review badge will get green! ๐Ÿ˜

YIPPEE!!! ๐Ÿšฒ ๐Ÿšฒ ๐Ÿšฒ I didn't realise until I just discovered the ropensci issue on this that i should have but didn't actually request the explicit approval of @chucheria and @eamcvey for me to insert those rev roles in the DESCRIPTION. May I kindly request retrospective approval now? Please!

Oops and I forgot to remind it to you.

Thanks for adding me! You have my very happy approval @mpadge !

@mpadge (I'm responding later than I had hoped)
Great to hear you're interested in contributing a blog post about this package. I hope it gets more eyes on it and generates some enthusiasm and input.

For rOpenSci blog post you could do either a short post for the Developer Blog that has a technical focus, or do a post for the main blog whose audience is broader.

Main blog post would include more narrative about your motivation for creating the package, unmet need, how-to use, good to end with a thank you to package reviewers with links to their GitHub or Twitter, point readers to issues and what you think is next to improve the package and invite people to open or address an issue etc.

Deadlines:

  • Developer blog post can come any time.
  • For main blog we post about once a week. Next open slots are late September. We ask for a draft submitted as a pull request a week in advance.

Practical instructions:

Which type of post are you thinking of?

I had been thinking of inviting you to contribute a post about osmdata, so if you're interested you could do both as developer blog style, or one dev and one main.

Did I miss anything?

thanks @stefaniebutland, and no worries about delayed response. I'm absolutely keen to get these packages known in any and all ways possible. Absolutely happy for at least one to be a main blog entry, for which my preference would be bikedata, with osmdata then shucked off to a dev blog post. If that's okay with you, i'll happily get cracking on a PR for main blog post on bikedata