ropensci/software-review

MODISTools onboarding

khufkens opened this issue · 37 comments

Summary

  • What does this package do? (explain in 50 words or less):

Programmatic interface to the 'MODIS Land Products Subsets' web services (https://modis.ornl.gov/data/modis_webservice.html). Allows for easy downloads of 'MODIS' time series directly to your R workspace or your computer.

  • Paste the full DESCRIPTION file inside a code block below:
Package: MODISTools
Title: Interface to the 'MODIS Land Products Subsets' Web Services
Version: 1.0.0
Authors@R: person("Hufkens","Koen",
		  email="koen.hufkens@gmail.com",
		  role=c("aut", "cre"),
		  comment = c(ORCID = "0000-0002-5070-8109"))
Description: Programmatic interface to the 'MODIS Land Products Subsets'
    web services (<https://modis.ornl.gov/data/modis_webservice.html>).
    Allows for easy downloads of 'MODIS' time series directly to your R 
    workspace or your computer.
URL: https://github.com/khufkens/MODISTools
BugReports: https://github.com/khufkens/MODISTools/issues
Depends:
    R (>= 3.4)
Imports:
    httr,
    utils,
    tidyr,
    jsonlite
License: AGPL-3
LazyData: true
ByteCompile: true
RoxygenNote: 6.0.1
Suggests:
    knitr,
    rmarkdown,
    covr,
    testthat
VignetteBuilder: knitr

  • URL for the package (the development repository, not a stylized html page):

https://github.com/khufkens/MODISTools

  • Please indicate which category or categories from our package fit policies this package falls under and why (e.g., data retrieval, reproducibility). If you are unsure, we suggest you make a pre-submission inquiry:

    • data retrieval
  • Who is the target audience and what are scientific applications of this package?

MODIS data subsets have a wide range of applications in tracking state changes of the environment, which inform, among others, ecological and hydrological models. For an in-depth rationale I refer to Tuck et al. 2014 (https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.1273).

  • Are there other R packages that accomplish the same thing? If so, how does yours differ?

Not to my knowledge. It is the only community contribution listed at the bottom of the ORNL page. Although tiled data processing packages exist, this package has a particular focus on subsets of these tiles (from a single pixel to a maximum extent of a 200 x 200 km window), mostly for rapid model development and localized environmental assessment. Tiled data download packages focus on wall-to-wall remote sensing image coverage, often after using subsets for model development.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

None made.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
    [Note: already on CRAN]
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

This package is a refactored version of the original MODISTools package by Tuck et al. 2014. To limit fragmentation I kept the name, although the functionality changed (due in part to changes in the backend API). To respect the original authors' contribution I would like to keep this reference in place. I also asked for approval from both the first and second authors of the original package to do so. However, they did not want to be involved in new development due to time constraints.

Detail

Yes, but I might have missed something.

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Please contact the previous package author for review:

@seantuck12

Thanks for this submission @khufkens! I will follow up shortly with next steps.

No rush, I promised both @sckott and @seantuck12 back in June that I would push this to rOpenSci, but the summer got in the way.

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

👋 @khufkens The package looks great and my preliminary goodpractice checks don't raise anything significant. Your test coverage is already high, and you might consider fixing the other two suggestions while I line up reviewers.


Reviewer 1: @pmnatural
Due date: September 28th, 2018
Reviewer 2: @etiennebr
Due date: October 4th, 2018

── GP MODISTools ───────────────────── 
It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 91% of code lines are covered
    by test cases.

    R/mt_bands.R:39:NA
    R/mt_batch_subset.R:58:NA
    R/mt_batch_subset.R:66:NA
    R/mt_batch_subset.R:130:NA
    R/mt_dates.R:37:NA
    ... and 15 more lines

  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used
    to it and it is easier to read your code for them if you use '<-'.

    R/mt_batch_subset.R:72:10
    R/mt_dates.R:58:8
    R/mt_read.R:33:10
    R/mt_read.R:34:10
    R/mt_read.R:35:10
    ... and 28 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are
    error prone and result in 1:0 if the expression on the right hand side is zero. Use seq_len() or
    seq_along() instead.

    R/mt_subset.R:167:23

──────────────────────────────────────────────────────────

I have reached out to 3 potential reviewers on Sep 4th. Will update thread when I hear back.

I fixed most of the notes above, but haven't pushed them. Keep me posted on progress.

Thanks for agreeing to review @pmnatural 🙏 Please reach out if you have any questions.

Tagged @etiennebr as the second reviewer.

Thanks @karthik and @khufkens. I'm looking forward to reviewing the package! @khufkens are you planning to push your updates before we review the package?

I don't think I'll update the CRAN package before review. I'm not sure how this influences the process. I've mainly been focusing on addressing the issues above, interoperability, and docs. I still have to go through this: https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html to fix some issues (e.g. create an ornl_api function etc.).

Ok, I won't worry about the CRAN version; are the fixes pushed to the GitHub master branch?

Yes, everything is in the master branch. Larger additions or refactoring are done on a separate branch (but that's not applicable now).

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
    The statement of need includes all the information needed to understand what the package does. However, I think it could be reordered to express needs from generic to specific, and some details could be removed. Since downloading MODIS data is the primary purpose, I would suggest it makes a better description than an interface to a web API (this also applies to the DESCRIPTION file). It also assumes that the user knows what MODIS is and why it should be used; I think a description of the purpose of MODIS would broaden the potential user base.
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
    The vignette is pretty thorough and presents most of the functions in the package. In fact, the only missing function is mt_sites, and at this point I think it would be interesting to include it in the vignette to facilitate onboarding of users, especially those new to MODIS.
    However, I must admit I struggled to understand what exactly was returned by mt_subset and how to use it. I would suggest expanding the vignette with an example of an application (e.g. plot a time series of values, as sketched below) and a description of the returned object (completed in the mt_subset documentation).
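For instance, a minimal sketch of such a plot (assuming subset is an mt_subset() result whose data element holds calendar_date and data columns, with a 0.02 scale factor as for LST):

lst <- subset$data
lst$date  <- as.Date(lst$calendar_date)
lst$lst_k <- as.numeric(lst$data) * 0.02  # apply the band scale factor

plot(lst$date, lst$lst_k, type = "b",
     xlab = "date", ylab = "LST (Kelvin)")
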
  • Function Documentation: for all exported functions in R help
    All functions are documented. I think some would benefit from cross-linking, which helps new users navigate the package in an intuitive manner. For instance, mt_read takes a MODISTools data file as an argument, and it would help to link to mt_write.
    As mentioned above, mt_subset would benefit from an expanded @return section.
  • Examples for all exported functions in R Help that run successfully locally
    mt_read requires mt_write to be run first. Since mt_read is included in the mt_write example, you could merge both help files with the @rdname mt_write tag.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

skip

Functionality

  • Installation: Installation succeeds as documented.

  • Functionality: Any functional claims of the software been confirmed.

  • Performance: Any performance claims of the software been confirmed.
    No specific claims

  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
    All functions are tested. However, I believe MODISTools would benefit from higher test coverage. I also think that renaming tests/run_tests.r to tests/testthat.R would be closer to the standard testthat structure.

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10h (including reading all the material available to new reviewers)


Review Comments

Thanks @khufkens for your contribution. I believe the package can be useful to users looking for MODIS time series at specific locations, and it really simplifies the usage of the web API. I was surprised at first that it did not provide a raster object after download, but I understand how it can simplify the work of downloading data for multiple locations and dates. I find the code generally straightforward and easy to read; the comments are often helpful and clear. In my opinion the two most important points to improve are the documentation of the MODISTools class and the unit tests, not only to reach higher coverage but also to test the functions more extensively and make the package easier to maintain.

I have some miscellaneous comments and improvements to suggest; I think they could make the package even better by improving the user experience and facilitating maintenance.

  • Making the frequency a duration object might make it easier to use
    in code or at least make it easy to parse (remove dashes and specify 1 year
    rather than yearly). Same thing for the calendar_date of mt_dates() and the proc_date from mt_subset().
library(MODISTools)
library(dplyr)      # mutate(), select()
library(lubridate)  # duration()

products <- mt_products()

products %>%
  mutate(frequency = gsub("-", " ", frequency)) %>%
  mutate(frequency = gsub("Daily", "1 day", frequency)) %>%  # before the "Day" rule, which would otherwise turn "Daily" into "daily"
  mutate(frequency = gsub("Day", "day", frequency)) %>%
  mutate(frequency = gsub("Yearly", "1 year", frequency)) %>%
  mutate(freq = duration(frequency)) %>%
  select(freq, frequency)

#>                      freq frequency
#> 1        86400s (~1 days)     1 day
#> 2    31557600s (~1 years)    1 year
#> 3   691200s (~1.14 weeks)     8 day
#> 4       345600s (~4 days)     4 day
#> 5   691200s (~1.14 weeks)     8 day
#> 6   691200s (~1.14 weeks)     8 day
#> 7  1382400s (~2.29 weeks)    16 day
#> 8   691200s (~1.14 weeks)     8 day
#> 9   691200s (~1.14 weeks)     8 day
#> 10  691200s (~1.14 weeks)     8 day
  • I'm biased here, but support for sf objects would be a great addition and augment compatibility with other workflows (every time a lon-lat (or x, y) is required or returned, use an sf object).

  • I would suggest that mt_batch_subset should convert xllcorner / yllcorner to at least lon and lat (or even better an sf polygon :) ), rather than the current offset lon and lat. It would make it easier for the user to map metadata or, for instance, load a list of sites from a shapefile and find MODIS data with an st_intersection.

  • Some typos in the vignette. e.g. "Below an examples are"

  • I think that caching (using e.g. memoise::memoise) could speed up some functions and also be lighter on the API provider.
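
A sketch of the idea (wrapping one of the static metadata calls):

library(memoise)
library(MODISTools)

# cache the product list: repeated calls skip the API round trip
mt_products_cached <- memoise(mt_products)
mt_products_cached()  # first call queries the server
mt_products_cached()  # second call is served from the cache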

  • I wonder if the tidy list shouldn't rather be a tidy data frame (or tibble). It could be a nested data frame, but I think it might be hard for users to manipulate nested data frames. So maybe a data frame repeating the metadata would be a good option? It repeats some values, but I think it might offer the right balance of convenience. It might also remove the need for a MODISTools class, which I believe would increase compatibility and maintainability.

  • For some functions, code coverage is relatively low and I think raising it would be a great improvement (I personally find that API packages are harder to test, anyway). I'm sure it would help discover some bugs. For instance, I found one just by looking at the coverage results: !length(site_id %in% sites$siteid) is always FALSE unless no site_id is specified, so it doesn't really check whether the site_id is in the list (see the sketch below).
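
To illustrate (a hypothetical example; %in% returns one logical per element of site_id, so the length is never zero once a site_id is given):

sites <- data.frame(siteid = c("tundra", "forest"))
site_id <- "unknown_site"

site_id %in% sites$siteid           # FALSE, a length-one logical vector
!length(site_id %in% sites$siteid)  # also FALSE: length is 1, not 0
!any(site_id %in% sites$siteid)     # TRUE, likely what was intended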

  • I would get rid of the try() methods in the tests. In my opinion tests should fail on error with an explicit message and present an explicit expectation. For instance, in test_ancillary_function.r, rather than chaining a series of conditions such as !inherits(subset, "try-error"), you could use expect_is(subset, "MODISTools"), as sketched below. There is also expect_identical, which could make the comparisons explicit and produce accurate error messages that will improve maintainability and contributions.
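
A sketch of the pattern (the mt_subset() arguments are only illustrative):

library(testthat)
library(MODISTools)

test_that("mt_subset returns a MODISTools object", {
  subset <- mt_subset(product = "MOD11A2",
                      band = "LST_Day_1km",
                      lat = 40,
                      lon = -110,
                      start = "2004-01-01",
                      end = "2004-03-31")
  # one explicit expectation with a useful failure message
  expect_is(subset, "MODISTools")
})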

  • I have not tested this, but I anticipate that the error messages following the usage of try will generally be unhelpful. For instance, if a user has issues with an SSL certificate they would receive "Your request timed out or the server is unreachable". The httr error might be more helpful, and in that case it might be better not to wrap the function in a try and simply let the error surface to the user. There are also ways to pass the error message along in a custom way, for instance by using tryCatch, or geterrmessage with try (see the sketch below). I changed the way I handle errors after reading the tidyverse guide to error messages.
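
One possible pattern (a sketch; the URL is a placeholder):

# let the underlying httr error surface, with added context
response <- tryCatch(
  httr::GET("https://modis.ornl.gov/rst/api/v1/"),
  error = function(e) {
    stop("ORNL web service query failed: ", conditionMessage(e), call. = FALSE)
  }
)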

  • It's the first time I've seen a systematic use of default NULL parameters in function headers. I think removing the default NULL for mandatory parameters might be more robust. Relying on missing() and letting R warn the user about missing arguments (sketched below) might be lighter on maintenance.
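
A sketch of what I mean (mt_example() is a made-up function):

mt_example <- function(lat, lon) {
  # no default NULLs: fail early with a clear message
  if (missing(lat) || missing(lon)) {
    stop("please provide both lat and lon")
  }
  c(lat = lat, lon = lon)
}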

  • It wasn't obvious to me what the core mt_batch_subset function did and what exactly it returned. For instance, what does internal = FALSE do? I also had to provide a data.frame to specify the locations (it would be good to know the expected column names lat and lon). Maybe it would be convenient to extract the preparation of this data.frame from the mt_batch_subset function to help the user decide exactly what to download? Also, I think that sf support might offer a consistent interface to provide locations to functions; it might however be overkill for many usages.

  • I find it hard to specify a series of specific (non-consecutive) dates (for instance when I had other imagery available) because it always requires using ranges of consecutive dates.

  • At the time of writing, there are still some good practices that seem to apply to the package (goodpractice::gp()).

Hi @etiennebr, thank you for reviewing the package and the contributions to improve the package.

I've been chipping away on implementing some of your comments but I was wondering if I could get some more information on some of your remarks.

In particular, I was wondering how to restructure the data. You mentioned that I could reformat things into a tidy data frame (with a duplication cost for some data, but reducing the overall complexity of the data structure; a fair trade-off given that space is probably cheaper than CPU cycles).

However, this might be at odds with your request for a true bounding box using the ll-tr offsets as an sf polygon. The only other option in this context would be to formulate a function which translates the location data in the tidy data frame into an sf polygon.

Does this sound like a good middle ground - balancing simplicity and compatibility?

You've probably been thinking about this problem longer than me, so you might have a better solution or have run into issues I haven't seen with my proposal. Here's one way to do it.

(I'm starting from the vignette)

library(tibble)  # as_tibble()
library(dplyr)   # mutate()
library(tidyr)   # unnest(), used below

mt_tidy <- function(x) {
  # flatten the header into a one-row tibble, nest the data as a list column
  as_tibble(x$header) %>%
    mutate(data = list(x$data))
}
mt_tidy(subset)
#> # A tibble: 1 x 16
#>   xllcorner yllcorner cellsize nrows ncols band  units scale latitude longitude site  product
#>   <chr>     <chr>     <chr>    <int> <int> <chr> <chr> <chr>    <dbl>     <dbl> <chr> <chr>  
#> 1 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> # ... with 4 more variables: start <chr>, end <chr>, complete <lgl>, data <list>

mt_tidy(subset) %>% unnest()
#> # A tibble: 20 x 22
#>    xllcorner yllcorner cellsize nrows ncols band  units scale latitude longitude site  product
#>    <chr>     <chr>     <chr>    <int> <int> <chr> <chr> <chr>    <dbl>     <dbl> <chr> <chr>  
#>  1 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  2 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  3 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  4 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  5 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  6 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  7 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  8 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#>  9 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 10 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 11 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 12 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 13 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 14 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 15 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 16 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 17 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 18 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 19 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> 20 -9370036… 4447802.… 926.625…     1     1 LST_… Kelv… 0.02        40      -110 test… MOD11A2
#> # ... with 10 more variables: start <chr>, end <chr>, complete <lgl>, modis_date <chr>,
#   calendar_date <chr>, band1 <chr>, tile <chr>, proc_date <chr>, pixel <chr>, data <int>

It doesn't prevent an sfc column from being created (and repeated on unnest). Although, I'm not sure if all the information is available (it would be much easier if the pixel size were in degrees). You can even add a second geometry column for the requested coordinates.

I hope this helps, but don't hesitate if that's not the case. As I said above, maybe I'm overlooking some limitations.

Thanks @etiennebr for the code contribution!

Sorry for my slow response, I've a grant deadline coming up and at night I have a connection which is basically enough for an email but not much more.

From reading through your post (and code) I think this is a good solution. It makes things easier for sure, and indeed provides the sf mapping compatibility. I'm on the slow connection, so I'll try to give things a go tomorrow.

Hi @etiennebr,

Below are the corrections I made to the package.

Major changes include:

  • the data format, currently a data frame (not nested anymore)
  • due to the format removed mt_read() and mt_write()
  • unit testing is complete (aside from server errors), coverage at 96%
  • compliant with most style norms (= replaced by <- where necessary)
  • use of missing() instead of NULL
  • goodpractice::gp() ok, aside from server side issues
  • use of memoise() to limit server queries
  • updated documentation and internal links

Other comments are below, let me know how you feel about this implementation.


Making the frequency a duration object might make it easier to use in code or at least make it easy to parse (remove dashes and specify 1 year rather than yearly). Same thing for the calendar_date of mt_dates() and the proc_date from mt_subset().

I've implemented the conversion of the standard output to a more readable format. I've not included the conversion to seconds, as this metadata is rarely used in analysis and is implicit when downloading the time series (which is always continuous; no individual dates can be queried).

I'm biased here, but support for sf objects would be a great addition and augment compatibility with other workflows (every time a lon-lat (or x, y) is required or returned, use an sf object).

This is indeed a valuable addition, if only for visualization purposes alone. I included an adapted version of the code you supplied (thanks again for this). I use a base R implementation in order to limit direct dependencies.

I would suggest that mt_batch_subset should convert xllcorner / yllcorner to at least lon and lat (or even better an sf polygon :) ), rather than the current offset lon and lat. It would make it easier for the user to map metadata or, for instance, load a list of sites from a shapefile and find MODIS data with an st_intersection.

See the above. This is implemented through two functions: one to convert the xll / yll corners to lat / lon, and subsequently a function to convert this coordinate, together with other metadata, to an sf polygon (see the sketch below). This data is not tagged onto the original data frame, to limit complexity and preserve ease of use (saving data as a flat file to disk).
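
A usage sketch (assuming the helpers are exported as sin_to_ll() and mt_bbox(); arguments mirror the subset header fields):

# sinusoidal lower-left corner to lat / lon
sin_to_ll(x = subset$xllcorner, y = subset$yllcorner)

# corner coordinates plus grid metadata to an sf polygon
mt_bbox(xllcorner = subset$xllcorner,
        yllcorner = subset$yllcorner,
        cellsize  = subset$cellsize,
        nrows     = subset$nrows,
        ncols     = subset$ncols)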

Some typos in the vignette. e.g. "Below an examples are"

Fixed the typo. Thanks for spotting this.

I think that caching (using e.g. memoise::memoise) could speed up some functions and also be lighter on the API provider.

I implemented memoise() on the metadata queries (those remain largely static between or within subset calls). A significant decrease in load time should be seen, as well as decreased overhead on the server.

I wonder if the tidy list shouldn't rather be a tidy data frame (or tibble). It could be a nested data frame, but I think it might be hard for users to manipulate nested data frames. So maybe a data frame repeating the metadata would be a good option? It repeats some values, but I think it might offer the right balance of convenience. It might also remove the need for a MODISTools class, which I believe would increase compatibility and maintainability.

In the end I decided to do away with the nested structure. The output is now a plain data frame which can easily be saved as a flat file to disk.

Consequently, the mt_read() and mt_write() functions are removed, as a simple read.table() or write.table() will work (see below). I also removed the MODISTools class from the data retrieved by an mt_subset() query.
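
For example, with base R only:

# save the tidy data frame as a flat file and read it back
write.table(subset, "modis_subset.csv", sep = ",", row.names = FALSE)
subset <- read.table("modis_subset.csv", sep = ",", header = TRUE)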

For some functions, code coverage is relatively low and I think raising it would be a great improvement (I personally find that API packages are harder to test, anyway). I'm sure it would help discover some bugs. For instance, I found one just by looking at the coverage results: !length(site_id %in% sites$siteid) is always FALSE unless no site_id is specified, so it doesn't really check whether the site_id is in the list.

I fixed the length() issue and tried as much as possible to cover all code. Sadly, server time-outs cannot be tested, so this gap remains.

I would get rid of the try() methods in the tests. In my opinion tests should fail on error with an explicit message and present an explicit expectation. For instance, in test_ancillary_function.r, rather than chaining a series of conditions such as !inherits(subset, "try-error"), you could use expect_is(subset, "MODISTools"). There is also expect_identical, which could make the comparisons explicit and produce accurate error messages that will improve maintainability and contributions.

I updated the testing routines, removing the try() statements.

I have not tested this, but I anticipate that the error messages following the usage of try will generally be unhelpful. For instance, if a user has issues with an SSL certificate they would receive "Your request timed out or the server is unreachable". The httr error might be more helpful, and in that case it might be better not to wrap the function in a try and simply let the error surface to the user. There are also ways to pass the error message along in a custom way, for instance by using tryCatch, or geterrmessage with try. I changed the way I handle errors after reading the tidyverse guide to error messages.

I let the errors surface, without custom error messages. I think that, depending on the circumstances, these messages can be good for advanced users, but they often confuse beginners.

It's the first time I've seen a systematic use of default NULL parameters in function headers. I think removing the default NULL for mandatory parameters might be more robust. Relying on missing() and letting R warn the user about missing arguments might be lighter on maintenance.

I now use missing() throughout, and I'm slowly converting other packages as well. Thanks for pointing me toward this function.

It wasn't obvious to me what the core mt_batch_subset function did and what exactly it returned. For instance, what does internal = FALSE do? I also had to provide a data.frame to specify the locations (it would be good to know the expected column names lat and lon). Maybe it would be convenient to extract the preparation of this data.frame from the mt_batch_subset function to help the user decide exactly what to download? Also, I think that sf support might offer a consistent interface to provide locations to functions; it might however be overkill for many usages.

This is mostly a documentation issue, I think. The input can either be a data frame or a CSV file using a particular structure (which indeed is not mentioned; see the sketch below). I'll hold off on sf integration, as conversion of any sf object (points) to a basic ID, lat, lon data frame should be easy.
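
A sketch of the expected input structure, including a conversion from sf points (assuming the documented column names site_name, lat and lon):

# batch input: one row per site
df <- data.frame(site_name = c("site_a", "site_b"),
                 lat = c(40.0, 42.5),
                 lon = c(-110.0, -72.2))

# starting from sf points instead (e.g. read from a shapefile)
library(sf)
pts <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
df2 <- data.frame(site_name = pts$site_name,
                  lat = st_coordinates(pts)[, "Y"],
                  lon = st_coordinates(pts)[, "X"])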

I find it hard to specify a series of specific (non-consecutive) dates (for instance when I had other imagery available) because it always requires using ranges of consecutive dates.

This is a limitation of the API on the ORNL side, not the package itself. The use case is also rare, and given the relatively small datasets the overhead of extracting specific dates afterward is minor (see below).
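
For example (assuming df is a downloaded subset with a calendar_date column):

# download the continuous range, then keep only the dates of interest
keep <- as.Date(c("2004-01-01", "2004-02-18", "2004-03-21"))
df_sel <- df[as.Date(df$calendar_date) %in% keep, ]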

At the time of writing, there are still some good practices that seem to apply to the package (goodpractice::gp()).

This is covered.

I'm sorry for not getting back earlier on this. @karthik what are your expectations for the next steps? I have several engagements in the next few weeks, and I would like to devote enough time to appropriately review @khufkens's work. I think I can do it before the end of the year, if that sounds right for both of you.

@etiennebr Thanks for the thoughtful response. I think the next steps would be to see if @khufkens has adequately addressed the issues that you've raised. Otherwise you can point out the ones that remain unresolved. If it's easier, you can also open issues on his repo and tag ropensci/onboarding#246 and they will appear as a list here (with automatic updates on when those issues are closed).

@pmnatural Have you had a chance to complete your review yet? Please update on your progress.

@etiennebr I've fixed both the cell size issue and the error statements (both in mt_dates and mt_subset). I also included references to the sinusoidal projection, both on the LP DAAC and Wikipedia; both explain the projection and mention the required parameters. The build is still clean, all should be good.

Thanks for all the feedback!

Good job @khufkens! @karthik I would recommend approving this package; @khufkens addressed the issues I raised.

I think there is still room for higher test coverage to help the package in the long run, and sf support could make it even easier for the user, but I see these elements as goals to keep in mind while evolving the package. I recommend approving the package given @khufkens's revisions following the review; the package is useful and usable in its current form.

Thanks for the vote of confidence @etiennebr. I acknowledge that sf support would be nice, but I also want to make sure that there is something for people to work with. Since the CRAN (re)release the package has had some good traffic, so the demand for the tool is there.

New features can be added incrementally without breaking things. I think with the feedback the package is also better structured which makes all the difference when moving forward.

Thank you for your review and recommendation @etiennebr!

I'll need a few more days to look everything over but I will make a decision early next week and advise next steps for @khufkens

Thanks. Take your time, no rush. Previous version is still up and things only got better.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:
  
- [x] **A statement of need** clearly stating problems the software is designed to solve and its target audience in README
- [x] **Installation instructions:** for the development version of package and any non-standard dependencies in README
- [x] **Vignette(s)** demonstrating major functionality that runs successfully locally
- [x] **Function Documentation:** for all exported functions in R help
- [x] **Examples** for all exported functions in R Help that run successfully locally
- [x] **Community guidelines** including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

Functionality

- [x] **Installation:** Installation succeeds as documented.
- [x] **Functionality:** Any functional claims of the software been confirmed.
- [x] **Performance:** Any performance claims of the software been confirmed.
- [x] **Automated tests:** Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [x] **Packaging guidelines**: The package conforms to the rOpenSci packaging guidelines

#### Final approval (post-review)

- [ ] **The author has responded to my review and made changes to my satisfaction. I recommend approving this package.**
  
  Estimated hours spent reviewing: 10 (but 8 months apart!)
  
  ---

Review Comments

  • I think this package has to stress, both in the statement of need and in the vignettes, its use for extracting point-based values from the 'MODIS Land Products Subsets' web service of ORNL using its API. It does a great job for someone not familiar with R GIS functionality and greatly eases the job of getting point time series of MODIS Land Products.
  • Clearly state that point value extraction is for point coordinates given as latitude and longitude, or rectangular buffers defined by a width and height in integer kilometers, as testing showed that non-integer distances are not allowed.
  • To increase usability, the help and vignettes should start by telling the user to first check which product is wanted and then the name of the required band, as bands do not follow the ordering of the regular HDF products (or the user simply does not know them).
  • The tidy data.frame carries too much metadata, which is nearly all the same for every point, even the LL coordinates. Could this be output in a different object, without the need to define a class?
  • The batch download could be greatly improved with a function to import coordinate locations from a CSV or even sf objects, or at least include examples of how to do it in the help.
  • There is an example of a time series graph for a point location; it would be very useful to have other examples comparing data from different times, or batch downloads of rectangular areas representing different cover types, e.g. boxplots.

Hi @pmnatural,

Below are my responses and references to the code changes.

I think this package has to stress, both in the statement of need and in the vignettes, its use for extracting point-based values from the 'MODIS Land Products Subsets' web service of ORNL using its API. It does a great job for someone not familiar with R GIS functionality and greatly eases the job of getting point time series of MODIS Land Products.

I'm not sure which exact version you reviewed, but both the vignette and DESCRIPTION file now reference the ORNL API. Although the focus is on point-based extractions, the querying of regional data prevents me from explicitly stating that it is a "point based" approach (the upper limit is 200 x 200 km, a sizable footprint roughly covering Belgium or the state of Massachusetts).

The statement can be read here:
https://khufkens.github.io/MODISTools/reference/mt_subset.html

Clearly state that point value extraction is for point coordinates given as latitude and longitude, or rectangular buffers defined by a width and height in integer kilometers, as testing showed that non-integer distances are not allowed.

By default, kilometers are now rounded to prevent errors in this case.

To increase usability, the help and vignettes should start by telling the user to first check which product is wanted and then the name of the required band, as bands do not follow the ordering of the regular HDF products (or the user simply does not know them).

This has now been implemented, highlighting the steps needed to gather all required information.

The tidy data.frame carries too much metadata, which is nearly all the same for every point, even the LL coordinates. Could this be output in a different object, without the need to define a class?

On the suggestion of a previous reviewer I switched from a nested list to a tidy data frame. Given the current data size restrictions and the generally small datasets, I don't see this as an issue. Tidy data allows for quickly plotting data across bands, locations and pixels, without looping over complex list structures. It also facilitates concatenating multiple files / bands without the need for nested lists. I would argue that convenience trumps data duplication in this case (as often with long rather than wide or nested formats). Some of the power of this tidy data is shown in the vignette, as is the ease with which spatial or other subsets and aggregated analyses can be generated.

More recent features also include functionality to convert the tidy data frame to raster stacks (see the sketch below). For a description of this feature I refer to the vignette and the teaching material by my colleague Prof. dr. Jan Verbesselt in his time series analysis course. A shorter version is provided in my GitHub repo. Both are too detailed to serve as general examples.
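
A minimal sketch of that conversion (assuming the helper is exported as mt_to_raster() and df is an mt_subset() result):

library(raster)
library(MODISTools)

# one raster layer per acquisition date, reprojected to lat / lon
r <- mt_to_raster(df = df, reproject = TRUE)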

The batch download could be greatly improved with a function to import coordinate locations from a CSV or even sf objects, or at least include examples of how to do it in the help.

Support for both data frames and CSV files is already in place, as mentioned in the documentation. An example is also included in the documentation and covered by unit checks.

There is an example of a time series graph for a point location; it would be very useful to have other examples comparing data from different times, or batch downloads of rectangular areas representing different cover types, e.g. boxplots.

A new analysis is provided in the vignette, using a spatial analysis around the Arcachon Bay in south-west France (a random example of a rather varied landscape in a small area).

I hope this addresses all the issues raised!

Kind regards,
Koen

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

Congrats @khufkens , your submission has been approved! 🎉 Thank you for submitting and @etiennebr and @pmnatural for thorough and timely reviews. To-dos:

  • Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.

  • Add the rOpenSci footer to the bottom of your README

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

  • Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/technotes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

Please also add this badge to your README


[![rOpenSci Peer Review](https://badges.ropensci.org/246_status.svg)](https://github.com/ropensci/software-review/issues/246)

Congratulations @khufkens 🎉. We would indeed love to have a post about MODISTools.

Guidelines for submitting a blog post or tech note are here: https://github.com/ropensci/roweb2#contributing-a-blog-post. We currently have publication slots available June 11 or later.

Happy to answer any questions

Good job @khufkens! 🎉

Thanks all for the feedback! I can do a short blog post, but for something more substantial I'll need some time. If time is no objection I can do this after June 4th.

I also get this error upon transfer of the repo, not sure how to proceed.
(screenshot of the repository transfer error: Screen Shot 2019-05-20 at 20 24 30)

I can do a short blog post, but for something more substantial I'll need some time.

I can be 100% flexible on the publication date, since what's most important is for you to get what you want out of it. Short or substantial is your choice, based on the message you want to convey. Do you want to propose a date to submit a draft?

Hi @stefaniebutland, can we peg it at mid-June then? This should give me some time to frame things properly with some graphs etc.

Should be fixed now @khufkens

Thanks!

Hi @karthik I was notified by someone that the website link of the package is broken. I forgot to change the link to the rOpenSci account.

It currently reads (which leads nowhere):
https://khufkens.github.io/MODISTools/

but should be
https://ropensci.github.io/MODISTools/

Is there someone at your end who can update this so that people can access the examples etc.?