pyOpenSci/software-submission

Submit pydov for review

stijnvanhoey opened this issue · 37 comments

Submitting Author: @Roel, @stijnvanhoey, @pjhaest
Package Name: pydov
One-Line Description of Package: Python package to retrieve data from Databank Ondergrond Vlaanderen (DOV)
Repository Link: https://github.com/DOV-Vlaanderen/pydov
Version submitted: 1.0.0
Editor: @lwasser
Reviewer 1: @NickleDave
Reviewer 2: @xmnlab
Archive: TBD
Version accepted: TBD


Description

pydov is a Python package to query and download data from Databank Ondergrond Vlaanderen (DOV). DOV aggregates data about soil, subsoil and groundwater of Flanders (Belgium) and makes them publicly available. Interactive and human-readable extraction and querying of the data is provided by a web application, whereas the focus of this package is to support machine-based extraction and conversion of the data.

Scope

  • Please indicate which category or categories this package falls under:
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

  • Explain how the and why the package falls under these categories (briefly, 1-2 sentences):

pydov enables the access to the data services (both WFS services as well as the data itself) provided by DOV. It supports users in searching and downloading data on soil, subsoil and groundwater in Flanders in a reproducible way.

  • Who is the target audience and what are scientific applications of this package?

The machine-based availability of the data can potentially serve a diverse community of researchers and students. Applications are in the field of geology, hydrogeology and geotechnics.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

We have no knowledge about other packages that provide access to DOV data.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has an OSI approved license
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses. -> set of tutorials
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

Hi @stijnvanhoey !!! 👋 thank you for this submission!! i will add this to our list of packages to discuss in our next meeting which is next week - feb 6!!

@NickleDave will be a reviewer for this submission!! @lwasser will try to find a second reviewer and editor (or she will be editor)

I would be happy to help here as a second reviewer :)

hey @NickleDave and @xmnlab i know there is a lot going on. i just wanted to check in on this review to see how you guys were doing. I am very behind and know that ropensci is stopping new submissions for atleast a month. are you guys still able to review this package or should we take a step back?

Many thanks.

I can work on that. I will start that this week.

Same here, I think I can finish by next weekend at the latest.

Thank you so much @xmnlab and @NickleDave !! much appreciated.

ok let's shoot for both reviews in by Wed April 15th just in case you guys need a bit more time!! and reach out please if you need anything.

Hi @Roel @stijnvanhoey @pjhaest thank you for your patience. Please find below my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3


Review Comments

At a high-level pydov is a fully-developed packages that meets the PyOpenSci review criteria. It provides impressive functionality, yet is tightly scoped to querying and downloading data from Databank Ondergrond Vlaanderen (DOV).

I have one request for the documentation, and one question about the code, although I don't think the question about the code needs to be addressed before approving the package.

The documentation is very thorough, but I think one thing that would help new users is an "intro to working with pydov" tutorial. This tutorial would explains things at a level between the quickstart and the very detailed tutorials for each individual search object. What this tutorial would provide, that the current tutorials do not have, is an introduction to the core abstractions provided by the package. I notice that each DOV search tutorial follows the same structure, but there is not a lot of language about why a potential user might want to use some of the functionality, e.g. why get all the groundwater screens / boring holes within a bounding box.

The intro tutorial would ideally be a specific example of an analysis--it could even be one of the existing search tutorials--that introduces the concepts that repeat throughout the individual tutorials for each search object. This intro should be targeted at a "naive" researcher who is interested in having access to the data, but might not be familiar with some of the existing data structures and libraries that pydov leverages, e.g. the WFS protocol. I admit to a total lack of domain expertise in this area, so the devs' idea of what a "naive" researcher would know compared to mine might vary, but I think that adding an intro with more language about about what and why the package does what it does, in addition to how to do those things, would make it more welcoming to new users.

My question about the code relates to the search sub-package. I understand the need for the types sub-package, because each class in that sub-package encapsulates the metadata such as wfs and xml fields specific to each dataset type. However it's not quite clear to me why different classes are needed for the search objects that correspond to each type in the search package. Am I right that all the search classes have the exact same attributes and methods? My read of the code is that the only difference between the search classes is the default value for objecttype. E.g., a BoringSearch class has its objecttype default =Boring , which dictates what fields it has, but aside from that, it has the exact same methods as all the other search types. If I'm misunderstanding something about the code, please let me know and I'll just remove this question. But it seems like it might be possible to reduce code repetition by having another class that inherits from AbstractSearch with all the code that is common to the specific Search classes, and then just subclass this additional class for specific types.
Something like:

class MetaSearch(AbstractSearch):
    def __init__(self, layer, objecttype):
        super(MetaSearch, self).__init__('dov-pub:Boringen', objecttype)

    def __common_search_methods_here__():
    ...

import Boring

BoringSearch(MetaSearch)
    def __init__(self):
        super(BoringSearch, self).__init__(layer='dov-pub:Boringen', objecttype=Boring)

Thank you again for your patience during this crazy time and thank you for submitting the package, it was a pleasure to review. Looking forward to your response

hey everyone, I couldn't finish my review today. I will finish that tomorrow. sorry for the delay.

@Roel, @stijnvanhoey, @pjhaest, thanks for your patience. pydov looks awesome, I am adding my review notes below:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4 hours


Review Comments

Congratulation for the work you all are doing on pydov. I am adding general notes below:

  • The documentation looks very detailed and instructive about how to use the library. I agree with David, maybe would be good to have a tutorial with an introduction something like a blog post (also it could be posted on PyOpenSci blog section).
  • API reference documentation looks very well detailed.
  • Good test coverage (96%).

Requests

  • There is no instructions about how to install the package for development in README file (e.g. prepare the environment: "pip install -r requirements_dev.txt", install the package in the development mode "python setup.py develop", etc)
  • The quick start located at README maybe could be improved with the output of dataframe, so it will give more details about the result of a boring search. It also applies to https://pydov.readthedocs.io/en/stable/quickstart.html

Suggestions and general comments (not required)

  • Maybe it would be good to have a link to binder on each tutorial page to its corresponding tutorial on binder
  • In the tests, you can change assert type(df) is DataFrame to assert isinstance(df, DataFrame)
  • It is not too common to import objects from a test_*.py maybe it could be moved to conftest (if it applies)
  • It seems the test classes are inheriting from a base class (AbstractTestTypes, AbstractTestSearch) and some methods' docstring are the same as the inherited method, using this approach is very hard to maintain and keep everything consistent. Also some methods could be an attribute, e.g.: def get_type(self): return Boring -> klass = Boring and it keeps your test classes more compact.
  • The pydov classes are also inheriting from base classes, and the docstrings work looks duplicated, if you don't add any docstring, the method will inherit from the base clases. If you think you need to have a customized docstring, maybe you can check some approaches for dynamic docstrings
  • typo: https://pydov.readthedocs.io/en/stable/notebooks/search_grondwaterfilters.html#Plotting "... Pandas cen be used"
  • It would be good to have in each chart the title and label for each axis
  • Maybe pydocstyle can help to check static docstrings (you also can ignore erros you think any one doesn't apply for pydov). If you want to check dynamic docstrings you can use numpydoc --validate (https://numpydoc.readthedocs.io/en/latest/validation.html#validating-numpydoc-docstrings) you can check one example here https://github.com/pandas-dev/pandas/blob/master/scripts/validate_docstrings.py
Roel commented

Hi @NickleDave @xmnlab

Thank you for taking the time to review our package so thouroughly during these strange times.

All the points you raised are valid and I have made issues for them in our Github project so we can track progress there as we work ourselves through them. We plan to have a two-day codesprint somewhere in May, and I'm confident we can tackle most of the issues then.

@NickleDave: I agree that an introductory tutorial that introduces the search concepts more in detail (and by example) would help new users. As you mention the concepts are the same for each of the search objects, so if a new user can understand these concepts from the introductory tutorial they can quickly move on to the subject of their interest. We will definitely work on this in the codesprint.

Regarding your question about the search classes, you are right that each of them duplicates quite a lot of code. The idea is to store the (results of the remote calls to the) WFS schema, metadata, XSD schemata and so on in class variables of the search classes so that they are shared between search class instances. This avoids dowloading this data multiple times in the same session. There might be a better way to achieve this though. Another option we can consider is letting go this behaviour of keeping this data in class variables in the first place. Since the search instances don't persist data between searches there should be no need to create more than one. I made an issue for this in our Github, we can discuss this further either there or here, as you please.

@xmnlab: We will add installation instructions for a development installation in the README as well as in the installation section in the documentation. There are bits and pieces of information (f.ex. in the contributing section), but that is quite hard to find and assemble :)

We will also add the output dataframe of the search, both in the README as well as in the Quick start in the documentation. I agree that (potential) users can so quickly see when they can get as output.

I have made issues for your other suggestions. Thanks very much for tips, we will definitely look into them!

Thanks a lot to both of you for your time and reviews! We will work through your suggestions as time allows during the next couple of weeks and will follow up here once we made some progtess :)

Stay safe!

hi @Roel @xmnlab @NickleDave thank you all for your work on this review! @Roel if and when you have time, can you kindly reference this issue in your issues / pr's as you work through them? this will make it easier to circle back on the review in may or june (or whatever timing works for you) to ensure things have been resolved!! Many thanks all.

👋 all!! just checking in here... it looks like there are a bunch of issues and such being worked through still so that is totally cool. if anyone has questions please say the word... otherwise i'll just wait to see things closed up and then will check in again to see whether the reviewers are satisfied!! i'll be offline next week so i won't be responsive until june 1! Stay healthy everyone :)

Roel commented

Hi @lwasser ! Our code sprint is planned for next week (May 28-29), so I hope to make some progress in the issues then. I assume we should be able to pick up the review somewhere in June :)

Thanks for your patience, enjoy your week offline and stay healthy!

@Roel sounds great! just ping us here when you are ready to pick back up. i know this all takes a lot of time so take the time that you need!

Roel commented

Hi all! I hope everyone's doing well out there. We managed to tackle most of the raised points during our last code sprint and in the weeks that followed. There are still two issues unresolved for now, but we'll take these on in one of our next sprints for sure.

Let me know whether it suits you to pick back up here. Thanks for you patience and take care!

hi all. just checkin in to see whether all review comments have been addressed or who is working on what items at this point. many thanks.

oh sorry, I think I lost this notification, really sorry. I will take a look at the last comment @Roel sent. it sounds pretty nice that a lot of issues were addressed at the sprint! that is great :)

@Roel thank you so much for your patience! thanks for addressing the points I suggested.

@lwasser, The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

thank you @xmnlab !!! @NickleDave you were the second reviewer on this package. have all changes been implemented per your satisfaction? it sounds like there are two outstanding @Roel can you tell us what is left to do following this review from your perspective? and perhaps at this point some have been addressed in a sprint?

Thank you for pinging me @lwasser -- sorry for losing track of this

Yes I see that DOV-Vlaanderen/pydov#254 adds an introductory tutorial -- looks good @Roel, I think this can definitely help a newcomer quickly get up to speed on usage of pydov.

The other concern I raised about code duplication has been noted in DOV-Vlaanderen/pydov#264 but I don't think this should be a blocking issue.

So, yes, the authors have responded and made changes to my satisfaction.
I recommend approving pydov (🎉)

Roel commented

Awesome, this is great news! No worries about the timing, we're all busy :)

We will definetely pick up the remaining issues in one of our next sprints and keep them in mind when refactoring or working on new features.

Thank you again for your time @lwasser, @xmnlab and @NickleDave! Great to become a part of the growing list of pyOpenSci packages!

Thank you for submitting pydov for review @Roel @pjhaest @stijnvanhoey !!
I think it's a great tool, excited to tell more people about it.
Perfect for pyOpenSci. And I learned a lot from seeing you all develop it further.

🎆 excellent! ok i think then we can consider pydov accepted as a part of pyopensci!!
thank you @Roel @pjhaest @stijnvanhoey and @NickleDave @xmnlab for the efficient review process here!!

there are a few last steps if someone is willing to take the time to do this

Reviewers:

If you are not already on our website as a contributor, please submit a PR to pyopensci.github.io to add yourself as a contributor using:

contributor_type:
    - reviewer

Package Maintainer(s)

@stijnvanhoey for you the tasks are:

  • Add the badge for pyOpenSci peer review, with the second link being the URL to this issue:
    pyOpenSci
  • Submit a PR to pyopensci.github.io to add pydov to the pyOpenSci package list and to add yourself as a contributor using:
contributor_type:
    - package-maintainer

and

packages-submitted: ["pydov"]

OPTIONAL!!

If you are interested (not required), you can write a blog post for the pyOpenSci website about PyDov (see blog post about pandera) to promote your package. This is totally up to you!

Is the version that we are accepting here 2.0.0 ?
Thank you all!!

This is indeed great news, thanks reviewers for all the comment and all credit to @Roel for being the main driver to get these changes and improvement all tackled. Thanks a lot!

With respect to the badge, see DOV-Vlaanderen/pydov#292

Hi there @stijnvanhoey ! i'm doing some housekeeping today and i noticed pydov is not on our list of packages here. Nor are you listed as a contributor! did a PR get missed? i don't see one but that does not mean i didn't miss something. we can close this once this step is complete! i hope you are well!

Hi @lwasser, thanks for the update! I don't know what we missed. But we are having a next code sprint in some weeks. If it's not picked up before that time, we'll definitely do it at that occasion.
Thx all!

Closing this--thank you again all for a great review process!

hi there @stijnvanhoey !! i wanted to check in on pydov here! pyOpenSci has funding to operate for a few years now and i'm focused on the project.

As a first step, i'm reaching out to all maintainers and reviewers who have worked with us in a review to request that they complete a survey:

🔗 Link here
this should take you about 10 to 15 minutes and i'd greatly appreciate your time in doing it.

Also are you the only maintainer of this package? If there are other maintainers can you please list them here and kindly ask them to take the survey as well?

We also have a slack community that you are welcome to join if you'd like. i can email you with a link.

Many thanks in advance for doing this! And i hope all is well!

actually i think there are three maintainers possible here?
@Roel , @pjhaest I just pinged @stijnvanhoey above but would appreciate your filling our our🔗 SURVEY as well. and you are also welcome to join us on slack. Many thanks again for your time.

Hey there - @Roel, @stijnvanhoey, @pjhaest just checking in again here to make sure you recieved my message above! It's important that packages in our pyOpenSci ecosystem are maintained and that we have some level of communication with our maintainers. I think pydov is still being maintained.

But i'd like to know who the current maintainers are so please do get back to me here on this issue. And we'd GREATLY appreciate your time in filling out this survey as well.

Many thanks for your time in doing these things! 🙌

@lwasser thanks for reaching out. @Roel is indeed the current maintainer and doing all of the updates and new releases. Would be best to have @Roel filling in the survey.

Roel commented

Hi @lwasser

I'm indeed doing most of the maintenance of pydov currently. We had a bugfix release just a few days ago and a new major release is coming up soon!

I hope to find the time next week to fill in the survey.

ok awesome. @Roel i'll list you as the lead maintainer then for this package. Many thanks @stijnvanhoey @Roel for getting back to me!! and thanks in advance for working on the survey @Roel . I appreciate your time!

Roel commented

Hi @lwasser I filled in the survey, I hope the results will be useful for you!

@Roel thank you SO SO MUCH for that!! i appreciate it.