pyOpenSci/software-submission

afscgap: Ocean health tools for NOAA AFSC GAP species presence data

sampottinger opened this issue · 74 comments

Submitting Author: Sam Pottinger (@sampottinger)
All current maintainers: @sampottinger, @gizarp
Package Name: afscgap
One-Line Description of Package: Community contributed Python-based tools for working with public bottom trawl surveys data from the NOAA Alaska Fisheries Science Center Groundfish Assessment Program (NOAA AFSC GAP).
Repository Link: https://github.com/SchmidtDSE/afscgap
Version submitted: 0.0.7
Editor: @ocefpaf
Reviewer 1: @7yl4r
Reviewer 2: @ayushanand18
Archive: DOI
JOSS DOI: DOI
Version accepted: 0.0.7
Date accepted (month/day/year): 05-30-2023


Code of Conduct & Commitment to Maintain Package

Description

Python-based tool set for interacting with bottom trawl surveys from the Ground Fish Assessment Program (GAP). This provides information about where certain species were seen and when under what conditions, information useful for research in ocean health.

It offers:

  • Pythonic access to the official NOAA AFSC GAP API service.
  • Memory-efficient tools for inference of the "negative" observations not provided by the API service but required for some common types of analysis.
  • Visualization tools for quickly exploring and creating comparisons within the dataset that provide an "on-ramp" to deeper analysis. The visual analytics components aim to serve both expert programmers and audiences with limited programming experience.

Note that GAP is an excellent dataset produced by the Resource Assessment and Conservation Engineering (RACE) Division of the Alaska Fisheries Science Center (AFSC) as part of the National Oceanic and Atmospheric Administration's Fisheries organization (NOAA Fisheries). See also the RACEBASE NOAA InPort entry.

Additional information at https://pyafscgap.org/.

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

related to data viz category: see presubmission
Domain Specific & Community Partnerships

  • Geospatial
  • Education
  • Pangeo

Community Partnerships

If your package is associated with an
existing community please check below:

  • For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

This library supports retrieval of data from the official NOAA AFSC GAP REST API service but, as that service on its own is often not sufficient due to certain data limitations, it also offers zero catch inference as required for a number of common types of investigations (hence data processing). Finally, given the needs of the community and the vast breadth of the dataset, it offers a community application to explore the data which, in turn, can generate Python code to help users get started with continued analysis within their own scripts.

  • Who is the target audience and what are scientific applications of this package?

This project largely benefits scientific researchers in the ocean health space as this dataset is useful for fisheries management, biodiversity research, and marine science more generally. An example of what this analysis may look like is provided in our example notebook hosted on mybinder.org.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

We are not aware of other Python packages working with the AFSC GAP dataset.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Please see our presubmission.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.

See NOAA InPort entry. We are primarily using Distribution 6 in the API form. As described, these data are available with the following access constraints: "There are no legal restrictions on access to the data. They reside in public domain and can be freely distributed."

See BSD License within repo.

  • contains a README with instructions for installing the development version.

See project README.md.

  • includes documentation with examples for all functions.

See project website and usage section of project website's documentation microsite.

  • contains a tutorial with examples of its essential functions and uses.

See tutorial notebook on mybinder.

  • has a test suite.

See main tests for package and supplemental tests for website (backend tests and frontend tests).

  • has continuous integration setup, such as GitHub Actions CircleCI, and/or others.

See CI and CD (library, documentation).

Publication Options

JOSS Checks

  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.

See example notebook and app intro.

  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.

In addition to query compilation / emulation, we are hopeful that the negative record inference and complementing visualization tool are enough to escape the thin API client status. Coming in at over 10k lines of code, we would seem to fit some of the "hard" criteria that the journal puts forward. Though this library happily provides Pythonic access to an API service, we believe that this contribution as a whole goes beyond a thin client by providing one of the only public mechanisms for conducting investigation requiring negative catch data and provides unique tools for comparative analysis within with the dataset.

  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.

See https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.md (or PDF preview at https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.pdf).

  • The package is deposited in a long-term repository with the DOI:

We have submitted to Code Ocean. Please confirm if this will suffice. See 10.24433/CO.4905407.v1 / https://codeocean.com/capsule/4905407/tree/v1

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

We would be delighted to have your pull requests! 🎉

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

  • Last but not least please fill out our pre-review survey](https://forms.gle/F9mou7S3jhe8DMJ16). This helps us track
    submission and improve our peer review process. We will also ask our reviewers
    and editors to fill this out.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The [editor template can be found here][Editor Template].

The [review template can be found here][Review Template].

This is a continuation of #91

Sorry updated issue body to fix a small formatting issue and to clarify that we have two maintainers.

No worries, edit as much as you need -- we will end up updating the metadata in the YAML header as we go anyway.

I will do initial checks and start finding an editor for this review early next week.

Fantastic thank you! Have a great weekend :)

You too! 🙂

Hi again @sampottinger I'm adding the initial editor checks below.

Good news! afscgap passes with flying colors. I made a couple of comments in the checklist but I don't see anything that needs to be addressed--looks like we're ready for review.

I will begin looking for an editor this week and update you here as soon as we have found one.

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements
below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • might be worth adding a conda-forge package?
    • The package imports properly into a standard Python environment import package-name.
  • Fit The package meets criteria for fit and overlap.
    • yes, data retrieval (from GAP dataset through API), data processing, data visualization
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. We suggest using the Numpy docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a Code of Conduct file.
      • adapted from p5.js COC
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
      • BSD 3-Clause
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

Incredible @NickleDave thank you so much!

Hi @sampottinger and @gizarp, very happy to say that @ocefpaf has generously volunteered to act as editor for this review. He will introduce himself and take the lead from here.

Thanks @NickleDave and great to meet you @ocefpaf!

Hi folks! Sorry for the delay in getting here. (Holiday+longweekend in South America this past few days.)

@NickleDave I'm catching up with the reading today. Please let me know if there are any urgent button/forms I need press/fill.

@sampottinger I have a few really minor comments after a first pass. These are not blockers and you don't need to address them for this application! They are really just comments:

  1. I see you have a module named utils. That can be problematic as we tend to put everything in there. It also make future refactors or even maintaining the code a bit hard. Most of the functions there seems to be "converters," right? Maybe renamed it to converters.py (horrible name, I know) and avoid the "utils.py" trap.
  2. The query method argument list is daunting! I'm not sure if there is a way around it. I know you are limited by the data structure upstream. Maybe it is a matter of creating more examples and docs? Or other methods with a reduced search scope that calls the main one? Not sure if that even makes sense.
  3. The package name suffers from the usual NOAA acronym problem. I understood it but I am "in the know" b/c I'm a NOS/IOOS/SECOORA contractor (see ;-p) . Maybe a friendlier name can help folks outside of NOAA-Fisheries find this package. However, if there is a user base, even a small one, please disregard this comment. Renaming a package after folks adopted it is worse than a bad name.

PS: I showed your package to a few colleagues at IOOS and they believe that a notebook with an afscgap example would make a nice addition to our Code Lab https://ioos.github.io/ioos_code_lab/content/intro.html. Let me know if you are interested in that.

Hey @ocefpaf! Thanks so much for all of these. Let me pull together a PR real fast and some comments!

Thanks again @ocefpaf for your thoughtful comments! Please see below.

I see you have a module named utils

This is a good suggestion. I made a PR at SchmidtDSE/afscgap#49. Let me know if this refactor looks good to you!

The package name suffers from the usual NOAA acronym problem ... if there is a user base, even a small one, please disregard this comment

Yeah the dataset's whole name is NOAA AFSC GAP RACEBACSE FOSS 😅 . Thanks for being understanding on this one. I think we will politely decline this suggestion if that's OK. We have some community continuity concerns like you mentioned and some community sensitivity concerns we want to respect... we have to be a little careful around not appearing to have activity in other datasets (NOAA or otherwise) that may be at different moments in their open science journey than AFSC GAP or not (yet? 🤞) open in the same way that the AFSC GAP API service is. Given our current context, this specificity is probably desirable for at least the medium-term future of the project.

a nice addition to our Code Lab

Amazing! We'd be honored 😄. We'd be more than happy to co-develop something with you or another community member in that space. Just for the conversation, I'll highlight that we do have a mybinder example available (source in repo). Totally understand if it might not be quite the right vibe though.

The query method argument list is daunting!

I think that design choice is a strong opinion loosely held. I'm not sure if it is helpful but I'll document some thoughts we had internally but 100% open to additional ideas!

We went this route to fulfill a few goals: we wanted to reflect the structure of the API service (any combination of filters in the keyword set is valid in the NOAA API service so there isn't a clear functional decomposition) while also providing documented discoverability. There were some other ideas explored:

  • kwargs: Leaving the keyword arguments as variable is a bit easier for implementation but, though documentation for the AFSC GAP dataset exists, it's not necessarily convenient to reference / use from a Python context. So, we wanted to have docstring description for all of the available options to benefit IDEs, API doc generation, etc. We also wanted the possible list of filters to be enumerated for the user in those contexts as well.
  • pass objects: There's the option of a more ORM-like approach where a list of filters are passed like query(filters=[CommonNameFilter(eq='Pacific cod')] but we didn't like forcing a lot of our type system into client code as it would require quite a large number of types to maintain our goals around docstring coverage of the available filter parameters and discoverability.
  • builder chaining: Probably my personal favorite, we could turn this into an object with syntax like query().commonName(eq='Pacific cod').year(min=2021).get(). While I personally like this syntax, there's been conversation internally and with community partners for using this library in an educational context as well. As this syntax may be less common in earlier Python programming journeys, there was a feeling that the keyword arguments approach was more inclusive / approachable for someone in, say, an introductory Python course or a student not in a computer science curriculum (we got invited for one coming up soon!).

Just wanted to list this out but absolutely open to other thoughts! This just felt like the right balance around the broader goals with regards to documentation / discoverability, keeping the type system encapsulated (see below), and maintaining accessibility.

library

This is a good suggestion. I made a PR at SchmidtDSE/afscgap#49. Let me know if this refactor looks good to you!

Better than I thought. Hopefully that will make your life easier in the future.

I think we will politely decline this suggestion if that's OK.

No arguments here! It is better to keep the name now that the package is out.

Totally understand if it might not be quite the right vibe though.

It is a good fit. I'll ping you outside of pyOpenSci later to talk more about it.

Regarding the query functionality. I do like the idea of the builder chaining and it is not too uncommon. Pandas, pyjanitor and many other mainstream libraries use it. It is quite common in the R world too. Also, many similar packages, like argopy, have something like that.

With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.

PS: I disagree with the learning curve there. Maybe it is my bais but I teach a lot of basic Python and I make sure to expose the students to a flat vs nested model for operations.

👋 Hi Tylar Murray and Ayush Anand! Thank you for volunteering to review
for pyOpenSci!

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit
    here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: May 4th, 2023


Reviewers: @7yl4r and @ayushanand18
Due date: 2023-05-04

PS: You can comment, open issue, and/or PR directly in the afscgap for this review. Just make sure to cross-reference them here.

Thanks @ocefpaf!

Better than I thought.

Awesome! Merged in.

I disagree with the learning curve there.

That's helpful to have your perspective on that. Let me float this around a little bit and see how we feel about migration. I'll try to make a call on that in the next day or so to keep things moving.

With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.

Thanks!

wave Hi Tylar Murray and Ayush Anand!

Lovely to meet you both! Thank you for your time. :D

Thank you so much @ocefpaf and @sampottinger!
Great work building this package. The current implementation looks robust and clean.

But I do have some observations to make.

  1. There is this function afscgap.query().get_bottom_temperature_c(), and I do think that it might be hard to guess the function name for someone who hasn't seen this dataset before. Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')? This observation is similar to other functions like cpue_*(), duration_hr, etc. I think could have been derived from the direct field names in the dataset, and those who know about the dataset can easily figure out what the name would be, without looking at the docs, but it would be difficult for people from outside (like me) to figure out without going back-and-forth to the docs.

  2. Also about the filters and getters, we can utilise the getitem() method in the class with something like this,

def __getitem__(self, key):
    return self.__getattribute__(key) 

This might be more intuitive. Or to make it much easier can we can just go by the name get(key), and specify the key we want. Ref

I would love to hear your opinion, and these are surely non-blockers for this review!

Hello @ayushanand18 and @ocefpaf!

I do like the idea of the builder chaining and it is not too uncommon

Thanks again for the conversation @ocefpaf! I've started implementing the builder pattern at SchmidtDSE/afscgap#50.

Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')

Thanks for the suggestion @ayushanand18! I've started that at SchmidtDSE/afscgap#51

we can utilise the getitem() method in the class

Thanks for this suggestion but I may pass on this if that's ok @ayushanand18.

In light of the stuff we did earlier in process, our reasoning is that the library carries an explicit goal to provide docstrings and type annotations for these fields. I think, given the complexity of the data model itself as well as the difficulty in using some of the upstream documentation from NOAA directly in the Python ecosystem, I want to leave those methods there for discoverability and encourage use of those typed interfaces unless the user "opts out" first.

To that end, there is a to_dict method on afscgap.model.Record for those that prefer that key-based approach or want to read into something like pandas.DataFrame but we hope requiring to_dict first helps inform the user that the object returned from a query does offer those other interfaces, information which may be missed if they simply try treating the result as a dictionary. In other words, I'd rather have the user first explicitly decline use of the default typed / strong interface given the benefits it offers for static checking and affordance.

Hello all! @ocefpaf, the builder interface was implemented in SchmidtDSE/afscgap#52.

Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')

Thanks @ayushanand18! I have implemented in SchmidtDSE/afscgap#55

I am in the process of confirming SchmidtDSE/afscgap#54 which includes the following requested changes before they are resolved on main:

SchmidtDSE/afscgap#54 will also soon include updates to documentation.

I've started that at SchmidtDSE/afscgap#51

That looks great!

Thanks for this suggestion but I may pass on this if that's ok .

It's okay. At the end of the day, we need to follow what's best for the user of the package.

To that end, there is a to_dict method on afscgap.model.Record for those that prefer that key-based approach or want to read into something like pandas.DataFrame [...]

Sounds good. But that leaves me a question of whether the afsc.model.Record object returns a nested dictionary in any case with the to_dict() method. If the answer is yes, then probably we might include a function to_pandas() that unnests the nested dictionary and returns a DataFrame object so that user doesn't take the pain. But again this is just a suggestion.

given the complexity of the data model itself as well as the difficulty in using some of the upstream documentation from NOAA directly in the Python ecosystem

I hear you. My day job is to try to make that better and, most of the times, we fail :-/

I hear you. My day job is to try to make that better and, most of the times, we fail :-/

@ocefpaf - I'm sorry I didn't mean it like that! Perhaps just that the docstrings are extra convenient in this context 😅 . The upstream documentation for the datasets have to address both internal and external users with different kinds of access permissions / modalities. It just means it's a big schema with lots of (great) data but also takes a little while to get oriented towards. I think / hope that the docstrings help a bit but this library also has a simpler task since it knows which dataset the user is working with. Also, since I know they are in Python, I can also do a few things to help the user that the upstream service can't do.

Unrelated but thanks for your comment on the issue in the repo about the builder pattern!

That looks great!

Thank you! Appreciate your suggestions @ayushanand18.

object returns a nested dictionary in any case with the to_dict() method

It is not nested! It uses the same field names and linear record structure from the API service.

Hello all! Your requests are implemented on main and the issues in the library are closed. Thanks for your help!

@ocefpaf - I'm sorry I didn't mean it like that!

No harm done! I was mostly venting my own day-job frustration. Like in Software, agreeing on standards is hard!

Most of the time, even when we do agree on them, we only see its flaws when we start to implement Software to use them. By that time it is too late and now we need to circumvent the issues with Software and/or metadata patches.

@ayushanand18 thanks for all the feedback!

@7yl4r do you believe you can take a look at this project in the next few days? If not, please let us know!

PS: Don't forget to close your review with the reviewer template. See #93 (comment).

@ayushanand18 thanks for all the feedback!

Thanks @ocefpaf, I'm done with my comments on the review.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 0.5


Review Comments

Thanks for the submission @sampottinger! Everything looks good. However, my only concern is of including Installation Instructions for the dev version of the package through GitHub.

Hello @ayushanand18! Thanks very much for your review.

However, my only concern is of including Installation Instructions for the dev version of the package through GitHub.

I’m sorry are you recommending that we include additional installation details for installing direct from GitHub (eg: pip install git+https://github.com/SchmidtDSE/afscgap.git@main)? Just wanted to double check.

Also @ocefpaf / @ayushanand18 - we were hoping to try at JOSS (we know they don’t like API wrappers but we are hoping that zero catch inference, ORDS compilation / emulation, visualization, etc take us over the line). The paper.md is in the inst folder. LMK if there is something else we should do to facilitate. Just noticed that the section on JOSS was left unchecked.

Sorry @ayushanand18 - Added this PR to address your final comment: SchmidtDSE/afscgap#72. LMK if this addresses your concern. Thanks again very much for your time! We very much appreciate all of your help.

LMK if there is something else we should do to facilitate. Just noticed that the section on JOSS was left unchecked.

@lwasser should the PyOS reviewers fill that part or do you get someone from JOSS doing that?

@ocefpaf this is a great question. JOSS does not normally check those boxes. I think typically the submittor / author would want to check those to indicate that they have all of the items needed by JOSS. but JOSS will review the paper - we do not need to review the paper. It is my understanding that JOSS will also double check scope. But it is good for the author to check first. I hope this helps. feel free to ask more questions if it's unclear as we can always adjust our processes if we need to.

@sampottinger thank you for submitting to us and @ayushanand18 @7yl4r so happy to have you here as reviewers! 👋

Thank you @lwasser !

I’m sorry

No need to be sorry on that, I should have been clearer in my comments :)

are you recommending that we include additional installation details for installing direct from GitHub

Yes, I meant that. The PR looks good, and I don't have anything to add.

Thanks @ayushanand18 !

Hello all! Just checking in. Is there anything else I can do to help in the process? Thanks!

7yl4r commented

Sorry for the delay on my part. I have filled the review template in this gist but am still looking through things. All looks great. There are some specific details from pyOpenSci to look at. My personal preference would be to simplify the documentation text, but I do not yet have a recommendation on how to do that.

Thanks! Looking forward to your feedback @7yl4r!

Hello @7yl4r! Thank you for SchmidtDSE/afscgap#75. I have responded there but I was unable to reproduce your behavior in a fresh Ubuntu VM. Just checking in... is there anything else I can help with at this time? Thank you!

There are some specific details from pyOpenSci to look at. My personal preference would be to simplify the documentation text, but I do not yet have a recommendation on how to do that.

Please feel free to ping me directly or open issues in the GH repo for the docs. Documentation improvements are always very welcomed :-)

Sorry for the delay on my part. I have filled the review template in this gist but am still looking through things.

When you are done please paste them as a comment here. @lwasser will aggregate the reviews later and having them in the original issue helps with the process.

I have addressed and closed SchmidtDSE/afscgap#75. Thank you for your help!

thank you all for being here and for using the templates!! @ocefpaf @7yl4r it is SOOOO very helpful to us to have consistent templates for many reasons. so i truly appreciate your doing this and pasting them here. you can still add any adhoc review text that you wish to add to the template. ✨

Thanks everyone! I don't believe there are any tasks / unresolved feedback pending on our end right now. Please let me know if there is anything else we can do. 🥳

7yl4r commented

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name

  • Badges for:

    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).
  • Short description of package goals.

  • Package installation instructions

  • Any additional setup required to use the package (authentication tokens, etc.)

  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.

    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.

  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.

  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2


Review Comments

Just a few general comments on my experince digging into this package:

  • The README is long; I have a personal preference for a very terse README that links out to the heavy documentation.
  • Badges should technically be above the package description according to review template, but I don't see this as important.
  • Overall I found the documentation to be extremely detailed - perhaps overwhelmingly so. Ensuring the quickstart examples and notebooks are front and center is very important. The readme already does a good job of this, but perhaps links out to more usage examples would be beneficial.

@sampottinger I move to the next stage, "awaiting reviewers changes" but I do believe you implement all of the suggestions already, right? @7yl4r and @ayushanand18 is there anything pending on your side?

7yl4r commented

Everything looks good on my side. I can tell a lot of great work has gone into this project and it is in good shape.

Thank you all! @7yl4r - I want you to know that I took your suggestions to heart and I am working on a PR to address them at SchmidtDSE/afscgap#81. I should have that resolved later today.

@ocefpaf - let me let SchmidtDSE/afscgap#81 through and then I should be good to go.

We are all set. Please let me know how you want to proceed @ocefpaf. Thanks @7yl4r for your feedback! The README is refactored and the docs microsite reorganized at https://pyafscgap.org/docs/#

@ayushanand18 is there anything pending on your side?

Nothing, everything's in perfect shape. Thanks for giving me a chance to be here!

👀 (Getting some good feelings) Did we get through to the end? 😸

This application did complete all the steps so far and I do believe it is time for step 6.

@lwasser and @NickleDave sorry to bug you but I'm not sure I should add the level 6 label myself or if one of you should do it.

Hey all! Sorry just checking in before some travel. Do you need anything else from us at this stage? Thanks!

@sampottinger looks like we are OK to move to the last step. I'm just checking with someone who has more experience than me b/c this is my first time as an editor here. I'll have some feedback for you soon.


🎉 afscgap has been approved by pyOpenSci! Thank you Samuel Pottinger for submitting afscgap and many thanks to Ayush Anand and Tylar for reviewing this package! 😸

There are a few things left to do to wrap up this submission:

  • Activate Zenodo watching the repo if you haven't already done so.
  • Tag and create a release to create a Zenodo version and DOI.
  • Add the badge for pyOpenSci peer-review to the README.md of . The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/issue-number).
  • Add to the pyOpenSci website. , please open a pr to update this file: to add your package and name to the list of contributors.
  • if you have time and are open to being listed on our website, please add yourselves to this file via a pr so we can list you on our website as contributors!
It looks like you would like to submit this package to JOSS. Here are the next steps:
  • Login to the JOSS website and fill out the JOSS submission form using your Zenodo DOI. When you fill out the form, be sure to mention and link to the approved pyOpenSci review. JOSS will tag your package for expedited review if it is already pyOpenSci approved.
  • Wait for a JOSS editor to approve the presubmission (which includes a scope check).
  • Once the package is approved by JOSS, you will be given instructions by JOSS about updating the citation information in your README file.
  • When the JOSS review is complete, add a comment to your review in the pyOpenSci software-review repo that it has been approved by JOSS.

🎉 Congratulations! You are now published with both JOSS and pyOpenSci! 🎉

All -- if you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the contributing-guide. We have also been updating our documentation to improve the process, so all feedback is appreciated!

Yay! Thank you everyone. I'll get to work on this final bit right now and forward to JOSS.

Thanks for your help @ocefpaf! I got everything done today except the JOSS submission form. Things should be good to go on the pyopensci side though. I'll try to submit to JOSS tomorrow. Thanks!

congratulations @sampottinger and welcome to our ecosystem of packages!! 🚀 i'll make sure your pr's are merged today so you get added to our website!! ✨ please keep us posted on the joss submission as well as we will update our tags. we also appreciate referencing this issue when you submit so they two reviews are connected (and they know not to actually re-review your package!!). @ocefpaf thank you for leading a really efficient review! and @7yl4r @ayushanand18 thank you for reviewing for us!!! we will add you to our website as contributors as well.

last request: if you all have the bandwidth to take our post review survey id GREATLY appreciate it. we use this feedback on the review to track package improvements and also to just get feedback so we can improve over time. thank you in advance for doing this.

WIll do @lwasser !

Filled out the survey and submitted to JOSS. I added this URL in the notes to the editor and, when the issue opens on their side, I'll link. Thanks everyone!

@sampottinger let me know if you need any help with the zenodo DOI and badge. Those are the last boxes we can tick in #93 (comment) that do not require waiting on JOSS :-)

Oh shoot. Sorry let me sort that...

Added!

hey there @sampottinger ! i wanted to invite you and your maintainer team to write a blog post (totally optional) on your package for us to promote your work! if you are interested - here are a few examples of other blog posts:

pandera
movingpandas

and here is a markdown example that you could use as a guide when creating your post.

it can even be a tutorial like post that highlights what your package does. then we can share it with people to get the word out about your package.

If you are too busy for this no worries. But if you have time - we'd love to spread the word about your package!

Hello @lwasser! Thank you for the invitation to do this and we would be very happy to participate. I'll get to work on this but there's a chance I may have to turn it back to you late next week. Will keep you posted!

wonderful! thank you @sampottinger . Also no rush. Late next week is perfect. please reach out with any questions.

Hello there! Sorry we may be just a bit longer on the blog post. More soon! Thanks!

woo hoo!! congratulations @sampottinger @gizarp !!! i'll let @ocefpaf do the honors of wrapping things up and closing out this issue. but i just wanted to celebrate here as it's well deserved!! 🎉

Congrats @sampottinger. Great work! Also, as someone who work with similar things (making sure the data reaches those who need them) I'm very happy to close this as completed and to see it on JOSS.