pyOpenSci/software-peer-review

Current domain/scope for Python packages on pyOpenSci

arianesasso opened this issue · 18 comments

Hey everyone!

There has been some discussion of the domain and scope categories across several threads, so @lwasser suggested we open a new issue.

The current domains listed are:

  • Data retrieval: Packages for accessing and downloading data from online sources. Includes wrappers for accessing APIs.
  • Data extraction: Packages that aid in retrieving data from unstructured sources such as text, images, and PDFs.
  • Data munging: Tools for processing data from scientific data formats.
  • Data deposition: Tools for depositing data in scientific research repositories.
  • Reproducibility: Tools that help scientists ensure that their research is reproducible, e.g., version control, automated testing, or citation tools.
  • Geospatial: Packages focused on the retrieval, manipulation, and analysis of spatial data.
  • Education: Packages to aid with instruction.
  • Data visualization: Packages for visualizing and analyzing data.

I personally think Geospatial is too specific, since you could just as well name other domains, e.g., Health. But I would still keep Education as its own category (though @lwasser pointed out that this might also be too specific). Discussion here.

And @eriknw raised the question of what "Data munging" really is here.

So, maybe we can use this new issue to discuss that? 😊

Thank you @arianesasso, gosh, this list is old!
So I can see how "tools for processing data from scientific data formats" is really vague. I mean, what is a scientific data format? Lidar is a data format, but it is also used for commercial applications, etc.

Would something like this be any better, or is it similarly vague?

  • Data processing & "munging": Tools that help users work with different data formats and structures. Examples: NumPy, xarray.

The problem is that the packages we have reviewed don't yet span the whole list, so we would need to include examples that we haven't reviewed. But maybe we could list major packages in some cases?

NOTE: We have purposely not listed analytics tools here yet (but we have reviewed tools that support analytics).

Pitching in and capturing some conversation from Slack:

Part of what might be happening now when people read these categories is that they say to themselves, "well I did not write a data munging package so I'm not in scope".

But all of the categories fall under the broader umbrella of "open science".

My sense is that people in Python world are less familiar with the idea that you would need a separate effort focused on open science tools. Not because we don't know about it! Rather, because it's kind of our default mode of operation as a glue language.

I know that is by no means our only goal, but it's one of the things that seems to be lost in translation. This is why people keep asking "how are you different from that scientific Python group?"

Isn't our intent with this section to give examples of functionality that is considered in scope, rather than to say "these are the only eight types of packages we care about"?

So maybe one thing we could do at the top of that section is say something like:
"One of the overarching goals of pyOpenSci is to facilitate open science. If your library meets that goal, then you are likely in scope. Here are some common categories of tools that help make science more open ..."

@all-contributors please add @arianesasso for code, review and design

@lwasser

I've put up a pull request to add @arianesasso! 🎉

I like @NickleDave's approach of focusing more on the open science part and presenting the categories as examples rather than boxes to fit into. That way, people could describe how their package contributes to open science instead of picking a category.

this will be closed by #162

@all-contributors please add @NickleDave @stefanv for code, review, design

@lwasser

I've put up a pull request to add @NickleDave! 🎉

@all-contributors please add @stefanv for code, review, design

@lwasser

I've put up a pull request to add @stefanv! 🎉

@all-contributors please add @eriknw for code, review, design

@lwasser

I've put up a pull request to add @eriknw! 🎉

@all-contributors please add @Batalex for code, review, design

@lwasser

I've put up a pull request to add @Batalex! 🎉

@all-contributors please add @cmarmo for code, review, design

@lwasser

I've put up a pull request to add @cmarmo! 🎉

OK, this issue was resolved by a merged PR but never actually closed. :) Officially closing it now, months and months later 😆, perhaps a year later actually.