pyOpenSci/software-peer-review

Defining a process for archiving / sunsetting pyos packages

lwasser opened this issue · 12 comments

In the astropy document related to our partnership, a question/discussion came up as follows:

QUESTION - what happens if the package becomes maintained again, e.g. the maintainer was out sick for 6 months? The maintainer needs to ping us to ask if the package can become a vetted package again (we will need a documented process for this).

i am going to copy the comments and open for discussion here as requested by @pllim. (so sorry i missed that comment in july somehow and just saw it today.)

Length of time until we begin to wonder about a package being maintained or not?

How long would the "graceful sunset" period last? Some packages are small and stable, and "nothing going on for 3 months" might not be a bad sign at all. Numpy is pretty good at backwards compatibility, so if nothing breaks, there might be no activity. If the sunset period is long enough (say 18 months = one major Python release cycle) then this is not going to come up often in practice.
I imagine that pyopensci would reach out in the sunset period before removing the listing and say "hey, we noticed that nothing is going on, is this still maintained?". If there is no answer, that's a bad sign, but the answer might be "It's stable, no new features are needed and no new Python version has been released that could have broken it". In that case, I would argue that it should not be sunset in the first place.

My process has been so far -

  1. i watch the commit dates on the repo to note if there is any activity.
  2. I email after a year if i see no activity

@hamogu i'm continuing the conversation here. i've also discussed this a bit with @cmarmo - we talked about using language such as "archive" which is what @ropensci uses. But doing that - it feels a bit less "permanent" as in - if someone wants to pick the package back up, we could easily unarchive it.

we also talked about

  • adding an archived label to the review issue
  • listing archived projects on a separate page on the website, so the work that went into the project is still recognized but we don't suggest that scientists adopt unmaintained tools if they don't have to!

Let me know what your thoughts are here so we can refine our policy!!

hamogu commented

I like "archive".
I still think a period of time with no activity needs to be defined (I suggest 1 year). We can monitor that with a script and, if a package fails that test, contact the maintainers and give them one month to answer. (How? Email? Do we have contact information for maintainers for all packages? What if the maintainer has changed compared to when it was originally submitted for review by pyopensci? An issue in the repo?) If they answer "no activity needed", what do we do next? Give them another year? Another 6 months before contacting them again? Or do we initiate a quick review and see if the package still works with up-to-date numpy and Python versions?

What do we mean by "no activity"? Is it enough to push a commit? Merge a PR? Or does it have to have a release?
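As a rough illustration of the monitoring script mentioned above, here is a minimal sketch. It assumes, purely for the sake of example, that any of a commit, a merged PR, or a release counts as activity, and that the threshold is the suggested 1 year. The names `last_activity`, `is_inactive`, and `INACTIVITY_THRESHOLD` are hypothetical, and fetching the timestamps (e.g. from the GitHub API) is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the 1-year period suggested in this thread.
INACTIVITY_THRESHOLD = timedelta(days=365)


def last_activity(*timestamps: Optional[datetime]) -> Optional[datetime]:
    """Return the most recent known timestamp, ignoring missing ones."""
    known = [t for t in timestamps if t is not None]
    return max(known) if known else None


def is_inactive(
    last_commit: Optional[datetime],
    last_merged_pr: Optional[datetime],
    last_release: Optional[datetime],
    now: datetime,
    threshold: timedelta = INACTIVITY_THRESHOLD,
) -> bool:
    """Flag a package when no counted activity falls within the threshold."""
    latest = last_activity(last_commit, last_merged_pr, last_release)
    if latest is None:
        # No recorded activity at all: treat as inactive.
        return True
    return now - latest > threshold


if __name__ == "__main__":
    now = datetime(2022, 10, 1, tzinfo=timezone.utc)
    old_commit = datetime(2021, 1, 1, tzinfo=timezone.utc)
    recent_release = datetime(2022, 6, 1, tzinfo=timezone.utc)
    print(is_inactive(old_commit, None, None, now))
    print(is_inactive(old_commit, None, recent_release, now))
```

Which events count is exactly the open question here; narrowing the definition (for example, releases only) would just mean passing fewer timestamps.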

So, that's a lot more questions than answers. I suggest that for now almost any answer is a good answer. I would suggest writing down a short policy for "archiving", but I think what we really need to do is to gain real-world experience with a few packages that go into that state and see how they look. Do they stop functioning? Do maintainers stop replying, but the code still works? On what time scale?

Then, we can come back in a year or two after archiving a few projects and use the experience from those real-world use cases to refine the criteria.

hey @hamogu 👋
Here is some new and improved text based on our conversations.

All -- Please feel free to edit / comment etc!

New language here - comments welcome

Below is the language we have in our guide right now with a note from me that we plan to adjust this! and it looks like @NickleDave agrees with your suggestion above!

****** OLD LANGUAGE ****

If package maintainers do not respond in a timely manner to requests for package fixes, we will remind the maintainer a number of times. After 3 months (or shorter time frame, depending on how critical the fix is) we will discuss the future of the package as a part of our pyOpenSci ecosystem.

If a package becomes completely un-maintained we will highlight that fact and remove it as a vetted tool in our ecosystem.

If a sub-community decides to fork and maintain that package we are open to working with the new maintainers to keep the package within our ecosystem.

Note
Note from the Executive Director: Please note that we are reviewing the text below and will be updating our policy surrounding package quality and long term maintenance in the upcoming weeks (Fall 2022).

hamogu commented

From https://hackmd.io/ijK9EmoxTp6vGzVhOElvwA

403 Forbidden

You don't have permission to access this resource.
You could head back to home.

eeks sorry @hamogu i always get the permissions wrong with hackmd. it should work now.

pllim commented

The updated text in hackmd looks reasonable to me, thanks! I'll also ping @dhomeier and @WilliamJamieson in case they have comments as current Astropy Editors.

I like what y'all have written. Hopefully this isn't redundant, I'm curious about maybe splitting out time since last update and maintainer intention, which seems to be designed in a bit here? There seems to be a little bit of gray area to me about what "maintained" can look like. E.g. a smallish, well architected, no deps, limited scope package could be just fine receiving no updates indefinitely, right? And on the other end, you have packages like youtube-dl which need almost constant maintenance to stay working. So the intention of the maintainer doesn't map so neatly onto time of last update. Maybe we also consider whether they are still responsive to issues? But again in the case of the mythical "perfectly stable package" there might not be any!

I wonder about the labor entailed with editors following up indefinitely with an increasing number of packages too, sounds like that could be a lot of work.

So would it be possible to have one repo badge that is "maintenance intention" with some enum of possible values like "active," "maintenance," "best effort" (author will try but is no longer primarily focused on it), "new maintainers wanted," "on hiatus" if dev is paused for e.g. illness or other reasons but will resume, or "archived" for actively paused development? Then we also have separate indicators like "time since last release" or "time to reply to/close issues" that are useful to know for potential tool users but not determinative of maintenance category?

After 6mo (or any time period), the maintenance badge flips to some undefined state that indicates that the maintainer needs to go set the state again. If things are normal, maintainer does that. Otherwise, after (unit time) it flips to auto-archive status. In the meantime, the undefined state is itself an indicator of maintenance status.
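To make the badge idea concrete, here is a minimal sketch of that state logic. The `Intention` values follow the list above; `UNCONFIRMED` is a hypothetical name for the "undefined" state, and both time periods (`CONFIRM_EVERY`, `GRACE_PERIOD`) are placeholder assumptions, since the actual durations are left open here.

```python
from datetime import date, timedelta
from enum import Enum


class Intention(Enum):
    ACTIVE = "active"
    MAINTENANCE = "maintenance"
    BEST_EFFORT = "best effort"
    NEW_MAINTAINERS_WANTED = "new maintainers wanted"
    ON_HIATUS = "on hiatus"
    ARCHIVED = "archived"
    # Hypothetical name for the "undefined" state described above.
    UNCONFIRMED = "unconfirmed"


# Assumptions: roughly 6 months, then a placeholder for "(unit time)".
CONFIRM_EVERY = timedelta(days=182)
GRACE_PERIOD = timedelta(days=91)


def displayed_status(
    declared: Intention, last_confirmed: date, today: date
) -> Intention:
    """Return the status the badge would show on a given day."""
    if declared is Intention.ARCHIVED:
        return Intention.ARCHIVED
    age = today - last_confirmed
    if age <= CONFIRM_EVERY:
        # Maintainer confirmed recently: show what they declared.
        return declared
    if age <= CONFIRM_EVERY + GRACE_PERIOD:
        # Confirmation is overdue: the badge itself signals uncertainty.
        return Intention.UNCONFIRMED
    # Still no confirmation after the grace period: auto-archive.
    return Intention.ARCHIVED


if __name__ == "__main__":
    confirmed = date(2022, 1, 1)
    for today in (date(2022, 3, 1), date(2022, 8, 1), date(2023, 1, 1)):
        print(today, displayed_status(Intention.ACTIVE, confirmed, today).value)
```

A maintainer re-confirming simply resets `last_confirmed`, which returns the badge to whatever value they declare.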

Just an idea! Sorry if it's repetitive / doesn't fit

> So would it be possible to have one repo badge that is "maintenance intention" with some enum of possible values like "active," "maintenance," "best effort" (author will try but is no longer primarily focused on it), "new maintainers wanted," "on hiatus" if dev is paused for e.g. illness or other reasons but will resume, or "archived" for actively paused development? Then we also have separate indicators like "time since last release" or "time to reply to/close issues" that are useful to know for potential tool users but not determinative of maintenance category?

re: this point, we could suggest the use of repo status badges
https://www.repostatus.org/
although I haven't seen these widely used in the Python community (or anywhere tbqh)

this is great discussion - everyone! thanks! @sneakers-the-rat @NickleDave is the repo badge idea an automated thing?

We have one currently unmaintained package. in that case the author has been unresponsive so i don't think we can depend on an author that may have life issues that are bigger than their OSS responsibilities, always keeping something like that up to date.

ropensci has a server served badge i believe that they can control. we don't have that right now. BUT i can ask arfon about how JOSS manages badges.

from my perspective this would look something like this

  1. we have some duration of time that a package repo has no activity. the packages are updated bimonthly. so after some duration of inactivity they become internally flagged to us.
  2. We reach out to the authors idk maybe once a year with some sort of embedded form that says - "is this package still maintained? Are there any new maintainers?" They fill it out and submit to us.

Some combination of an email response and repo activity data allow us to then flag a package as archived.

the website build has a conditional for packages that are archived and they get listed on a separate page.
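A minimal sketch of how the flagging and the website conditional described above might fit together. The specific rule here (an explicit maintainer answer wins, and repo inactivity decides only when there is no reply) is an assumption for illustration, not an agreed policy. `Package`, `should_archive`, and `split_for_site` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Package:
    name: str
    # Set by the periodic repo-activity check (hypothetical field).
    inactive: bool
    # Form answer: True / False, or None when nobody replied.
    maintainer_says_maintained: Optional[bool] = None


def should_archive(pkg: Package) -> bool:
    """Combine the form response with repo activity (assumed rule)."""
    if pkg.maintainer_says_maintained is not None:
        # An explicit answer from the maintainer takes precedence,
        # which also lets a maintainer "wake up" a package.
        return not pkg.maintainer_says_maintained
    # No reply: fall back to the activity flag.
    return pkg.inactive


def split_for_site(packages: List[Package]) -> Tuple[List[str], List[str]]:
    """Return (active, archived) package names for the two website pages."""
    active: List[str] = []
    archived: List[str] = []
    for pkg in packages:
        (archived if should_archive(pkg) else active).append(pkg.name)
    return active, archived


if __name__ == "__main__":
    pkgs = [
        Package("stable-tool", inactive=True, maintainer_says_maintained=True),
        Package("silent-tool", inactive=True),
        Package("busy-tool", inactive=False),
    ]
    print(split_for_site(pkgs))
```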

so most of this COULD be automated potentially is my thought. and a maintainer could always "wake up" a repo and tell us it's maintained again.

thoughts???

> so most of this COULD be automated potentially is my thought. and a maintainer could always "wake up" a repo and tell us it's maintained again.
> thoughts???

That makes a lot of sense to me, especially so that we have some initial process in place. We can always revisit later with some of the ideas @sneakers-the-rat and I are suggesting. Sorry, I read the initial issue and then came back to this without re-reading, I didn't mean to go off on a tangent

totally. badges are just images after all, so if they embedded some image like ![badge](https://www.pyopensci.org/python-packages/mypackage/maintenance.svg) then that automation could be as simple as part of your jekyll build phase and then CI autoruns the build every day or something

@sneakers-the-rat so i follow the part where we serve an image. but where i'm confused is how would we change the state of the image? would we have a badge svg for every package - so individual files for each package - and then we could somehow modify the code for the svg to change the color? i do know what svg code looks like so i could imagine that type of workflow. and then the badge links to the console page for that package with all of the metrics and what triggered the change in color? is that what you're thinking?
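One way the per-package badge workflow asked about above could look, as a sketch only: a build step writes one small SVG per package and picks the fill color from the package's current status. The status names, colors, and badge layout are all assumptions. The output path mirrors the example URL earlier in the thread, and `render_badge` / `write_badges` are hypothetical helpers, not part of any existing pyOpenSci build.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Assumed status names and colors, for illustration only.
STATUS_COLORS = {
    "active": "#4c1",
    "unconfirmed": "#dfb317",
    "archived": "#9f9f9f",
}
DEFAULT_COLOR = "#9f9f9f"


def render_badge(status: str) -> str:
    """Build the SVG text for one badge; only color and label vary."""
    color = STATUS_COLORS.get(status, DEFAULT_COLOR)
    label = escape(status)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="170" height="20" role="img">'
        '<rect width="90" height="20" fill="#555"/>'
        f'<rect x="90" width="80" height="20" fill="{color}"/>'
        '<g fill="#fff" font-family="Verdana,sans-serif" font-size="11">'
        '<text x="6" y="14">maintenance</text>'
        f'<text x="96" y="14">{label}</text>'
        "</g></svg>"
    )


def write_badges(statuses: dict[str, str], site_root: Path) -> list[Path]:
    """Write <site_root>/python-packages/<package>/maintenance.svg per package."""
    written = []
    for package, status in statuses.items():
        target = site_root / "python-packages" / package / "maintenance.svg"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_badge(status), encoding="utf-8")
        written.append(target)
    return written
```

Because each badge is regenerated on every site build, a scheduled CI build would be enough to change the image a README embeds, without the maintainer touching their repo.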

colleagues - i have opened a PR that attempts to address all of this. I'm sure i missed things so can you all please have a look / review the PR when you have bandwidth?

#258

Let's give this a solid 3 week review period and if that is not enough time we can extend. so i'll make a note to revisit this PR on November 16th, prior to the US thanksgiving holiday.

it seems to me that a server side badge would be really good to have.
I also have thought about that example of a tool that may need very basic maintenance. it seems to me that unless it's like a pre-commit hook wrapper - the maintainer might still be updating things like dependencies, or pre-commit hooks, etc.

i am not sure but with stravalib we've had all sorts of things like a dependency going into end of life status and needing to swap it out. pre-commit hook updates (that sometimes break things). or dependency updates that impact our code and users. so it seems like a 6 month to a year check on "last commit" could be reasonable and we can adjust if need be. Anyway have a look and i'm really open to thoughts / ideas on this. we may just have to try something and see what breaks and what works in the process.