Sporadic issue with PyPI plugins fetch results in a batch of plugins incorrectly marked as stale
DragaDoncila opened this issue · 4 comments
Description
Sporadically, fetching plugins with the Framework :: napari classifier from PyPI returns far fewer plugins than are currently listed on the napari hub (for example, the latest occurrence returned just 160 plugins instead of 360). As a result, the missing plugins are marked stale in Dynamo and temporarily removed from the napari hub. Typically, the next workflow run (5 minutes later) restores the missing plugins and they become available again.
First reported in this Zulip thread, this error appears to occur rarely (twice each in September, May, and February, looking back as far as February). We've opened a PR to silence the Zulip stream for now to avoid the noise, but we should look into whether there's a way to avoid marking the plugins as stale in this situation.
Thank you @DragaDoncila for reporting this and @kephale for bringing it to our attention in Zulip. While this is not a critical bug that we will be able to address immediately, we have discussed it and will have @DragaDoncila look into the feasibility of addressing it in the coming weeks, then regroup on a potential implementation/fix. Once this is addressed, we plan to re-enable the Zulip stream as well.
For transparency, I have added our bug prioritization definitions to this wiki page: https://github.com/chanzuckerberg/napari-hub/wiki/Bug-prioritization. I am labeling this P2/medium priority because the issue is sporadic and has yet to prevent users from finding or looking up napari hub plugins. However, since it has the potential to lead to a more critical issue, we are spending time investigating.
I believe the underlying issue here is that parsing the HTML from PyPI can be a bit unreliable. In terms of how to address it, I think the best option is to verify against npe2api's index of active plugin versions before dropping any plugins. The classifiers.json file at npe2api is updated every two hours by checking Google BigQuery's PyPI dataset for active plugin versions. Before dropping any plugins here, we should first check whether the plugin has active versions listed in this file. While this would potentially delay the removal of a deleted plugin (by a maximum of two hours), I think it's better to avoid spurious drops than to ensure a plugin is removed instantly.
An alternative would be to check against BigQuery ourselves, but that seems like a waste of resources when the npe2api index is already being maintained (note that we would have to pay if we wanted to query the dataset more frequently than it currently is).
A simpler band-aid fix would be to add a guard so that when more than n plugins are missing, we don't drop any of them. This is of course not foolproof and may still lead to some noise if we were to turn on the Zulip stream again, but it would certainly be a simple fix.
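As a rough illustration of the two options above, here is a minimal Python sketch. The function names, the `max_missing` parameter, and the shape of the index (a mapping from plugin name to a list of active versions) are assumptions made for the example; they are not taken from the napari hub code or from the actual layout of classifiers.json.

```python
from typing import Dict, List, Set


def confirmed_removals(
    stored: Set[str],
    fetched: Set[str],
    active_versions: Dict[str, List[str]],
) -> Set[str]:
    """Return plugins missing from the fetch that also have no active versions.

    `active_versions` stands in for the npe2api index; its shape
    (name -> list of versions) is an assumption for this sketch.
    """
    missing = stored - fetched
    # Only drop a plugin when the secondary index agrees it is gone.
    return {name for name in missing if not active_versions.get(name)}


def guarded_removals(stored: Set[str], fetched: Set[str], max_missing: int) -> Set[str]:
    """Band-aid guard: drop nothing when more than `max_missing` plugins vanish."""
    missing = stored - fetched
    return missing if len(missing) <= max_missing else set()


stored = {"plugin-a", "plugin-b", "plugin-c"}
fetched = {"plugin-a"}  # a truncated fetch that lost two plugins
# Hypothetical index contents: plugin-c really has no active versions left.
active = {"plugin-a": ["1.0"], "plugin-b": ["0.2", "0.3"]}

print(confirmed_removals(stored, fetched, active))       # {'plugin-c'}
print(guarded_removals(stored, fetched, max_missing=1))  # set()
```

The index check can still remove a genuinely deleted plugin during a bad fetch, whereas the threshold guard holds back every removal until a later run returns a plausible result.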
I feel your recommendation of leveraging the existing classifiers.json would be the cleaner way to do it. We can update our documentation to reflect that any removal could take up to 2 hours and 5 minutes (the two-hour refresh interval of the index plus our five-minute workflow interval).
One edge case I can think of is updating a plugin to a newer version, where we are "removing" the stale version. We would want the removal of the stale version to happen more or less around the same time we add the new version, so we only surface the latest version for a plugin. We can skip the sanity check for those cases, and remove the older version.
I agree. If we have a new version of a plugin, we can skip the classifiers.json check and just mark older versions as stale. This shouldn't lead to a Zulip post about the older versions being removed, though.
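To make the distinction concrete, here is a minimal Python sketch of how the two cases might be separated. The function name, the name-to-version dictionaries, and the shape of the index are all hypothetical and chosen only for illustration; the real workflow may store this data differently.

```python
from typing import Dict, List, Set, Tuple


def split_stale_candidates(
    stored: Dict[str, str],
    fetched: Dict[str, str],
    active_versions: Dict[str, List[str]],
) -> Tuple[Set[str], Set[str]]:
    """Separate version updates from genuine removals.

    Returns (superseded, removed):
      superseded: plugins whose fetched version differs from the stored one;
                  the old version can be marked stale right away.
      removed:    plugins absent from the fetch AND absent from the index
                  (assumed shape: name -> list of active versions).
    """
    superseded: Set[str] = set()
    removed: Set[str] = set()
    for name, old_version in stored.items():
        new_version = fetched.get(name)
        if new_version is not None:
            if new_version != old_version:
                superseded.add(name)  # no index check needed
        elif not active_versions.get(name):
            removed.add(name)  # confirmed gone by the index
    return superseded, removed


stored = {"plugin-a": "1.0", "plugin-b": "0.3", "plugin-c": "2.1"}
fetched = {"plugin-a": "1.1"}  # plugin-b and plugin-c missing from this fetch
active = {"plugin-a": ["1.0", "1.1"], "plugin-b": ["0.3"]}

superseded, removed = split_stale_candidates(stored, fetched, active)
print(superseded)  # {'plugin-a'}
print(removed)     # {'plugin-c'}
```

Under this split, only the `removed` set would be a candidate for a Zulip notification; `plugin-b` is kept because the index still lists an active version for it.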