Add a badge for the GitHub/GitLab repository and/or issue tracker?
krassowski opened this issue · 3 comments
For a long time I have been considering contributing to existing packages hosted on Bioconductor that I use every day. I believe in incremental improvements and would prefer to support the work of the original creator rather than create a new package just because the existing one has bugs or imperfections.
However, I find it difficult to navigate the space from a developer's point of view. As I just described in a tweet storm here, it is often difficult to know whether a package has a GitHub/GitLab repository, and I believe that this is valuable information. I see GitHub/GitLab (and other similar hubs) as extremely valuable because, in addition to hosting the git repository, they provide:
- bug tracking (with the bug status clearly defined and easier to explore than searching through the Bioconductor support forum)
- other tools fostering collaboration, such as forking, pull-requests, automated actions
- easy to use interface for code exploration
and finally, they encourage best code practices by allowing for easy integration with linters, security scanners and other automated code intelligence tools
I was wondering whether it would be possible to gently encourage the existing packages to create a GitHub/GitLab repository or similar, and to make navigation from one to the other easier?
GitHub and Bioconductor badges
One idea would be to provide one badge linking from a package website to the GitHub repository and a second one from the repository's README to the package website. The former could be autogenerated, and the latter could be promoted by encouraging its addition in tutorials for maintainers (it could use an existing solution, e.g. badger, though I would prefer that it show the published version rather than the download count).
For example, let's look at the top 5 Bioconductor packages:
- BiocVersion: package, GitHub - no link from one to the other, either way
- BiocGenerics: package, GitHub - no link from package to GitHub
- S4Vectors: package, GitHub - no link from package to GitHub
- IRanges: package, GitHub - no link from package to GitHub
- BiocGenerics: package, GitHub - no link from package to GitHub
And at the packages among the top 30 which do not constitute the core infrastructure (core packages are marked "core"):
- zlibbioc - core
- AnnotationDbi - core
- XVector - core
- BiocParallel - core
- GenomeInfoDb - core
- DelayedArray - core
- GenomicRanges - core
- SummarizedExperiment - core
- limma: package, no GitHub/GitLab etc - in this vacuum people either refer to the gravely outdated CRAN limma mirror (a 13-year-old version!) or create their own mirrors, e.g. gangwug/limma
- Biostrings - core
- Rsamtools - core
- biomaRt: package, GitHub - no link from one to the other, despite the issue tracker containing important information about the state of the package
- annotate - core
- genefilter - core
- GenomicAlignments - core
- Rhtslib - core
- graph - core
- rtracklayer: package, GitHub - no link from one to the other, either way; moreover the search also returns an older mirror from @mtmorgan which might confuse at first; the issue tracker contains useful information that the user should be aware of
- edgeR: package - no collaborative platform such as GitHub or GitLab, or I could not find one
- GenomicFeatures - core
- BiocFileCache - core
- DESeq2: package, GitHub - this is exceptional, because the maintainer uses the URL fields in both GitHub and the package description to create a superb experience linking the two pages together; moreover, the maintainer explained when to post an issue on the GitHub repo and when to ask a question on the Bioconductor support forum
- Rhdf5lib: package, GitHub - another great example; both URL and BugReports fields are utilised
- geneplotter - core
- rhdf5 (same as Rhdf5lib)
Having two badges, one from the package website to the GitHub repo and one the other way round, would help greatly here!
Contribution friendly?
A variation of this proposal would be to have a badge saying "welcoming contributors" or "contributor friendly" on the Bioconductor package site; this would signify that the maintainer has opted in to provide the repository address and will consider PRs with bug fixes and improvements.
Issues count badge?
The final variation is to have a badge showing the number of open issues. I believe that this is very important, because issues can be discovered after a release and users should know what the limitations of the package are; they should not have to read through all the support forum questions and answers to discover that there is a bug that changes the result - this is not what I would expect a typical user to do just after installing a package. However, if they saw a badge saying [7 issues], they might be inclined to check.
I would emphasise that this badge would have a different purpose from the existing "posts" badge, which counts the questions and answers on the support website; a popular package might have thousands of usage questions, but only a few bugs at any given time. It is not important for the integrity of the research that the users read the 1000s of usage questions, but it is important that they are aware of the few bugs which might or might not affect their use case.
Apologies if this is not an appropriate place to post this idea.
To give a concrete example, yesterday I mis-directed this post SamGG/ropls#2 attempting to describe a performance issue in ropls and asking if PRs would be considered (they would not as it is not the repo of the author - despite the author showing up as they do have a GitHub account!).
Another piece of anecdotal evidence - there are other, often trivial issues like mislabelled figures in vignettes which I would have fixed, had there been an easy way to submit a fix; I even caught myself analysing the same mislabelled figure twice in the same year, which was a bit of a waste of time (for me, but most certainly for others too!).
My not-quite-current clones of Bioconductor software packages show
~/b/git$ find . -maxdepth 2 -name DESCRIPTION|xargs grep "^URL:" |wc -l
845
~/b/git$ find . -maxdepth 2 -name DESCRIPTION|xargs grep "^BugReports:" |wc -l
599
so maybe 1/3rd of Bioconductor packages do reference an external location; the URL is available in the Details: section of package landing pages. Certainly adding BugReports: to this section would be helpful.
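As a rough sketch, the same counts could be reported together with the total number of packages, so that the fraction can be read off directly. The `count_field` helper is hypothetical, and it assumes one package checkout per directory directly below the given path, as in the commands above:

```shell
# count_field DIR FIELD: print "<files with FIELD> <total files>" for the
# DESCRIPTION files found at DIR/<package>/DESCRIPTION.
# Hypothetical helper for illustration only.
count_field() {
  dir=$1
  field=$2
  # Total number of DESCRIPTION files one level below DIR.
  total=$(find "$dir" -maxdepth 2 -name DESCRIPTION | wc -l | tr -d ' ')
  # grep -l prints each matching file name once, so we count files, not lines.
  with=$(find "$dir" -maxdepth 2 -name DESCRIPTION \
           -exec grep -l "^${field}:" {} + | wc -l | tr -d ' ')
  echo "$with $total"
}
```

For example, `count_field ~/b/git URL` and `count_field ~/b/git BugReports` would each print two numbers: the packages declaring the field, then all packages found.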
The core packages could be updated with URL / BugReports links via pull requests.
I'm not entirely sure about adding buttons, because I think there is value in users discussing use problems in a central location -- the support site. It seems like issues on repositories should really be limited to bug reports.
It seems like the way to 'encourage' this, at least in new package submissions, is to add a BiocCheck, perhaps generating a NOTE encouraging use of URL / BugReports. Again, a pull request (on https://github.com/Bioconductor/BiocCheck) would be a good step.
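To illustrate the kind of check meant here, a minimal shell sketch follows. It is not BiocCheck code (BiocCheck is an R package, and its actual checks and message wording may differ); the `check_description` helper and the NOTE text are assumptions made up for this example:

```shell
# check_description FILE: print a NOTE for each of the URL / BugReports
# fields that is missing from a DESCRIPTION file.
# Hypothetical helper, not part of BiocCheck.
check_description() {
  desc=$1
  for field in URL BugReports; do
    # A field counts as present if some line starts with "<field>:".
    if ! grep -q "^${field}:" "$desc"; then
      echo "NOTE: consider adding a ${field}: field to DESCRIPTION"
    fi
  done
}
```

Running `check_description path/to/DESCRIPTION` prints nothing when both fields are present, which matches the idea of a gentle NOTE rather than an error.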
I think a link to an existing GH (or similar) repository is certainly a good thing. Encouraging this in new package submissions as Martin suggested is an excellent idea.
However, IMHO, having a badge for the number of issues might be misleading, as an issue is by no means always related to a bug. I also usually add issues to my own (or other) repositories for feature requests or for ideas to improve functionality - mostly to have the idea written down and implement it at a later time. So, using issues as a way to count bugs in a piece of software might be totally misleading. Also, not having any issues does not mean that a piece of software is free of bugs - in most cases
Finally, from personal experience, I have found the Bioconductor community extremely open to collaborations and contributions. Even when I could not find a GitHub repo to make a pull request (as in the case of ggbio), an old-fashioned, kind email to the maintainer worked.