Star count is not enough to tell if the project is live or dead
arkanoid87 opened this issue · 9 comments
The Nim package directory lists items by name, star count and author. While star count gives a hint to the reader, it generally doesn't say whether the project is alive and kicking. Wouldn't it be better to list some other indicators?
I'm not sure which number is best to represent project health, but I think it should be linked with the concept of time somehow.
We could just follow what GitHub uses: the "Insights" tab has many tools, but the first one GitHub shows is an overview of merged / unmerged pull requests and opened / closed issues in a time frame (e.g. 1 month). What about this? Or maybe the number of contributors?
The intention is not to rule out inactive projects, but just to give a better view of projects that aim to solve the same problem.
Sometimes you want slow and solid, sometimes you need new with a lot of momentum, sometimes you have to take a chance on a one-man project, but in all cases a general view is needed.
Software quality is a difficult metric to implement, and a number of PhD dissertations and other papers have focused on it.
- GitHub "stars": have been shown to be a very unreliable measure of popularity and an even worse measure of quality.
- Commit/PR/release activity: correlates poorly with maturity and quality. A project can receive infrequent updates because it's completed and mature, or abandoned but still perfectly usable, or abandoned and unusable.
- Subjective maturity rating from the author is often more reliable but unfortunately it's not required by Nimble.
- The package directory performs installation and documentation generation on each package as a simple proof of compatibility with a relatively recent Nim version.
- Linux distributions like Debian do thorough legal/security/quality review and vetting but the Nim ecosystem is not packaging-friendly.
I've been suggesting implementing more lightweight vetting in Nim/Nimble but nobody seems interested in doing it.
Perhaps showing multiple metrics (age, activity, SLOC count, PR counts) to the user is better than nothing. Of course multiple metrics are not comparable/sortable.
Any further thoughts?
I do agree that the subject is deeper than the rabbit hole, but I think here it would be sufficient to add something more than just star count, without requiring changes on the Nim side, and get the best from GitHub.
But are all projects on github? I don't think so, but actually I have no idea.
But are all projects on github?
No, and different forges have different APIs; also Nimble supports both git and mercurial.
I don't have the numbers to tell how many projects would fall outside GitHub's insight tools, but if that is less than 50% I think it is worth it anyway.
99% of Nimble packages are on gh... On the bright side, this makes the integration easier.
How difficult is it to grab insight data? How often is the Nim package directory updated?
Without a heuristic for measuring maturity, getting metadata from GitHub is not useful.
Nothing's perfect, but I think star count + last updated commit would be a useful heuristic.
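As a rough illustration of that heuristic, here is a sketch in Python that builds a one-line summary from a repository record. The `stargazers_count` and `pushed_at` field names are the ones returned by the GitHub REST API for a repository. The `summarize` helper and the output format are made up for this example. `pushed_at` is the time of the last push and is only a stand-in for "last commit".

```python
from datetime import datetime, timezone


def summarize(repo: dict, now: datetime) -> str:
    """One-line summary: star count plus days since the last push."""
    # GitHub timestamps look like "2021-03-04T12:00:00Z"; swap the trailing
    # "Z" for an explicit offset so fromisoformat accepts it on older Pythons.
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    days = (now - pushed).days
    return f"{repo['stargazers_count']} stars, last push {days} days ago"


# Hypothetical record, shaped like a GitHub repository response.
example = {"stargazers_count": 42, "pushed_at": "2021-03-04T12:00:00Z"}
print(summarize(example, datetime(2021, 3, 14, 12, 0, tzinfo=timezone.utc)))
# prints: 42 stars, last push 10 days ago
```

Passing `now` in explicitly keeps the function deterministic, which matters if the directory is regenerated on a schedule rather than on every page view.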
I think the latest release version number, and its release date, is the most practically useful. Upstream should be responsible for using the version number to communicate usability in production, and might be incentivized if projects like nimble use it as a data point. This gives a reasonable amount of data to make a decision on whether to investigate further or not, without having to go look at multiple dead repos before finding anything useful.
I personally don't care too much about stars, and commits can be super minor, and don't necessarily show enough commitment (no pun intended) to be useful, imo. If an upstream won't commit to releases, that in itself is an indicator to me. That being said: last commit is more useful when they aren't making releases, so maybe that could be used when releases don't exist.
0.2.0 or 0.0.2 from 4 years ago? dead, pre-production project.
1.4.4 from one year ago? possibly stable and maintained.
1.4.4 from 4 years ago? possibly stable, and usable in certain environments, but unmaintained.
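The three examples above can be read as a small decision rule. Here is a minimal sketch in Python, with several assumptions that are not from Nimble or the package directory: the `classify` helper is hypothetical, the two-year cutoff is an arbitrary value between the "one year" and "4 years" data points, versions are assumed to look like `major.minor.patch`, and the "pre-production, active" label is invented for the one case the examples don't cover.

```python
from datetime import date

# Assumed cutoff: a latest release older than this counts as stale.
STALE_AFTER_DAYS = 2 * 365


def classify(version: str, released: date, today: date) -> str:
    """Rough health label from the latest release version and its date."""
    # Assumes "major.minor.patch", optionally prefixed with "v" as in "v1.4.4".
    major = int(version.lstrip("v").split(".")[0])
    stale = (today - released).days > STALE_AFTER_DAYS
    if major == 0:
        return "dead, pre-production" if stale else "pre-production, active"
    if stale:
        return "possibly stable, unmaintained"
    return "possibly stable, maintained"


today = date(2024, 1, 1)
print(classify("0.2.0", date(2020, 1, 1), today))  # dead, pre-production
print(classify("1.4.4", date(2023, 1, 1), today))  # possibly stable, maintained
print(classify("1.4.4", date(2020, 1, 1), today))  # possibly stable, unmaintained
```

For a project with no releases, the same rule could be fed the last commit date instead, as suggested above, though there would then be no version number to distinguish pre-production from stable.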