jgehrcke/github-repo-stats

Feature request: Include total number of downloads for all releases of a repository

GruberMarkus opened this issue · 5 comments

It would be great if github-repo-stats would not only store data about views and clones, but also data about the total number of release downloads for a specific repository.

There are badges for it already, for example https://img.shields.io/github/downloads/grubermarkus/set-outlooksignatures/total, but the numbers only cover the last 30 releases due to a GitHub restriction.
Your github-repo-stats code could help overcome this restriction.

To get download numbers of your assets (files attached to the release), you can use https://developer.github.com/v3/repos/releases/#get-a-single-release (the "download_count" property of each item in the "assets" list of the response). Pagination needs to be taken into account.
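A rough sketch of what that could look like, for illustration only. It assumes the "list releases" endpoint (`GET /repos/{owner}/{repo}/releases`) rather than the single-release one linked above, so that all releases come back in pages of up to 100. The function names `fetch_all_releases` and `total_downloads` are made up for this example and are not part of github-repo-stats.

```python
import json
import urllib.request

API = "https://api.github.com"


def fetch_all_releases(owner, repo, token=None, per_page=100):
    """Yield every release object, requesting page after page until one is empty."""
    page = 1
    while True:
        url = f"{API}/repos/{owner}/{repo}/releases?per_page={per_page}&page={page}"
        req = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        if token:
            # Optional: authenticated requests get a higher rate limit.
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            return
        yield from batch
        page += 1


def total_downloads(releases):
    """Sum "download_count" over all assets of all given releases."""
    return sum(
        asset["download_count"]
        for release in releases
        for asset in release.get("assets", [])
    )
```

Usage would be something like `total_downloads(fetch_all_releases("grubermarkus", "set-outlooksignatures"))`. Keeping the summing separate from the fetching means the same release objects can also be stored per release and per asset.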

> data about the total number of release downloads

Certainly a great idea worth evaluating! Thanks for bringing this up here.

Looking at https://developer.github.com/v3/repos/releases/#get-a-single-release, an important question is indeed whether the download count should be tracked and persisted over time per release and per asset, or whether some irreversible aggregation should happen before persisting the data.

Bigger projects accumulate O(100) releases over time, each having O(10) assets. That would be O(1000) individual timeseries. Of course we could simply track these in individual CSV files in a distinct directory (that wouldn't actually hurt too much). We could display a meaningful aggregate by default in the report. The raw data would allow for much more fine-grained analysis.
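To make the "individual CSV files in a distinct directory" idea a bit more concrete, here is a minimal sketch. It assumes release objects shaped like the GitHub API response ("tag_name", plus "assets" with "name" and "download_count"). The file naming scheme, the column names, and the function `append_snapshot` are assumptions made for this example, not an existing github-repo-stats layout.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def _safe(name):
    """Replace characters that are awkward in file names (e.g. '/' in tags)."""
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in name)


def append_snapshot(releases, out_dir, now=None):
    """Append one (time, download_count) row per asset to its own CSV file."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for release in releases:
        for asset in release.get("assets", []):
            # Hypothetical naming scheme: <tag>__<asset name>.csv
            fname = f"{_safe(release['tag_name'])}__{_safe(asset['name'])}.csv"
            path = out_dir / fname
            is_new = not path.exists()
            with path.open("a", newline="") as f:
                writer = csv.writer(f)
                if is_new:
                    writer.writerow(["time_utc", "download_count"])
                writer.writerow([stamp, asset["download_count"]])
            written.append(path)
    return written
```

Each run adds one row per file, so O(1000) assets means O(1000) small appends per run, and the raw per-asset history stays available for any later aggregation.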

Individual CSVs with detailed data and aggregate data in the report (with a drill-down, maybe) sound promising.

ychin commented

I think for maximum flexibility, storing per-release, per-asset download count over time seems best.

Per-release is important because one thing I'm thinking of investigating (which is how I noticed this repository) is how fast people upgrade to a new release, and for that it's quite important to distinguish between new and old releases. An aggregated total download count isn't useful here, because a new release automatically causes existing users to download the new version, and those update downloads can't be told apart from organic ones. Per-asset is also important because different assets often serve different purposes, and just summing them up could lead to misleading or useless information.

Not sure how the aggregation should work though. I think it's going to be repo-specific. You could provide a way to customize it or just say "here's a CSV file and analyze it yourself with your own script".
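As one example of the "analyze it yourself" route, a small sketch. It assumes each release's samples have already been loaded from the CSV data as a time-sorted list of `(timestamp, cumulative_count)` pairs. GitHub's "download_count" is a running total, so downloads per interval are the differences between consecutive samples. Both function names are hypothetical.

```python
def downloads_per_interval(samples):
    """Turn cumulative counts into per-interval increments.

    samples: list of (timestamp, cumulative_count), sorted by time.
    Returns a list of (timestamp of interval end, downloads in that interval).
    """
    return [
        (t1, c1 - c0)
        for (_, c0), (t1, c1) in zip(samples, samples[1:])
    ]


def latest_share(per_release):
    """Fraction of the most recent interval's downloads going to each release.

    per_release: dict mapping release tag -> samples (as above).
    Releases with fewer than two samples are skipped.
    """
    latest = {
        tag: downloads_per_interval(samples)[-1][1]
        for tag, samples in per_release.items()
        if len(samples) >= 2
    }
    total = sum(latest.values())
    return {tag: (n / total if total else 0.0) for tag, n in latest.items()}
```

Tracking how the share of the newest release changes over time would be one way to look at upgrade speed. Any such aggregate is repo-specific, which is why it works on the raw per-release data rather than on a pre-summed total.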

Thank you for that super valuable feedback @ychin!

ychin commented

FWIW, just for reference, I implemented something like this for my own use, since I want to start tracking download / installation counts for my project before I push a new release out. It also tracks Homebrew installs, so it's not strictly GitHub-only (a more generic solution could easily get out of hand with different package managers). I just have it generate CSV files, since I don't need the web visualization: I import the CSV files into Google Sheets using IMPORTDATA to take advantage of its visualization / charting tools.

The repo is at https://github.com/ychin/macvim-download-stats