Per Pod Metrics from Github
floere opened this issue · 8 comments
Add some metrics from Github:
- Add metrics table to existing trunk DB, with foreign key to pods table (In Trunk).
- Add hooks callback.
- Poll pod data (see this categorisation of @orta's list) on callback.
- Offer API for this data, e.g. for sorted search on command line. Do this in a separate issue.
Reasoning: Callbacks from Github avoid polling costs/API limits. Metrics are on a separate table and can be joined if a "thicker" model is needed, e.g. for search. Perhaps specific metrics should be on specific tables (e.g. github_metrics) that are joined with the pods table as needed (with aliases on columns, e.g. stars becomes github_stars).
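To make the join idea concrete, here is a minimal sketch using an in-memory SQLite database. The schema is an assumption for illustration only: the column names (pod_id, stars, forks) and the sample rows are hypothetical, and the real trunk DB may look quite different.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pods (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE github_metrics (
        pod_id INTEGER NOT NULL REFERENCES pods(id),
        stars INTEGER,
        forks INTEGER
    );
    INSERT INTO pods (id, name) VALUES (1, 'PodA'), (2, 'PodB');
    INSERT INTO github_metrics (pod_id, stars, forks) VALUES (1, 120, 14);
""")

def thick_pods(conn):
    """Return pods joined with their GitHub metrics, using aliased columns."""
    rows = conn.execute("""
        SELECT pods.name,
               github_metrics.stars AS github_stars,
               github_metrics.forks AS github_forks
        FROM pods
        LEFT JOIN github_metrics ON github_metrics.pod_id = pods.id
        ORDER BY pods.id
    """)
    return rows.fetchall()
```

The LEFT JOIN keeps pods that have no metrics row yet, so a search over the "thicker" model would still see them, just with empty metric columns.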
Unsure yet about:
- Where to migrate the DB? (see @alloy's comment)
This might not be as easy as you think, we can't just arbitrarily add ourselves as web hooks on other people's projects. It may indeed have to be a polling job.
@orta Very true! So this only works for data which changes on changes to the spec - for example documentation. Or we could poll stars only when the repo is updated, favoring oft updated projects.
So: which are metrics that we can usefully poll when a spec update comes through?
IMO, we don't have to be massively "up to date". I'd say maybe we rotate through a batch of 100-200 pods a day, pulling in fork & star data (and whatever else we're interested in).
If you want to know upfront what I'd be interested in, I think these are a good start around the Github API, related specifically to a library rather than on a per-pod-version (e.g. README complexity) basis:
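A rough sketch of what such a daily rotation could look like. The function name, the day_index counter and the default batch size of 150 are assumptions picked for illustration, not anything that exists in trunk.

```python
def daily_batch(pod_names, day_index, batch_size=150):
    """Return the pods to refresh on a given day, wrapping around the list.

    day_index is a hypothetical counter (e.g. days since the job was first run).
    """
    if not pod_names:
        return []
    size = min(batch_size, len(pod_names))
    start = (day_index * size) % len(pod_names)
    # Walk forward from the start offset, wrapping at the end of the list.
    return [pod_names[(start + i) % len(pod_names)] for i in range(size)]
```

With this scheme every pod gets refreshed roughly once every len(pod_names) / batch_size days, so the batch size directly trades API usage against staleness.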
- Stars
- Forks
- Watchers
- Number of Contributors
- Number of tags (essentially number of releases)
- Initial project commit date
- Number of pull requests alive
- Last Commit Date
Thanks for the list! One note: The number of versions we already can extract from trunk - there we have the exact number.
I agree we don't have to be perfectly up-to-date - the number of contributors could be polled only when we get a post-update-hook call, for example.
So it might be useful to have an "updated_at" field per metric - and then poll the oldest x metrics.
Or, if a metric does not change often, have an "updated_at" field that gets pushed further and further into the future, so that the metric is updated less often. But that's just some fun additional thinking.
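Both ideas (poll the oldest x metrics, and back off for metrics that rarely change) could be sketched roughly like this. Everything here is an assumption for illustration: the dict keys, the separate next_update_at field, the one-day base interval and the 30-day cap are hypothetical choices, not an existing design.

```python
from datetime import datetime, timedelta

def pick_due(metrics, now, limit):
    """Return up to `limit` metrics whose next_update_at has passed, oldest first."""
    due = [m for m in metrics if m["next_update_at"] <= now]
    return sorted(due, key=lambda m: m["next_update_at"])[:limit]

def record_poll(metric, new_value, now,
                base=timedelta(days=1), cap=timedelta(days=30)):
    """Store a polled value and schedule the next poll.

    An unchanged value doubles the polling interval (up to `cap`);
    a changed value resets the interval to `base`.
    """
    if new_value == metric["value"]:
        metric["interval"] = min(metric["interval"] * 2, cap)
    else:
        metric["interval"] = base
    metric["value"] = new_value
    metric["updated_at"] = now
    metric["next_update_at"] = now + metric["interval"]
    return metric
```

Keeping updated_at for "when did we last look" and a separate next_update_at for scheduling avoids overloading one field with two meanings, but a single field would work too.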
👍 to all of ^
To categorize @orta's list.
Time-based (polling):
- Stars
- Forks
- Watchers
- Number of pull requests alive
- Number of Contributors
Commit-based (event via Github hook call):
- Initial project commit date (once, if not already there)
- Last Commit Date
- Number of Contributors (actually time based, but if they are contributing, then we can do it commit-based)
- Number of releases (trunk does this already)
Where to migrate the DB?
In trunk.
Update re the last comment: CocoaPods/trunk.cocoapods.org#50 (Migrations not in trunk).