CocoaPods/metrics.cocoapods.org

Per Pod Metrics from Github

floere opened this issue · 8 comments

Add some metrics from Github:

  • Add metrics table to existing trunk DB, with foreign key to pods table (In Trunk).
  • Add hooks callback.
  • Poll pod data (see this categorisation of @orta's list) on callback.
  • Offer API for this data, e.g. for sorted search on command line. Do this in a separate issue.

Reasoning: Callbacks from GitHub avoid polling costs and API rate limits. Metrics live on a separate table and can be joined when a "thicker" model is needed, e.g. for search. Perhaps specific metrics should live on specific tables (e.g. github_metrics) that are joined with the pods table as needed, with aliases on columns (e.g. stars becomes github_stars).
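To make the separate-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the column names, the in-memory database, and the `thick_pods` helper are hypothetical and do not reflect trunk's real schema, which lives in a different database and language.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not trunk's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE pods (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE github_metrics (
    id         INTEGER PRIMARY KEY,
    pod_id     INTEGER NOT NULL UNIQUE REFERENCES pods(id),
    stars      INTEGER,
    forks      INTEGER,
    updated_at TEXT
);
""")

conn.execute("INSERT INTO pods (id, name) VALUES (1, 'ExamplePod')")
conn.execute("INSERT INTO pods (id, name) VALUES (2, 'UnpolledPod')")
conn.execute(
    "INSERT INTO github_metrics (pod_id, stars, forks, updated_at) "
    "VALUES (?, ?, ?, ?)",
    (1, 120, 15, "2014-06-01T00:00:00Z"),
)


def thick_pods(connection):
    """Join pods with their metrics, aliasing columns (stars -> github_stars).

    A LEFT JOIN keeps pods that have not been polled yet; their metric
    columns come back as NULL (None in Python).
    """
    return connection.execute("""
        SELECT pods.name,
               github_metrics.stars AS github_stars,
               github_metrics.forks AS github_forks
        FROM pods
        LEFT JOIN github_metrics ON github_metrics.pod_id = pods.id
        ORDER BY pods.id
    """).fetchall()


rows = thick_pods(conn)
```

The pods table stays thin, and the join is only paid for by callers such as search that need the aliased metric columns.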

Unsure yet about:

orta commented

This might not be as easy as you think: we can't just arbitrarily add ourselves as webhooks on other people's projects. It may indeed have to be a polling job.

@orta Very true! So this only works for data that changes when the spec changes, for example documentation. Or we could poll stars only when the repo is updated, favoring frequently updated projects.

So: which are metrics that we can usefully poll when a spec update comes through?

orta commented

IMO, we don't have to be massively "up to date". I'd say maybe we rotate through a batch of 100-200 pods a day, pulling in fork & star data (and whatever else we're interested in).

If you want to know upfront what I'd be interested in, I think these are a good start. They come from the GitHub API and relate to a library as a whole rather than to a single pod version (e.g. README complexity):

  • Stars
  • Forks
  • Watchers
  • Number of Contributors
  • Number of tags (essentially the number of releases)
  • Initial project commit date
  • Number of pull requests alive
  • Last Commit Date

Thanks for the list! One note: we can already extract the number of versions from trunk, where we have the exact number.

I agree we don't have to be perfectly up-to-date - the number of contributors could be polled only when we get a post-update-hook call, for example.

So it might be useful to have an "updated_at" field per metric, and then poll the oldest x metrics.
Or, if a metric does not change often, have an update_at field that moves further and further into the future, so that the metric gets updated less often. But that's just some fun additional thinking.
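A rough sketch of both ideas, purely as an illustration: the `Metric` class, the `oldest` and `record_poll` helpers, the doubling rule and the 1-day/32-day bounds are all assumptions, not anything decided in this thread. `updated_at` records the last poll, while `update_at` is derived as the next time the metric is due.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed bounds for the polling interval (hypothetical values).
MIN_INTERVAL = timedelta(days=1)
MAX_INTERVAL = timedelta(days=32)


@dataclass
class Metric:
    pod: str
    name: str
    value: int
    updated_at: datetime                # when the metric was last polled
    interval: timedelta = MIN_INTERVAL  # current gap between polls

    @property
    def update_at(self) -> datetime:
        """Next time this metric is due for a poll."""
        return self.updated_at + self.interval


def oldest(metrics, x):
    """Return the x metrics with the oldest updated_at."""
    return sorted(metrics, key=lambda m: m.updated_at)[:x]


def record_poll(metric, new_value, now):
    """Store a polled value and adapt the interval.

    An unchanged value doubles the interval (up to MAX_INTERVAL), pushing
    update_at further into the future; a changed value resets it.
    """
    if new_value == metric.value:
        metric.interval = min(metric.interval * 2, MAX_INTERVAL)
    else:
        metric.interval = MIN_INTERVAL
    metric.value = new_value
    metric.updated_at = now
```

With this shape, a daily job could either take `oldest(metrics, 200)` or pick only the metrics whose `update_at` has passed, depending on which of the two ideas is preferred.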

orta commented

👍 to all of ^

To categorize @orta's list.

Time-based (polling):

  • Stars
  • Forks
  • Watchers
  • Number of pull requests alive
  • Number of Contributors

Commit-based (event via Github hook call):

  • Initial project commit date (once, if not already there)
  • Last Commit Date
  • Number of Contributors (actually time based, but if they are contributing, then we can do it commit-based)
  • Number of releases (trunk does this already)
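The categorisation above could be encoded as a small lookup, sketched here under stated assumptions: the metric keys, the trigger names "poll" and "commit", and the `metrics_to_refresh` helper are hypothetical. "commit" stands for whatever event ends up signalling new commits, and the number of releases is left out because trunk already has it.

```python
# Hypothetical metric keys, grouped by the trigger that refreshes them.
TIME_BASED = ("stars", "forks", "watchers", "open_pull_requests", "contributors")
COMMIT_BASED = ("initial_commit_date", "last_commit_date", "contributors")


def metrics_to_refresh(trigger, stored):
    """Return the metric names to fetch for a trigger.

    trigger: "poll" (time-based) or "commit" (event-based).
    stored:  dict of metric values already known for the pod.
    """
    if trigger == "poll":
        names = TIME_BASED
    elif trigger == "commit":
        names = COMMIT_BASED
    else:
        raise ValueError(f"unknown trigger: {trigger!r}")
    # The initial commit date never changes, so fetch it only once.
    return [
        name for name in names
        if not (name == "initial_commit_date" and stored.get(name) is not None)
    ]
```

Contributors appears in both groups on purpose, matching the note that it is time-based but can also be refreshed when a commit arrives.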

Where to migrate the DB?

In trunk.

Update re the last comment: CocoaPods/trunk.cocoapods.org#50 (Migrations not in trunk).