CocoaPods/metrics.cocoapods.org

Architecture and purpose

fabiopelosin opened this issue · 22 comments

Architecture:

  • small Sinatra App
  • running on Heroku
  • PostgreSQL DB
  • JSON
  • Simple metrics are computed directly and have a refresh rate policy (so the fetch date should be stored for each one)
  • Complex metrics (like doc coverage) are computed asynchronously by other services, which will notify this app via POST hooks.
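The refresh-rate policy for simple metrics and the POST-hook path for complex ones could look roughly like the plain-Ruby sketch below. It assumes each stored value keeps a `fetched_at` timestamp and that staleness is judged per metric. The `Metric` struct, the `REFRESH_INTERVALS` table, `apply_hook`, and the hook payload shape are all hypothetical, not taken from the project.

```ruby
require 'json'
require 'time'

# Hypothetical per-metric refresh intervals, in seconds.
REFRESH_INTERVALS = {
  'github_stargazers_count' => 24 * 60 * 60,    # refetch daily
  'readme_lines'            => 7 * 24 * 60 * 60 # refetch weekly
}.freeze

# A stored metric value together with the time it was fetched.
Metric = Struct.new(:name, :value, :fetched_at) do
  # A simple metric is stale once its refresh interval has elapsed.
  # Metrics without an interval (e.g. ones pushed by hooks) never go stale here.
  def stale?(now = Time.now)
    interval = REFRESH_INTERVALS[name]
    return false if interval.nil?
    now - fetched_at >= interval
  end
end

# Stores a complex metric delivered asynchronously by another service via a
# POST hook. Assumed (hypothetical) payload: {"pod": ..., "metric": ..., "value": ...}
def apply_hook(store, body, now = Time.now)
  payload = JSON.parse(body)
  pod_metrics = (store[payload.fetch('pod')] ||= {})
  name = payload.fetch('metric')
  pod_metrics[name] = Metric.new(name, payload.fetch('value'), now)
end
```

Hook-delivered metrics such as `documentation_coverage` have no interval in this sketch, so they change only when the computing service posts a new value.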

API example:

Get a list of the Pods:

GET api.cocoapods.org/v1/pods
[
  "ARAnalytics",
  "ObjectiveSugar"
]

Get the information for a single Pod:

GET api.cocoapods.org/v1/pods/:pod
{
  "name": "ARAnalytics",
  "github_stargazers_count": 365,
  "readme_lines": 85,
  "screenshots": [],
  "documentation_coverage": 0.80,
  "cocoapods_score": 0.9
}
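A minimal sketch of serving this representation, assuming the metrics for each Pod are already loaded into a hash. The `PODS` constant stands in for the PostgreSQL table and is hypothetical, as is the 404 error body. `pod_response` returns a `[status, headers, body]` triple, which is one of the return values a Sinatra route accepts, so a route such as `get '/v1/pods/:pod'` could simply return it.

```ruby
require 'json'

# Hypothetical in-memory data standing in for the PostgreSQL table.
PODS = {
  'ARAnalytics' => {
    'github_stargazers_count' => 365,
    'readme_lines' => 85,
    'screenshots' => [],
    'documentation_coverage' => 0.80,
    'cocoapods_score' => 0.9
  }
}.freeze

# Builds a Rack-style [status, headers, body] triple for GET /v1/pods/:pod.
def pod_response(pods, name)
  metrics = pods[name]
  if metrics.nil?
    body = JSON.generate('error' => "Pod '#{name}' not found")
    return [404, { 'Content-Type' => 'application/json' }, [body]]
  end
  # The Pod name comes first, followed by its stored metrics.
  body = JSON.generate({ 'name' => name }.merge(metrics))
  [200, { 'Content-Type' => 'application/json' }, [body]]
end
```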

Get some stats about the specs repo:

GET api.cocoapods.org/v1/stats
{
  "pods_count": 4449,
  "versions_count": 10000,
  "additions_last_30_days": 300,
  "average_documentation_coverage": 0.40
}
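The stats could be derived from the per-Pod data along these lines. This sketch assumes `additions_last_30_days` counts newly added Pods (it might instead mean new versions) and that Pods without a documentation coverage value are left out of the average. `PodRecord` and `stats` are hypothetical names.

```ruby
require 'time'

# Hypothetical pod record holding only the fields the stats need.
PodRecord = Struct.new(:name, :versions, :created_at, :documentation_coverage)

# Computes the /v1/stats payload from a list of pod records.
def stats(pods, now = Time.now)
  cutoff = now - 30 * 24 * 60 * 60
  # Pods with no coverage value (nil) are excluded from the average.
  coverages = pods.map(&:documentation_coverage).compact
  average = coverages.empty? ? nil : (coverages.sum / coverages.size).round(2)
  {
    'pods_count' => pods.size,
    'versions_count' => pods.sum { |pod| pod.versions.size },
    'additions_last_30_days' => pods.count { |pod| pod.created_at >= cutoff },
    'average_documentation_coverage' => average
  }
end
```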

Open Questions

  • Where do we set the boundary with trunk?
jk commented

I'm not familiar with Sinatra, but I wrote a REST framework in PHP. IMO you should think about some additional features up front:

  • Pagination
  • Versioning (you have this via URL v-prefixes, but this is considered inferior to media-type versioning, which GitHub uses)
  • Representation formats (I think for this, JSON is enough. No need for XML but you can consider Collection+JSON)
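For pagination, one common approach, which the GitHub API also uses, is `page`/`per_page` query parameters plus a `Link` header (RFC 5988) pointing at neighboring pages. The sketch below assumes that scheme; `paginate`, the default page size of 50, and the cap of 100 are hypothetical choices.

```ruby
# Slices one page out of a collection and builds an RFC 5988 Link header.
# Out-of-range page numbers are clamped to the first/last page.
def paginate(items, base_url, page: 1, per_page: 50)
  per_page = per_page.clamp(1, 100) # hypothetical upper bound
  last_page = [(items.size + per_page - 1) / per_page, 1].max
  page = page.clamp(1, last_page)
  slice = items[(page - 1) * per_page, per_page] || []

  rels = { 'first' => 1, 'last' => last_page }
  rels['prev'] = page - 1 if page > 1
  rels['next'] = page + 1 if page < last_page
  link = rels.map do |rel, n|
    "<#{base_url}?page=#{n}&per_page=#{per_page}>; rel=\"#{rel}\""
  end.join(', ')

  [slice, link]
end
```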

Zend recently released its take on an API framework, and its primer section is well documented; perhaps that helps. For a quick heads-up, I suggest the sections on Hypertext Application Language (HAL), versioning, and error reporting.
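For error reporting, one option is a machine-readable error body such as the Problem Details format (`application/problem+json`, later standardized as RFC 7807). The helper below is a hypothetical sketch of that format, returning a Rack-style `[status, headers, body]` triple; the thread did not settle on any error format.

```ruby
require 'json'

# Builds a Problem Details error response. "about:blank" is the format's
# default type, in which case the title should match the HTTP status phrase.
def problem(status, title, detail = nil, type: 'about:blank')
  body = { 'type' => type, 'title' => title, 'status' => status }
  body['detail'] = detail if detail
  [status, { 'Content-Type' => 'application/problem+json' }, [JSON.generate(body)]]
end
```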

I may sound like a SOAP advocate, but I learned this the hard way: those standards make developing an API client a lot easier.

So what is its purpose?

@irrationalfab Btw, I think this should still share the same DB with trunk behind the scenes. (Just like ‘search’ would.)

jk commented

@alloy To ease client development and, in the long run, pave the way for easily implementing new features, since backward compatibility isn't such a big issue then.

@jk

Pagination

Good point! ‘All pods’ needs pagination.

@irrationalfab Although I’m wondering why this is needed in the first place? (An overview of all pods.)

Versioning (you have this via URL v-prefixes, but this is considered inferior to media-type versioning, which GitHub uses)

Regarding versioning, I find media-type versioning inferior and don’t care about there being multiple endpoints for the same resource. In fact, the whole idea is ludicrous to me, because by specifying a media type you still create another endpoint; you are just stashing that info in a more ‘hidden’ place. This has always been too academic a discussion for me; until someone can give me a concrete example in my apps where this would solve a real problem, I’m against denormalizing such data.

Representation formats (I think for this, JSON is enough. No need for XML but you can consider Collection+JSON)

Yeah just JSON should be enough.

@alloy To ease client development and, in the long run, pave the way for easily implementing new features, since backward compatibility isn't such a big issue then.

@jk Ah, I wasn’t responding to you when I said “So what is its purpose?” :)

So to be clear, @irrationalfab: what is the purpose of this app?

jk commented

Regarding versioning, I find media-type versioning inferior and don’t care about there being multiple endpoints for the same resource. In fact, the whole idea is ludicrous to me, because by specifying a media type you still create another endpoint; you are just stashing that info in a more ‘hidden’ place. This has always been too academic a discussion for me; until someone can give me a concrete example in my apps where this would solve a real problem, I’m against denormalizing such data.

@alloy Yeah, it is a somewhat academic discussion. Essentially it is visibility vs. »one-resource-one-endpoint«. One can argue that one endpoint with several versioned media types is analogous to version-prefixed endpoints. Personally I also go with the prefix method, but I have often considered migrating to media types, because the prefix method feels like namespacing all endpoints under one version number. For example: I have a service with a few hundred controller+action pairs, and it has been solid for several years. Now I want to break backward compatibility for one action and issue another version number (v1 → v2). Should all controller+action pairs appear under v2, or is it sufficient to just release the new pair under a new version number?

Now I want to break backward compatibility for one action and issue another version number (v1 → v2). Should all controller+action pairs appear under v2, or is it sufficient to just release the new pair under a new version number?

Yeah, this is indeed a recurring issue. For philosophy's sake, let me ask you instead: why would that be any different with a media-type version? As in, does the user have to remember to specify the right version with all their requests? That doesn’t sound helpful to the user at all from my POV.

jk commented

For philosophy's sake, let me ask you instead: why would that be any different with a media-type version? As in, does the user have to remember to specify the right version with all their requests?

IMO it's an implementation detail. The poor man's solution is to use filesystem directories for URL versioning. You can't do this with media types, so you need more complex logic that lets you version specific actions at the code level rather than the filesystem level.
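Versioning specific actions at the code level could mean releasing only the changed action under v2 and letting every other action fall back to its latest earlier version. A minimal sketch, with a hypothetical `HANDLERS` registry and `resolve` helper:

```ruby
# Hypothetical handler registry: only actions that changed get a new version.
HANDLERS = {
  'pods#index' => { 1 => ->(_name) { 'v1 index' } },
  'pods#show'  => { 1 => ->(name) { "v1 show #{name}" },
                    2 => ->(name) { "v2 show #{name}" } }
}.freeze

# Picks the newest handler whose version is <= the requested one, so
# unchanged actions keep serving their v1 code when v2 is requested.
def resolve(action, requested_version)
  versions = HANDLERS.fetch(action)
  best = versions.keys.select { |v| v <= requested_version }.max
  raise ArgumentError, "#{action} has no version <= #{requested_version}" if best.nil?
  versions[best]
end
```

Under this scheme a v2 request for the Pod list keeps running the v1 handler, which is one possible answer to whether all controller+action pairs must appear under v2.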

I think either of us could write an algorithm that converts between URL versioning and media-type versioning. So the two schemes are equivalent, and choosing between them is only a matter of complying with the original REST principles.
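To illustrate the claim that the two schemes convert into each other, here is a sketch of the mapping in both directions. The vendor media type `application/vnd.cocoapods.v2+json` is hypothetical, modeled on GitHub's `application/vnd.github.v3+json`; real `Accept` headers can list several types, which this sketch ignores.

```ruby
# Hypothetical vendor media type and URL prefix patterns.
VENDOR_MEDIA_TYPE = %r{\Aapplication/vnd\.cocoapods\.v(\d+)\+json\z}
URL_PREFIX = %r{\A/v(\d+)(/.*)\z}

# URL versioning -> media-type versioning: "/v2/pods" becomes the
# unversioned path "/pods" plus an Accept header naming version 2.
def url_to_media_type(path)
  match = URL_PREFIX.match(path)
  raise ArgumentError, "unversioned path: #{path}" unless match
  [match[2], "application/vnd.cocoapods.v#{match[1]}+json"]
end

# Media-type versioning -> URL versioning: the inverse mapping.
def media_type_to_url(path, accept)
  match = VENDOR_MEDIA_TYPE.match(accept)
  raise ArgumentError, "unknown media type: #{accept}" unless match
  "/v#{match[1]}#{path}"
end
```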

The search API does this, i.e. it offers both schemes, but mostly as a convenience for various client libraries, some of which make one scheme or the other harder to implement or understand. (And of course because @kylef asked nicely ;) )

orta commented

@alloy I figured you wouldn't want other services posting to the trunk's database?

So we'd have a separate project whose aim is to collect the metadata associated with a podspec, and we'd call that the API. There could potentially be 4-5 separate web apps that all want to post data into the shared CocoaPods DB (e.g. cocoadocs, cocoapods-stats, testing, search, cocoapods-lists; literally, my ideas go on forever).

I figure we can keep that stuff separate from auth, and keep trunk away from all that?

@jk Agreed. I prefer to stick with the URL for now because it’s much simpler for the end user of the API.

@orta True, I don’t want clients to post to trunk for the purpose of other services, but that doesn’t mean that our services can’t share the DB behind the scenes ;) For now I don’t see a point in making these all completely distinct apps and having to deal with syncing data via HTTP or what have you.

orta commented

BTW, I would recommend Grape; we use it at Artsy, and we can bug @dblock with questions anytime.

@orta Interesting, I’ll take a look at it. But let’s start out with a shared DB for now nonetheless.

@dblock @orta Very nice and comprehensive README!

Thanks for plugging Grape @orta :)

My $0.02:

  • Start with a header-based versioning scheme, not /v1 in the URL. Grape for example will handle that automagically for you. Will let you easily roll out v2 when that is needed.
  • Build a Hypermedia API (JSON schema that has no duplication and links to child resources instead of embedded child resources).
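A hypermedia-style Pod list, as suggested above, would link to each child resource instead of embedding its metrics. The sketch below follows HAL's `_links` convention; the base URL, the `pods` link relation name, and `pods_index` are assumptions.

```ruby
require 'json'

BASE = 'https://api.cocoapods.org/v1' # assumed base URL

# Represents the Pod list as links to each child resource rather than
# embedding every Pod's metrics. Assumes Pod names are URL-safe.
def pods_index(names)
  {
    '_links' => {
      'self' => { 'href' => "#{BASE}/pods" },
      'pods' => names.map { |name| { 'href' => "#{BASE}/pods/#{name}", 'name' => name } }
    }
  }
end
```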

Start with a header-based versioning scheme, not /v1 in the URL. Grape for example will handle that automagically for you. Will let you easily roll out v2 when that is needed.

Yeah, I read that in your README. While that’s very nice, my concern is not whether it’s hard for us to roll out, but ease of use for clients. Having one field to set (the URL) is simpler than two (the URL plus a header).

I'll just save my "I told you so" for later ;) Consumers don't have to do anything to consume your default API version unless they specifically want to consume something else.
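The point that consumers need not do anything to get the default version can be sketched like this: the version is read from a vendor media type in the `Accept` header and falls back to a default when none is present. `DEFAULT_VERSION`, the media type, and `requested_version` are hypothetical; Grape's header-based versioning is configured through its own DSL, so this is not its API.

```ruby
DEFAULT_VERSION = 1 # hypothetical default
# Unanchored, so it also matches inside a list such as "application/json, ...".
VERSIONED_ACCEPT = %r{application/vnd\.cocoapods\.v(\d+)\+json}

# Picks the API version from an Accept header, falling back to the default
# when the client sends no vendor media type (e.g. "*/*" or no header at all).
def requested_version(accept_header)
  match = VERSIONED_ACCEPT.match(accept_header.to_s)
  match ? Integer(match[1]) : DEFAULT_VERSION
end
```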

This issue is pretty much implemented; I suggest we close it and open more specific issues. OK, @irrationalfab @alloy?