Anahkiasen/registry

On Popularity...

janhartigan opened this issue · 2 comments

I've been noticing that the "popularity" number fluctuates pretty wildly from time to time and I decided to take a look at exactly how you're calculating that. I found this method:

https://github.com/Anahkiasen/registry/blob/master/app/Registry/Services/IndexesComputer.php#L32-L52

This currently looks like this:

/**
 * Compute every package's popularity
 *
 * @return void
 */
public function computePopularity()
{
    $this->computeIndexes('popularity', array(
        'downloads_total' => 1.25,
        'watchers'        => 2,
        'forks'           => 0.75,
        'favorites'       => 0.25,
        'freshness'       => 0.75,
    ), array(
        'downloads_total' => 'downloads_total',
        'watchers'        => 'watchers',
        'forks'           => 'forks',
        'favorites'       => 'favorites',
        'freshness'       => 'freshness',
    ));
}

What strikes me as a bit odd is that you're weighting watchers at 2 but there doesn't seem to be anything checking the stars. IMO, stars are a better indicator of the popularity of a repo because they are explicitly for "favoriting" something without the hassle of getting constant updates about that repo.

In addition to that, I'm not sure I would even give the Packagist "favorites" any weight since that site is kind of a disaster. Seems kind of like stars, total downloads, and forks should be the primary determinants of popularity. And then if you want to gauge "freshness", it should probably be based on recent downloads instead of recent repository updates (which is how I think you're doing it). "Freshness" as it pertains to popularity is probably more meaningful that way, especially considering you're already factoring that "freshness" into the "trust" metric.

Curious to hear what you think...

Watchers are stars: here "watchers" is an aggregated name for stars on GitHub and watchers on Bitbucket.
As for favorites on Packagist, I do agree its weight could be reduced.

The real reason you're seeing popularity fluctuate is that it's a percentage index. If a package has 98, that doesn't mean it's the sum of X plus Y; it means its aggregated score is 98% of the most popular package's score. Jeffrey Way's generators were previously not tagged "laravel", so when they suddenly were, the whole scale was recomputed. Such events rarely happen, though.
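
To make the percentage index concrete, here is a minimal sketch of the idea, not the registry's actual code: the helper name `toPercentageIndex`, the package names, and the raw scores are all hypothetical. It rescales each aggregated score against the highest one, so adding a new top package shifts every other package's number.

```php
<?php
// Hypothetical helper (not from the registry codebase): rescale raw
// aggregated scores so that the highest-scoring package sits at 100.
function toPercentageIndex(array $scores)
{
    if (count($scores) === 0) {
        return array();
    }

    $max = max($scores);
    if ($max <= 0) {
        // Nothing meaningful to scale against; report 0 for every package.
        return array_fill_keys(array_keys($scores), 0);
    }

    $index = array();
    foreach ($scores as $name => $score) {
        $index[$name] = (int) round($score / $max * 100);
    }

    return $index;
}

// Made-up raw scores: package-b is the most popular package.
$before = toPercentageIndex(array(
    'package-a' => 490,
    'package-b' => 500,
));

// A much more popular package enters the set (for example, after being tagged).
$after = toPercentageIndex(array(
    'package-a' => 490,
    'package-b' => 500,
    'package-c' => 1000,
));

print_r($before); // package-a => 98, package-b => 100
print_r($after);  // package-a => 49, package-b => 50, package-c => 100
```

In this sketch package-a's own raw score never changes, yet its index drops from 98 to 49 once package-c becomes the reference point, which is the kind of jump described above.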

Re watchers: ah ha! The name was confusing me. Good to know.

I think it makes sense for you to compute it as an indexed score. Cool, thanks for the explanation.