tgxn/lemmy-explorer

Sorting/Scoring System For Instances

Closed this issue · 2 comments

tgxn commented

Discussed in https://github.com/tgxn/lemmy-explorer/discussions/23

Originally posted by tgxn June 14, 2023
Because we need to determine whether an instance is "good", there needs to be a way to score each instance based on the data we have about it.

Currently, my thinking/implementation looks at the lists of federated sites and scores each instance based on the number of other instances that refer to it (in the linked, allowed and blocked lists).

Scoring is applied by the following rules:

Instances

      let score = 0;
      if (linkedFederation[siteBaseUrl]) {
        score += linkedFederation[siteBaseUrl];
      }
      if (allowedFederation[siteBaseUrl]) {
        score += allowedFederation[siteBaseUrl] * 2;
      }
      if (blockedFederation[siteBaseUrl]) {
        score -= blockedFederation[siteBaseUrl] * 10;
      }
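As a rough end-to-end sketch of the rules above: the snippet below builds the three count maps from a small set of crawled instances and then applies the same weights. The input shape (`baseUrl`, `linked`, `allowed`, `blocked`), the sample hostnames, and the helper names `countReferences` and `scoreInstance` are assumptions made for illustration, not the project's actual code.

```javascript
// Hypothetical input shape: each crawled instance lists the base URLs it
// links to, allows, and blocks. The real crawler output may differ.
const instances = [
  { baseUrl: "a.example", linked: ["b.example", "c.example"], allowed: ["b.example"], blocked: [] },
  { baseUrl: "b.example", linked: ["a.example"], allowed: [], blocked: ["c.example"] },
  { baseUrl: "c.example", linked: ["b.example"], allowed: [], blocked: [] },
];

// Count how many instances mention each base URL in the given list.
function countReferences(allInstances, listName) {
  const counts = {};
  for (const instance of allInstances) {
    for (const target of instance[listName] ?? []) {
      counts[target] = (counts[target] ?? 0) + 1;
    }
  }
  return counts;
}

const linkedFederation = countReferences(instances, "linked");
const allowedFederation = countReferences(instances, "allowed");
const blockedFederation = countReferences(instances, "blocked");

// Same weights as the rules above: +1 per link, +2 per allow, -10 per block.
function scoreInstance(siteBaseUrl) {
  return (
    (linkedFederation[siteBaseUrl] ?? 0) +
    (allowedFederation[siteBaseUrl] ?? 0) * 2 -
    (blockedFederation[siteBaseUrl] ?? 0) * 10
  );
}

for (const { baseUrl } of instances) {
  console.log(baseUrl, scoreInstance(baseUrl));
}
```

With this sample data, a single block outweighs several links, which is the intended effect of the -10 weight.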

Communities

Uses the same base score as instances, and then scales it by the community's subscriber count.

      let score = 0;
      if (linkedFederation[siteBaseUrl]) {
        score += linkedFederation[siteBaseUrl];
      }
      if (allowedFederation[siteBaseUrl]) {
        score += allowedFederation[siteBaseUrl] * 2;
      }
      if (blockedFederation[siteBaseUrl]) {
        score -= blockedFederation[siteBaseUrl] * 10;
      }

      // also scale the score by the community's subscriber count
      score = score * community.counts.subscribers;
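To see how the multiplication above behaves at the edges, here is a small sketch. The function name `scoreCommunity` and the sample numbers are made up for illustration; only the `community.counts.subscribers` field comes from the snippet above.

```javascript
// Hypothetical helper: multiply the instance base score by the subscriber
// count, as in the community rule above.
function scoreCommunity(baseScore, community) {
  return baseScore * community.counts.subscribers;
}

// Well-referenced instance, mid-sized community.
const healthy = scoreCommunity(4, { counts: { subscribers: 250 } });

// Instance nobody refers to: the base score of 0 wipes out the subscriber count.
const unreferenced = scoreCommunity(0, { counts: { subscribers: 5000 } });

// Blocked instance: a negative base score sinks further as subscribers grow.
const blocked = scoreCommunity(-9, { counts: { subscribers: 100 } });

console.log({ healthy, unreferenced, blocked });
```

Under these assumptions, a large community on an instance with a base score of 0 also scores 0, and a community on a net-blocked instance is pushed lower the more subscribers it has. Both effects may be worth checking when tuning the weights.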

These rules are probably not ideal; I'd need to run some more analysis to determine whether the weights are tuned correctly.

I'm also thinking that it might be worthwhile to log an "uptime" or "first seen" value as well, to determine whether an instance has been around/up for a while.
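One possible way to track this, as a minimal sketch: keep a small record per instance that stores when it was first seen and how many crawl checks succeeded. The record shape and the helpers `recordCheck`, `ageInDays` and `uptimeRatio` are all hypothetical, and how (or whether) these values feed into the score is left open.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Update an instance's record after one crawl attempt.
// `record` is undefined the first time an instance is seen.
function recordCheck(record, isUp, now) {
  return {
    firstSeen: record?.firstSeen ?? now,
    checks: (record?.checks ?? 0) + 1,
    successes: (record?.successes ?? 0) + (isUp ? 1 : 0),
  };
}

// Whole days since the instance was first seen.
function ageInDays(record, now) {
  return Math.floor((now - record.firstSeen) / DAY_MS);
}

// Fraction of crawl attempts that found the instance up.
function uptimeRatio(record) {
  return record.checks === 0 ? 0 : record.successes / record.checks;
}

// Example: three daily checks, one of which failed.
const start = Date.UTC(2023, 5, 14);
let record = recordCheck(undefined, true, start);
record = recordCheck(record, false, start + DAY_MS);
record = recordCheck(record, true, start + 2 * DAY_MS);

console.log(ageInDays(record, start + 10 * DAY_MS), uptimeRatio(record));
```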

I think active users per week would be the best default sorting. This avoids over-emphasizing communities that might be older but not as active.
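A default sort along those lines could look like the sketch below. It assumes each community object carries a `counts.users_active_week` field next to `counts.subscribers`; the sample data is invented for illustration.

```javascript
// Invented sample data; only the field names matter here.
const communities = [
  { name: "old_but_quiet", counts: { subscribers: 9000, users_active_week: 12 } },
  { name: "new_and_busy", counts: { subscribers: 800, users_active_week: 340 } },
  { name: "steady", counts: { subscribers: 2500, users_active_week: 95 } },
];

// Sort a copy in descending order of weekly active users.
const byWeeklyActive = [...communities].sort(
  (a, b) => b.counts.users_active_week - a.counts.users_active_week
);

console.log(byWeeklyActive.map((c) => c.name));
```

Here the older community with the most subscribers ends up last because few of its users were active that week.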

tgxn commented

Upgraded scoring is deployed, and I've also added more sorting options.
#143