cblgh/lieu

Potential sadness if you find your domain in "boringDomains" config

decentral1se opened this issue · 6 comments

👋 r a d project 🥳

As an admin, it feels a bit uncomfortable putting domains into a config with such a name. The functionality is great & relevant, but in a pubnix / shared server environment, other users might get the wrong idea from the naming. I'm trying to focus my search space on relevant links, not to say that I think their stuff is "boring".

Proposal: skipDomains.

cblgh commented

I was about to start implementing this, but: skipDomains is a bit generic and if I read that I'd assume it has something to do with what the crawler actually crawls.

For finding an alternative, there are a few reasons one would put a domain in this file:

  • The domain is linked way too often but doesn't make for interesting results (e.g. creativecommons.org)
  • The domain belongs to a big service that (again) doesn't make for very interesting results as lieu is mostly intended for finding smaller sites.
  • The domain belongs to a service the operator of the lieu instance doesn't want to support for one reason or another.

The first two are probably the reason its current name is boringDomains (as in: doesn't make for interesting search results).

Some alternatives I considered are:

  • Something along the lines of tooCommonDomains
  • excludedDomains (but this one neither fixes the "potential sadness" nor the "too generic" problem)

Another thing I considered is making it possible to add a comment on why something made it into the "boring" category.

Usually explaining why something happened is better than trying to rename the mechanism.
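To make the "comment on why" idea concrete, here is a minimal sketch in Go of reading a domain list where each line may carry a reason. The `#` comment syntax and the function name `parseAnnotatedDomains` are assumptions made for illustration; the thread does not say how lieu's file is actually parsed.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseAnnotatedDomains reads one domain per line and treats everything after
// a '#' as a free-form reason. Hypothetical helper: the '#' syntax is an
// assumption for this sketch, not lieu's documented file format.
func parseAnnotatedDomains(text string) map[string]string {
	reasons := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		// Split the line into the domain part and the optional reason.
		domain, reason, _ := strings.Cut(scanner.Text(), "#")
		domain = strings.ToLower(strings.TrimSpace(domain))
		if domain == "" {
			continue // blank line or comment-only line
		}
		reasons[domain] = strings.TrimSpace(reason)
	}
	return reasons
}

func main() {
	list := `
# linked from almost every footer
creativecommons.org # license links, rarely an interesting result
facebook.com
`
	for domain, reason := range parseAnnotatedDomains(list) {
		fmt.Printf("%s: %q\n", domain, reason)
	}
}
```

With something like this, a user who finds their domain in the list would also find the operator's stated reason next to it.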

Yeh, perhaps a bit of a bikeshed on the rename, but I'd still suggest renaming (whatever is good imo, for the reasons above) and, following your line of thought @slatian, adding some further documentation of the reasons for using this option. The name will probably always be a bit generic, but the documentation can be very specific. That seems like a good compromise and will probably be an overall bonus for maintainers.

I was about to start implementing this, but: skipDomains is a bit generic and if I read that I'd assume it has something to do with what the crawler actually crawls.

For finding an alternative, there are a few reasons one would put a domain in this file:

  • The domain is linked way too often but doesn't make for interesting results (e.g. creativecommons.org)
  • The domain belongs to a big service that (again) doesn't make for very interesting results as lieu is mostly intended for finding smaller sites.
  • The domain belongs to a service the operator of the lieu instance doesn't want to support for one reason or another.

The first two are probably the reason its current name is boringDomains (as in: doesn't make for interesting search results).

Bravo!

  • boringDomains actually leads one to the correct conclusion: it is exactly what it says. These domains are neither too common nor excluded (otherwise they wouldn't be acknowledged at all); they're of particularly low value, i.e. boring. Pretty much the same functionality goes way back to Vestris Alkaline and mnoGoSearch.
  • boringDomains accurately describes content that results in abysmal SERPs. Visiting any 'pubnix' (SDF, one of the Tildes, etc.) will immediately yield a plethora of truly boring gopherholes, boring personal web pages, boring phlogs, and boring Gemini so-called capsules. There's no offense taken by calling a spade a spade; authors know whether their original content is of particular quality or plain garbage, and IMO boringDomains is apropos, literally and comparatively to 'garbageDomains'.
  • Just my observations, but calling something common when it is crap begs to be misconstrued, and IMO it is offensive to group it in with commonly indexed resources that are of high quality.
  • boringDomains is merely a fitting moniker that directly relates to the mediocrity of the resource's content.

'pubnix' […] truly boring gopherholes, boring personal web pages, boring phlogs, and boring Gemini so-called capsules

That doesn't sound very respectful to me …

The intention behind boringDomains was to exclude sites that are too big and linked too often, but pubnixes are many sites that happen to share the same domain and are usually not linked excessively (a bit like github, sourcehut or codeberg pages, to name a few). I'm also in favor of documenting this.
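The point about shared domains can be illustrated with a small Go sketch of host-based matching. It assumes a link counts as boring when its host equals a listed domain or is a subdomain of one; the thread does not state how lieu actually matches entries, and `isBoring` and `pubnix.example` are made-up names for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isBoring reports whether the link's host equals a listed domain or is a
// subdomain of one. Hypothetical helper: the matching rule is an assumption
// for this sketch, not necessarily what lieu does.
func isBoring(link string, boring []string) bool {
	u, err := url.Parse(link)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return false
	}
	for _, d := range boring {
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	// pubnix.example stands in for any shared host with many independent sites.
	boring := []string{"facebook.com", "pubnix.example"}
	links := []string{
		"https://www.facebook.com/somepage",
		"https://pubnix.example/~alice/",
		"https://pubnix.example/~bob/garden.html",
		"https://smallsite.example/",
	}
	for _, l := range links {
		fmt.Printf("%-45s boring=%v\n", l, isBoring(l, boring))
	}
}
```

Under this rule, one entry for a shared domain filters out every user's pages on it, whether or not any single one of them is linked excessively.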

cblgh commented

@slatian yes to documenting this more clearly / updating the docs

when @decentral1se opened this issue, i thought they were referring to the file bannedDomains! boringDomains was indeed intended for filtering out boring stuff like facebook, instagram links etc etc :)