Chalarangelo/30-seconds-web

Update ranker

Chalarangelo opened this issue

Our ranking system is currently problematic, especially in how freshness affects rankings. Specifically, the current formula is:

r(f, l) = (14 + f) / (14 + f * f) * l

where f is the number of days since the content was first seen and l is the share of the total score awarded for freshness (currently 40%). This formula is meant to produce a curve that has fallen off sharply by the 14-day mark. Some sample values:

f (days)  r(f, l)    r (l = 40%)
1         1 * l      40%
2         0.89 * l   35.6%
3         0.74 * l   29.6%
5         0.49 * l   19.6%
7         0.33 * l   13.2%
10        0.21 * l   8.4%
14        0.13 * l   5.2%
21        0.08 * l   3.2%
30        0.05 * l   2%
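As a quick check of these numbers, here is a minimal Python sketch of the current formula. The function name `freshness_boost` and its parameters are hypothetical, not taken from the 30-seconds-web codebase. It prints the unrounded coefficient and the resulting share of the total score, so the last digit can differ slightly from values obtained by multiplying a rounded coefficient by l.

```python
def freshness_boost(days: int, weight: float = 0.40) -> float:
    """Current freshness score: r(f, l) = (14 + f) / (14 + f * f) * l.

    days:   f, days since the content was first seen (assumed >= 1)
    weight: l, the share of the total score reserved for freshness
    """
    return (14 + days) / (14 + days * days) * weight


# Print the coefficient (weight = 1.0) and the resulting share at l = 40%.
for f in (1, 2, 3, 5, 7, 10, 14, 21, 30):
    coefficient = freshness_boost(f, weight=1.0)
    print(f"{f:>2}  {coefficient:.2f} * l  {freshness_boost(f):.1%}")
```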

While the formula is not perfect, it is good enough for most of our use cases. However, once l is plugged in, the picture is clear: new content will always rise to the top, and older content will find it extremely hard to compete, because the keyword score accounts for only the remaining 60% of the total.
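To make the imbalance concrete, the sketch below assumes (hypothetically) that the total score is simply 60% times a keyword match in [0, 1] plus the freshness term; the real ranker may combine scores differently. Under that assumption, a day-old snippet with only a 50% keyword match (0.30 + 0.40 = 0.70) outranks a month-old snippet with a perfect match (0.60 + about 0.02).

```python
KEYWORD_WEIGHT = 0.60    # share of the total score from keyword matching
FRESHNESS_WEIGHT = 0.40  # l, share reserved for freshness


def freshness(days: int) -> float:
    """Current freshness term r(f, l) = (14 + f) / (14 + f * f) * l."""
    return (14 + days) / (14 + days * days) * FRESHNESS_WEIGHT


def total_score(keyword_match: float, days: int) -> float:
    """Hypothetical total: weighted keyword match in [0, 1] plus freshness."""
    return KEYWORD_WEIGHT * keyword_match + freshness(days)


new_weak_match = total_score(keyword_match=0.5, days=1)      # 0.30 + 0.40
old_perfect_match = total_score(keyword_match=1.0, days=30)  # 0.60 + ~0.02
print(f"new, weak match: {new_weak_match:.3f}")
print(f"old, perfect match: {old_perfect_match:.3f}")
```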

The needs the original ranking system was designed to address may no longer fully apply. New content should still rise to the top, but by a smaller margin, especially when articles are presented in their own section and can be sorted by newest first. Additionally, the publishing frequency of blog posts (roughly one every 3-4 days) must be taken into account.

With all that in mind, the proposed new ranking formula is as follows:

  • Lower l from 40% to 20%.
  • Replace the formula with a lookup table that cuts off entirely after 14 days:

      f (days)  r(f, l)    r (l = 20%)
      1         1 * l      20%
      2         0.85 * l   17%
      3-4       0.7 * l    14%
      5-6       0.45 * l   9%
      7-10      0.25 * l   5%
      11-14     0.1 * l    2%
      15+       0 * l      0%
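A lookup table like this maps naturally onto an ordered list of (last day, multiplier) buckets. The following is a minimal Python sketch under that assumption; `FRESHNESS_BUCKETS` and `freshness_boost` are hypothetical names, and days below 1 are simply treated as day 1.

```python
# (last day of the bucket, multiplier) pairs, in ascending order of day.
FRESHNESS_BUCKETS = [(1, 1.0), (2, 0.85), (4, 0.7), (6, 0.45), (10, 0.25), (14, 0.1)]
FRESHNESS_WEIGHT = 0.20  # proposed l


def freshness_boost(days: int, weight: float = FRESHNESS_WEIGHT) -> float:
    """Proposed freshness score: bucketed multiplier * l, zero after 14 days."""
    # Days <= 1 (including day-0 content) fall into the first bucket.
    for last_day, multiplier in FRESHNESS_BUCKETS:
        if days <= last_day:
            return multiplier * weight
    return 0.0  # 15+ days: no longer considered fresh


for day in (1, 2, 3, 5, 7, 11, 15):
    print(day, f"{freshness_boost(day):.0%}")
```

Tuning then becomes a matter of editing the bucket list rather than re-deriving a curve.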

This achieves a smaller boost overall and a sharper decrease after the fourth day, while two weeks after upload, content will no longer be considered fresh at all.
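These claims can be checked by printing both schemes side by side. This is a hypothetical, self-contained sketch that reimplements the current formula (l = 40%) and the proposed table (l = 20%) for days 1 to 30; the function names are illustrative only.

```python
def old_boost(days: int) -> float:
    """Current formula with l = 40%."""
    return (14 + days) / (14 + days * days) * 0.40


def new_boost(days: int) -> float:
    """Proposed lookup table with l = 20%, cut off after 14 days."""
    buckets = ((1, 1.0), (2, 0.85), (4, 0.7), (6, 0.45), (10, 0.25), (14, 0.1))
    for last_day, multiplier in buckets:
        if days <= last_day:
            return multiplier * 0.20
    return 0.0


for day in range(1, 31):
    print(f"day {day:>2}: old {old_boost(day):6.1%}  new {new_boost(day):6.1%}")
```

From day 4 to day 5 the proposed boost drops by about 36% (14% to 9%), while the current formula drops by about 19% (24% to 19.5%).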