philihp/openskill.js

Rating decay to eliminate camping

Closed this issue · 8 comments

Is there any provision for rating decay? Fundamentally, the big problem with original TrueSkill is that someone can have a high ranking, end up on leaderboards, and then stop playing. This is called "camping", and is a common behavior pattern. People get a good game streak, their rank gets high, and then they are disincentivized to play ranked modes again.

I've been playing around with a function that increases the sigma value of players that haven't played a minimum of games in a recent time window, but this comes with side effects. Notably, if a player's sigma is increased through rating decay and then they win their next game, their mu value ends up higher than it would have been absent rating decay.

Do you have any thoughts on how to address rating decay? I would like some sort of function that decays the ranking of players that don't play actively.

I would assume one solution is to increase sigma values on a linear scale or along some other curve. But I don't know how this could be implemented so that it's agnostic of the database.


My first attempt was simply to increase sigma by multiplying it by a factor (such as 1.05) for each day without play. But a side effect is immediately apparent:
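Both curves mentioned here can be written as a pure function of the current sigma and the number of idle days. A function like this does not care where ratings are stored. The following is a minimal sketch under assumptions: the function name `decayed_sigma`, the linear `step`, and the cap at the default starting sigma (25/3) are all hypothetical choices, not part of openskill.

```python
def decayed_sigma(sigma: float, days_inactive: int, curve: str = "multiplicative",
                  factor: float = 1.05, step: float = 0.1,
                  sigma_max: float = 25 / 3) -> float:
    """Return an inflated sigma for a player idle for `days_inactive` days.

    Hypothetical helper: a pure function, so it works with any storage backend.
    """
    if days_inactive <= 0:
        return sigma
    if curve == "multiplicative":
        # Compound growth, e.g. sigma * 1.05 ** days.
        inflated = sigma * factor ** days_inactive
    elif curve == "linear":
        # Constant growth: add `step` to sigma per idle day.
        inflated = sigma + step * days_inactive
    else:
        raise ValueError(f"unknown curve: {curve}")
    # Assumed cap: never inflate past a brand-new player's sigma,
    # and never shrink a sigma that is already above the cap.
    return max(sigma, min(inflated, sigma_max))
```

For example, `decayed_sigma(2.0, 14)` gives about 3.96 (2.0 × 1.05^14). With `curve="linear"`, the same call gives 3.4.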

If we have a match between two players of equal mu and sigma, we get the following:

Before
trueskill.Rating(mu=25.000, sigma=8.333)
trueskill.Rating(mu=25.000, sigma=8.333)

After
trueskill.Rating(mu=29.396, sigma=7.171)
trueskill.Rating(mu=20.604, sigma=7.171)

Now let's look at the same case, where player 1 has not played in a long time and their sigma has been multiplied by a factor of 1.4 (8.333 × 1.4 ≈ 11.667):

Before
trueskill.Rating(mu=25.000, sigma=11.667)
trueskill.Rating(mu=25.000, sigma=8.333)

After
trueskill.Rating(mu=32.275, sigma=9.286)
trueskill.Rating(mu=21.288, sigma=7.514)

After winning their game, the first player now has a mu of 32.275, about 2.9 points higher than the 29.396 they would have had without rating decay. Intuitively, it seems like the player is being rewarded for not having played in a long time. They stop playing, are penalized with increased uncertainty, but after winning a single game their mu is higher than if they had been playing actively.
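The effect comes from how TrueSkill moves the mean. In a two-player game, each player's mu shifts by (sigma² / c) · v, where c² = 2β² + σ₁² + σ₂². A larger sigma therefore means a larger jump after a win.

Below is a hand-rolled sketch of the standard two-player TrueSkill update. It is not the `trueskill` package's code. It assumes the package's default parameters (mu=25, sigma=25/3, beta=25/6, tau=25/300, draw probability 0.10), which appear to be what produced the numbers above.

```python
from statistics import NormalDist

# Assumed defaults, matching the Python `trueskill` package.
MU, SIGMA = 25.0, 25.0 / 3
BETA, TAU = SIGMA / 2, SIGMA / 100
DRAW_PROBABILITY = 0.10
STD = NormalDist()

def rate_1v1(winner, loser):
    """Two-player TrueSkill update for a decisive (non-draw) result.

    `winner` and `loser` are (mu, sigma) tuples; returns updated tuples.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    # Dynamics factor: add tau^2 to each variance before the match.
    var_w, var_l = sig_w**2 + TAU**2, sig_l**2 + TAU**2
    c = (2 * BETA**2 + var_w + var_l) ** 0.5
    # Draw margin for a 1v1 game, derived from the draw probability.
    eps = STD.inv_cdf((DRAW_PROBABILITY + 1) / 2) * 2**0.5 * BETA
    t = (mu_w - mu_l - eps) / c
    v = STD.pdf(t) / STD.cdf(t)  # mean-update factor
    w = v * (v + t)              # variance-update factor
    # Each mean moves by var / c * v, so a larger sigma means a bigger jump.
    new_w = (mu_w + var_w / c * v, (var_w * (1 - var_w / c**2 * w)) ** 0.5)
    new_l = (mu_l - var_l / c * v, (var_l * (1 - var_l / c**2 * w)) ** 0.5)
    return new_w, new_l

fresh = (MU, SIGMA)
rusty = (MU, SIGMA * 1.4)  # sigma inflated after a long break
print(rate_1v1(fresh, fresh))  # winner about (29.396, 7.171)
print(rate_1v1(rusty, fresh))  # winner about (32.275, 9.286)
```

The rusty winner's mu moves about 7.28 instead of about 4.40, because their variance term is roughly twice as large. That is the side effect described in the post.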

I would be very interested in hearing what the authors think of this scenario. Is this OK? Or is it an undesirable side effect, meaning we should look at other options for rating decay?

How about only showing people on your leaderboards if they've played a game this week/month/season? You could show them on a different "hall of fame" board, and they'd pop back up to the current boards if they come back from vacation.
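A rough sketch of the two-board idea follows. It assumes each player record carries a display score and a UTC timestamp of their last match. `Player`, `split_boards`, and the 30-day window are hypothetical names and values, not openskill API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from operator import attrgetter

@dataclass
class Player:
    name: str
    ordinal: float         # displayed score, e.g. mu - 3 * sigma
    last_played: datetime  # UTC time of the player's most recent match

def split_boards(players, now, window=timedelta(days=30)):
    """Return (current_board, hall_of_fame), each sorted best-first.

    Players active within `window` of `now` appear on the current board;
    everyone else sits in the hall of fame until they play again.
    """
    cutoff = now - window
    by_score = attrgetter("ordinal")
    current = sorted((p for p in players if p.last_played >= cutoff),
                     key=by_score, reverse=True)
    hall_of_fame = sorted((p for p in players if p.last_played < cutoff),
                          key=by_score, reverse=True)
    return current, hall_of_fame

now = datetime(2024, 1, 31, tzinfo=timezone.utc)
roster = [
    Player("a", 30.0, now - timedelta(days=2)),
    Player("b", 40.0, now - timedelta(days=90)),  # on vacation
    Player("c", 35.0, now - timedelta(days=10)),
]
current, hall_of_fame = split_boards(roster, now)
```

Here "b" keeps their score but only shows up in `hall_of_fame`. They return to `current` after their next match.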

I think this is a very interesting question, though. I too am interested to know what others think, because I think it really depends on your players. It wouldn't help much with predictive capability, because we don't have any idea how much time passes between matches anyway.

I think I've implemented a rudimentary version of this, assuming games where you get worse the less often you play. It's something I came up with without running it against real-world data, so please be a little skeptical about whether it will work.

from openskill import Rating, rate, predict_win

x, y = Rating(), Rating()

mu_precision = 1 + 0.0001/365
sigma_precision = 1 + 1 / 365

# Let player X win 66% of the games.
for match in range(10000):
    if match % 3:
        [[x], [y]] = rate([[x], [y]])
    else:
        [[y], [x]] = rate([[y], [x]])

    # Decay Rating - Assume 1 Match Per Day
    x.mu /= mu_precision
    y.mu /= mu_precision

    x.sigma *= sigma_precision
    y.sigma *= sigma_precision

print("Before Large Decay: ")
print(f"Player X: mu={x.mu}, sigma={x.sigma}")
print(f"Player Y: mu={y.mu}, sigma={y.sigma}\n")

print("Predict Winner Before Decay:")
x_percent, y_percent = predict_win([[x], [y]])
print(f"X has a {x_percent * 100: 0.2f}% chance of winning over Y\n")

# Decay Rating - Assume 365 Days Passed
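for day in range(365):

    # Only player X's rating decays. Decay mu only while it stays inside
    # 25 ± 3 * (25/3), i.e. between 0 and 50 (the original `or` was always true).
    if (x.mu < 25 + 3 * 25/3) and (x.mu > 25 - 3 * 25/3):
        x.mu /= mu_precision

    # Stop inflating sigma once it reaches the default starting sigma (25/3).
    if x.sigma < 25 / 3:
        x.sigma *= sigma_precision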

print("Player X's Rating After Decay: ")
print(f"Player X: mu={x.mu}, sigma={x.sigma}\n")

# One Match b/w X and Y
[[x], [y]] = rate([[x], [y]])
x.mu /= mu_precision
y.mu /= mu_precision
x.sigma *= sigma_precision
y.sigma *= sigma_precision


print("After Large Decay (1 Year): ")
print(f"Player X: mu={x.mu}, sigma={x.sigma}")
print(f"Player Y: mu={y.mu}, sigma={y.sigma}\n")

print("Predict Winner After Decay:")
x_percent, y_percent = predict_win([[x], [y]])
print(f"X has a {x_percent * 100: 0.2f}% chance of winning over Y")

Source - Time Decay

Before Large Decay: 
Player X: mu=26.986479759996925, sigma=1.879261533806081
Player Y: mu=22.87672143851661, sigma=1.879261533806081

Predict Winner Before Decay:
X has a  70.27% chance of winning over Y

Player X's Rating After Decay: 
Player X: mu=26.983781247317594, sigma=5.101382249884723

After Large Decay (1 Year): 
Player X: mu=28.199913286886318, sigma=4.958583411621401
Player Y: mu=22.711677880164803, sigma=1.881565104224607

Predict Winner After Decay:
X has a  58.51% chance of winning over Y

How about only showing people on your leaderboards if they've played a game this week/month/season? You could show them on a different "hall of fame" board, and they'd pop back up to the current boards if they come back from vacation.

Yes, that is an option. But fundamentally, as long as an undecayed rating is displayed somewhere, then there is an incentive to camp that rating. And so I think it doesn't really solve the problem, which is that some people are going to get a higher ranking than they think they can reasonably maintain, and then stop playing ranked (or at least have an incentive to not want to).

I think this is a very interesting question, though. I too am interested to know what others think, because I think it really depends on your players. It wouldn't help much with predictive capability, because we don't have any idea how much time passes between matches anyway.

We could log the time of each match in UTC and then apply a custom decay function over the interval between matches. The function could be more complex than a simple per-day decay multiplier. For example, we could set a minimum number of matches that must be played in a given interval to avoid any decay (such as 5 games in the last 7 days), and apply a decay function only when that threshold is not met.
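Here is one way the threshold rule could look. It assumes UTC match timestamps are available per player. `MIN_GAMES`, `WINDOW`, `SIGMA_PER_IDLE_DAY`, `SIGMA_CAP`, and both function names are hypothetical, and the linear growth in sigma is just one possible decay curve.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values; tune them for your own player base.
MIN_GAMES = 5               # games required in the window to avoid decay
WINDOW = timedelta(days=7)  # "5 games in the last 7 days"
SIGMA_PER_IDLE_DAY = 0.05   # sigma added per day since the last match
SIGMA_CAP = 25 / 3          # never exceed a brand-new player's sigma

def needs_decay(match_times, now):
    """True if fewer than MIN_GAMES matches were logged in the last WINDOW."""
    recent = [t for t in match_times if now - WINDOW <= t <= now]
    return len(recent) < MIN_GAMES

def threshold_decay_sigma(sigma, match_times, now):
    """Inflate sigma only for players who miss the activity threshold."""
    if not needs_decay(match_times, now):
        return sigma
    if not match_times:
        return max(sigma, SIGMA_CAP)
    idle_days = (now - max(match_times)).total_seconds() / 86400
    return max(sigma, min(sigma + SIGMA_PER_IDLE_DAY * idle_days, SIGMA_CAP))

now = datetime(2024, 1, 8, tzinfo=timezone.utc)
print(threshold_decay_sigma(2.0, [now - timedelta(days=10)], now))  # prints 2.5
```

Because the functions only take timestamps and numbers, they stay agnostic of whatever database holds the match log.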

If you've got the last time they played, apply your decay function either as part of `ordinal` when the rating is displayed, or periodically via a cron job that runs independently of when they play matches.
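A display-time version might look like the sketch below. openskill's `ordinal` is mu − 3·sigma by default. This sketch computes the same quantity but with a sigma inflated by idle time, leaving the stored rating untouched. `display_ordinal` and its decay parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def display_ordinal(mu, sigma, last_played, now,
                    sigma_per_idle_day=0.05, sigma_cap=25 / 3, z=3.0):
    """Leaderboard score with inactivity decay applied at display time only.

    The stored mu and sigma are never modified; only the shown value drops,
    so a returning player's next match is rated from their real rating.
    """
    idle_days = max(0.0, (now - last_played).total_seconds() / 86400)
    shown_sigma = max(sigma, min(sigma + sigma_per_idle_day * idle_days, sigma_cap))
    # Same shape as openskill's ordinal (mu - 3 * sigma), with the inflated sigma.
    return mu - z * shown_sigma

now = datetime(2024, 3, 1, tzinfo=timezone.utc)
active = display_ordinal(30.0, 2.0, now, now)                      # 30 - 3 * 2.0
camping = display_ordinal(30.0, 2.0, now - timedelta(days=20), now)  # 30 - 3 * 3.0
```

Since nothing is written back, this avoids the "win after decay inflates mu" problem from the earlier comment.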

But fundamentally, as long as an undecayed rating is displayed somewhere, then there is an incentive to camp that rating.

Yeah then don't have a hall of fame at all, and just hide retired players from your leaderboards. I think a lot depends on your players, and what they want TBH. How much of the metagame do you want to be about ratings?

If you've got the last time they played, apply your decay function either as part of `ordinal` when the rating is displayed, or periodically via a cron job that runs independently of when they play matches.

Yes this was my thinking. Either apply the function on rating display, meaning it would only be a "displayed" decay and not actually affect the backend ranking, or else actually modify mu and sigma with the decay function right before a match is played.
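The second option, changing the stored rating right before a match, could be sketched as below. It reuses the per-day sigma factor (1 + 1/365) and the 25/3 cap from the earlier snippet in closed form. `StoredRating` and `decay_before_match` are hypothetical names. The decayed rating would then be passed to whatever rating function you use. This option still has the side effect described at the top of the thread.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class StoredRating:
    mu: float
    sigma: float
    last_played: datetime  # UTC time of the last rated match

def decay_before_match(r, now, sigma_factor_per_day=1 + 1 / 365, sigma_cap=25 / 3):
    """Return a copy of `r` with sigma inflated for each full idle day.

    Closed form of "multiply sigma once per idle day until it reaches the cap";
    mu is left unchanged here.
    """
    idle_days = max(0, (now - r.last_played).days)
    inflated = r.sigma * sigma_factor_per_day ** idle_days
    return replace(r, sigma=max(r.sigma, min(inflated, sigma_cap)))

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
stored = StoredRating(27.0, 1.88, now - timedelta(days=365))
pre_match = decay_before_match(stored, now)  # sigma grows by about 2.71x
```

After a year idle, sigma grows by roughly (1 + 1/365)^365 ≈ 2.71. That is in line with the jump from about 1.88 to about 5.10 in the output of the earlier snippet.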

But fundamentally, as long as an undecayed rating is displayed somewhere, then there is an incentive to camp that rating.

Yeah then don't have a hall of fame at all, and just hide retired players from your leaderboards. I think a lot depends on your players, and what they want TBH. How much of the metagame do you want to be about ratings?

In this particular case, people really care about ratings. Discussions of who the best players are have been going on for decades, and having a robust ranking system is the primary motivation for this implementation.

Cool, I hope that answers the question (for you, and for anyone coming across this). I believe the guidance we've arrived at is:

If you want your scores to decay, don't modify the backend rating. Instead, apply the decay when the rating is displayed, as some function of how often they play or how long it has been since they last played.