TimZaman/dotaclient

Calculating Reward on Game End

Closed this issue · 5 comments

If you look at the stream of rewards below (all of Game #2), you will see that it ends in a victory for Dire, yet each agent records only one death. Also, judging by tower_hp, the tower was nowhere near dying, which means the game must have ended because the Radiant agent died a second time. However, the -3.0 'death' reward for Player 0 does not show up a second time.

This makes me believe that we don't capture the rewards between the last reward sync and the end of the game.

2019-02-05 10:59:50,789 INFO     === Starting Game 2.
2019-02-05 10:59:50,789 INFO     Starting game.
2019-02-05 10:59:50,797 INFO     Player 0 using weights version 0
2019-02-05 10:59:50,802 INFO     Player 5 using weights version 0
2019-02-05 11:00:16,411 INFO     Player 0 rollout.
2019-02-05 11:00:16,412 INFO     Player 0 reward sum: -0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.114,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-02-05 11:00:16,429 INFO     Player 5 rollout.
2019-02-05 11:00:16,430 INFO     Player 5 reward sum: 0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.0,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.114}
2019-02-05 11:00:33,551 INFO     Received new model: version=0, size=1472372b
2019-02-05 11:00:40,146 INFO     Player 0 rollout.
2019-02-05 11:00:40,147 INFO     Player 0 reward sum: -0.15 subrewards:
{'death': -3.0,
 'denies': 0.0,
 'enemy': 3.0988716954415696,
 'hp': -1.2002411301619431,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.015,
 'win': 0.0,
 'xp': 0.9700000000000001}
2019-02-05 11:00:40,158 INFO     Player 5 rollout.
2019-02-05 11:00:40,159 INFO     Player 5 reward sum: 0.15 subrewards:
{'death': -3.0,
 'denies': 0.2,
 'enemy': 3.245241130161943,
 'hp': -1.2005383621082364,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.058333333333333334,
 'win': 0.0,
 'xp': 0.96}
2019-02-05 11:00:56,220 INFO     Player 0 rollout.
2019-02-05 11:00:56,221 INFO     Player 0 reward sum: -6.98 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.61683011154303,
 'hp': -1.3583716176202625,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': -5.0,
 'xp': 0.0}
2019-02-05 11:00:56,226 INFO     Player 5 rollout.
2019-02-05 11:00:56,227 INFO     Player 5 reward sum: 6.72 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': 1.3602503028054476,
 'hp': -0.3734285740740741,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.018333333333333333,
 'win': 5.0,
 'xp': 0.756}
2019-02-05 11:00:56,232 INFO     Game finished.

Both players died here, during the same rollout. Then one player won in the last rollout. The rewards are aggregated only per rollout.
It doesn't matter how the player won: a win might come from a death or from the tower, but that isn't scored separately, and there is no need to.
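To make the per-rollout aggregation concrete, here is a minimal sketch of how per-step sub-rewards could be summed into rollouts, and how a trailing partial rollout could be lost if it is not flushed when the game ends. This is not the dotaclient code: `aggregate_rollouts`, `rollout_size`, and `flush_on_end` are hypothetical names, and the step data is made up purely for illustration.

```python
from collections import defaultdict


def aggregate_rollouts(step_rewards, rollout_size, flush_on_end=True):
    """Sum per-step sub-reward dicts into one dict per rollout.

    Hypothetical helper, not part of dotaclient. If flush_on_end is False,
    steps after the last full rollout are silently dropped.
    """
    rollouts = []
    current = defaultdict(float)
    steps_in_current = 0
    for step in step_rewards:
        for name, value in step.items():
            current[name] += value
        steps_in_current += 1
        if steps_in_current == rollout_size:
            rollouts.append(dict(current))
            current = defaultdict(float)
            steps_in_current = 0
    # Emit whatever accumulated since the last full rollout.
    if flush_on_end and steps_in_current > 0:
        rollouts.append(dict(current))
    return rollouts


# Illustrative data only: the second death happens on the very last step.
steps = [
    {'death': 0.0},
    {'death': -3.0},
    {'death': 0.0},
    {'death': 0.0},
    {'death': -3.0, 'win': -5.0},
]

with_flush = aggregate_rollouts(steps, rollout_size=2, flush_on_end=True)
without_flush = aggregate_rollouts(steps, rollout_size=2, flush_on_end=False)

total_death_with = sum(r.get('death', 0.0) for r in with_flush)
total_death_without = sum(r.get('death', 0.0) for r in without_flush)
print(total_death_with, total_death_without)
```

With the flush, both deaths show up in the totals; without it, the final step never reaches the aggregated rewards.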

But a single player needs to die twice for the game to be over. According to the log, they each died once, so the game should not be over. What I believe happened is that a player died a second time and that this was not captured in our reward aggregation.
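A quick way to check this against the log is to add up the 'death' sub-rewards over the three rollouts of Game 2. The sketch below assumes that each death is worth exactly -3.0 (which is what the log suggests) and that two deaths end the game; `deaths_recorded` is a hypothetical helper, and the lists are the 'death' values copied from the log above.

```python
DEATH_REWARD = -3.0      # assumed per-death reward, as seen in the log
DEATHS_TO_END_GAME = 2   # a player has to die twice for the game to end

# 'death' sub-rewards per rollout, copied from the Game 2 log.
player0_death = [-0.0, -3.0, -0.0]
player5_death = [-0.0, -3.0, -0.0]


def deaths_recorded(death_rewards, per_death=DEATH_REWARD):
    """Number of deaths implied by the summed 'death' sub-rewards."""
    return round(sum(death_rewards) / per_death)


deaths_p0 = deaths_recorded(player0_death)
deaths_p5 = deaths_recorded(player5_death)
game_over_by_deaths = max(deaths_p0, deaths_p5) >= DEATHS_TO_END_GAME
print(deaths_p0, deaths_p5, game_over_by_deaths)
```

Under these assumptions, neither player reaches two recorded deaths, even though the log shows the game finishing with a win for Player 5.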

But doesn't that mean the bot that died a second time never gets the negative reward for the second death? Sure, it "loses" and gets the -5, but it won't necessarily make the connection that the second death is the cause, since it doesn't see the second death in the rewards.

Or am I misunderstanding something about our algo?