reward modelling question

Question

Closed this issue 5 years ago · 4 comments

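In the prosocial case, the reward printed should be rewards[0][2] and not the one listed here <https://github.com/asappresearch/emergent-comms-negotiation/blob/19ad405dcb83a3a521b6e1752cec075b69aa164b/ecn.py#L193>.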

In fact, what does taking the mean of all 3 rewards mean? For selfish, I believe there should be 2 rewards: one for each agent at the end of the episode.

Answer 1 · 2019-12-30T11:57:54.000Z

It's been a while, but it looks like rewards is for a minibatch. rewards[0] is then the reward for the first pair of agents in the minibatch. (Printing the entire minibatch would be too spammy.)

Answer 2 · 2019-12-30T18:52:39.000Z

Yes, rewards[0] gives the rewards for the first episode in the batch, but it has 3 entries: the first 2 denote the selfish reward for each agent and the third denotes the prosocial reward. So I'm saying that taking the mean over all of them doesn't mean anything. It should print either both selfish rewards or only the prosocial one.
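To make the distinction concrete, here is a minimal sketch in plain Python. It assumes the layout described in this comment (one row per episode, columns ordered as agent 0 selfish, agent 1 selfish, prosocial). The sample values and the helper names `describe_reward` and `mean_of_all` are hypothetical and do not come from ecn.py.

```python
# Hypothetical minibatch of rewards, laid out as described in the thread:
# one row per episode, columns = [agent 0 selfish, agent 1 selfish, prosocial].
rewards = [
    [0.25, 0.75, 1.0],
    [0.5, 0.5, 1.0],
]


def describe_reward(rewards, prosocial, episode=0):
    """Summarise one episode's reward (illustrative helper, not from ecn.py)."""
    row = rewards[episode]
    if prosocial:
        # Prosocial setting: only the third entry is the relevant reward.
        return "prosocial reward %.3f" % row[2]
    # Selfish setting: report each agent's own reward separately.
    return "selfish rewards: agent0 %.3f, agent1 %.3f" % (row[0], row[1])


def mean_of_all(rewards, episode=0):
    """Mean over all 3 entries, which mixes selfish and prosocial values."""
    row = rewards[episode]
    return sum(row) / len(row)


print(describe_reward(rewards, prosocial=True))
print(describe_reward(rewards, prosocial=False))
print("mean over all three entries: %.3f" % mean_of_all(rewards))
```

With these sample values the mean over all three entries matches neither the prosocial reward nor either selfish reward, which is the concern raised above.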

Answer 3 · 2019-12-30T18:57:05.000Z

Oh... right. It's been a very long time. What you say sounds plausible. Since the output is just for having some sort of visibility into what's happening, I'm going to assume I was just lazily taking the mean in order to print something indicative of what's happening. I wouldn't read too much into it.

-- Hugh Perkins, Research Engineer

Answer 4 · 2019-12-30T19:18:52.000Z

That's good to know. Thanks!