reward modelling question
Closed this issue · 4 comments
backpropper commented
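In the prosocial case, the reward printed should be rewards[0][2]
and not the one listed here:
<https://github.com/asappresearch/emergent-comms-negotiation/blob/19ad405dcb83a3a521b6e1752cec075b69aa164b/ecn.py#L193>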
In fact, what does taking the mean of all 3 rewards mean? For selfish, I believe there should be 2 rewards: one for each agent at the end of the episode.
hughperkins commented
It's been a while, but it looks like rewards is for a minibatch. rewards[0]
is then the reward for the first pair of agents in the minibatch. (Printing
the entire minibatch would be too spammy.)
backpropper commented
Yes, rewards[0] gives the rewards for the first episode in the batch, but it has 3 entries: the first 2 denote the selfish reward for each agent and the third denotes the prosocial reward. So I'm saying that taking the mean over all of them doesn't mean anything. It should print either both selfish rewards or only the prosocial one.
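To make the suggestion concrete, here is a minimal sketch of what the printout could look like. It is not the code in ecn.py: the `format_reward` helper and the sample values are hypothetical, and it assumes each row of rewards is laid out as [selfish reward for agent 0, selfish reward for agent 1, prosocial reward], as described above.

```python
# Hypothetical minibatch of rewards: one row per episode (pair of agents).
# Assumed column layout (not confirmed against ecn.py):
#   [selfish reward agent 0, selfish reward agent 1, prosocial reward]
rewards = [
    [0.25, 0.75, 0.5],
    [1.0, 0.0, 0.5],
]


def format_reward(row, prosocial):
    """Build a log string for one episode instead of averaging all 3 entries."""
    if prosocial:
        # Prosocial training: only the third entry is meaningful.
        return "reward (prosocial): %.3f" % row[2]
    # Selfish training: report each agent's own reward separately.
    return "reward (selfish): agent0=%.3f agent1=%.3f" % (row[0], row[1])


# Print only the first episode in the minibatch to keep the log short.
print(format_reward(rewards[0], prosocial=True))
print(format_reward(rewards[0], prosocial=False))
```

Indexing rewards[0] keeps the output to a single episode per log line, which matches the reason given earlier for not printing the whole minibatch.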
hpasapp commented
Oh... right. It's been a very long time. What you say sounds plausible.
Since the output is just for having some sort of visibility into what's
happening, I'm going to assume I was just lazily taking the mean in order
to print something indicative of what's happening. I wouldn't read too much
into it.
backpropper commented
That's good to know. Thanks!