Question about calculating R

PDFangeltop1 opened this issue 8 years ago · 5 comments

Thank you very much for releasing the code; it has helped me a lot.
I read the code carefully and have some questions about the function calc_reward.
1. In line 289, you average the rewards over a minibatch, so for each sample in the batch the reward is not 0 or 1 but a value between 0 and 1. I am curious why you do this.
2. In line 291, you expand the reward to shape (batchsize, 2*nGlimpses). Why is the shape not (batchsize, nGlimpses)? What is the motivation for the factor of 2?

Answer 1 · 2016-08-30T02:29:53.000Z

@PDFangeltop1 The averaged reward on line 289 is only used to calculate the accuracy. The real reward is R, which is 1 or 0. The multiplication by 2 on line 291 is there because the location (x, y) consists of 2 numbers.
I think this answer may help you.
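To make the distinction concrete, here is a minimal sketch in plain Python. The predictions and labels are made up for illustration and are not taken from the repository; the point is only that R stays 0 or 1 per sample, while its batch mean is a separate number used for reporting accuracy.

```python
# Hypothetical predictions and labels for a minibatch of 4 samples.
predicted = [3, 7, 1, 9]
labels = [3, 2, 1, 9]

# Per-sample reward R: 1.0 for a correct classification, 0.0 otherwise.
R = [1.0 if p == y else 0.0 for p, y in zip(predicted, labels)]

# Batch mean of R, used only to report accuracy, not as the training reward.
accuracy = sum(R) / len(R)

print(R)         # [1.0, 0.0, 1.0, 1.0]
print(accuracy)  # 0.75
```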

Answer 2 · 2016-08-30T04:59:41.000Z

@jlindsey15 I did not think I could get such a quick reply, thank you again. ^-^
For the reward R, I think I confused accuracy with reward, and I have no problem with that now.
But I am still confused about the multiplication by 2. I notice that you apply it not only to R but also to the baseline and p_loc (in line 297). At every glimpse, we calculate R only once, as the reward for ONE action, which is a coordinate (2 numbers). Multiplying by 2 makes me think that at every glimpse there are two actions and two rewards, with each action corresponding to one number of the coordinate.
One more question, about line 189 in the function get_next_input: I notice that you do not let the error of the action network backpropagate to the core net, but you do let the baseline error backpropagate to the core net. According to the paper, "Eq. (1) requires us to compute $\nabla_\theta \log \pi(u_t^i \mid s_{1:t}^i; \theta)$. But this is just the gradient of the RNN that defines our agent evaluated at time step t and can be computed by standard backpropagation [25]." This implies that the error is backpropagated to the core net. Have you ever tried that?
Sorry for taking up your precious time.

Answer 3 · 2016-08-30T06:48:33.000Z

@PDFangeltop1 Aha, I'm not the author, just someone watching the project like you.
As for the multiplication by 2, I think it is just a trick to make the calculation easier. I don't know how to explain it.
As for the backpropagation question, maybe you can look at the issue I opened; I have the same question.
^_^

Answer 4 · 2016-08-30T07:35:29.000Z

@hhhmoan OK, I think I know why you multiply by 2. Sorry for bothering you with it.

Answer 5 · 2016-08-30T17:39:05.000Z

Hi! Sorry I haven't been very responsive -- I've been traveling and will be away for about two weeks, during which I have limited time and internet access. To clarify the multiplying by two part, it's because each glimpse contains an x and a y coordinate, so to represent one glimpse you need two "slots." As for the gradients issue, I think you both are right, but removing the stop_gradients(mean_loc) line breaks the code at the moment, suggesting that something else is wrong. I'll look into it when I can -- feel free to let us know if you figure out what's going on!
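
The two "slots" per glimpse can be sketched in plain Python as follows. This is an illustration under assumptions, not the repository's code: the helper tile_per_coordinate, the log-probability values, and the x1, y1, x2, y2, ... column layout are all hypothetical. If x and y are treated as independent coordinates, the log-probability of one location is the sum of two per-coordinate terms, so the single per-sample reward is repeated once for each of them.

```python
def tile_per_coordinate(per_sample, n_glimpses):
    """Repeat each per-sample scalar once per location coordinate (x and y)."""
    return [[value] * (2 * n_glimpses) for value in per_sample]

n_glimpses = 3
R = [1.0, 0.0]  # one 0/1 reward per sample, batch of 2

# Shape (batch, 2 * n_glimpses): the same reward for every x and y slot.
R_tiled = tile_per_coordinate(R, n_glimpses)

# Hypothetical per-coordinate log-probabilities, assumed layout x1, y1, x2, y2, x3, y3.
log_p_loc = [
    [-0.1, -0.2, -0.3, -0.4, -0.5, -0.6],
    [-0.6, -0.5, -0.4, -0.3, -0.2, -0.1],
]

# Element-wise product: each coordinate's log-probability is weighted by the
# reward of the sample it belongs to, which requires matching shapes.
weighted = [
    [lp * r for lp, r in zip(lp_row, r_row)]
    for lp_row, r_row in zip(log_p_loc, R_tiled)
]

print(len(R_tiled), len(R_tiled[0]))  # 2 6
print(R_tiled[0])                     # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

There is still only one reward per sample here; the tiling just makes it line up with the 2*nGlimpses coordinate terms, and the baseline would be repeated in the same way.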