ogmacorp/AOgmaNeo

code questions

iacore opened this issue · 5 comments

iacore commented

Here, t is at most history_size - 1 (it never reaches history_size), and t2 starts at t - 1, so history_samples[history_size - 1] is never touched by this loop. Is this intentional?

int t = rand() % (history_size - params.min_steps) + params.min_steps;
// compute (partial) values, rest is completed in the kernel
float r = 0.0f;
float d = 1.0f;
for (int t2 = t - 1; t2 >= 0; t2--) {
    r += history_samples[t2].reward * d;
    d *= params.discount;
}
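To make the index argument concrete, the sketch below enumerates every t the expression above can produce and records which indices the loop reads. It assumes 0 <= min_steps < history_size (otherwise the modulus is zero or negative). The function name reachable_indices is hypothetical and not part of AOgmaNeo.

```cpp
#include <vector>

// Hypothetical helper: marks every index of history_samples that the loop
// in the issue can read, over all values of
//   t = rand() % (history_size - min_steps) + min_steps.
// Assumes 0 <= min_steps < history_size.
std::vector<bool> reachable_indices(int history_size, int min_steps) {
    std::vector<bool> touched(history_size, false);
    // The modulus yields 0 .. history_size - min_steps - 1, so after adding
    // min_steps, t ranges over [min_steps, history_size - 1].
    for (int t = min_steps; t <= history_size - 1; t++) {
        // Same bounds as the original loop: t2 runs from t - 1 down to 0.
        for (int t2 = t - 1; t2 >= 0; t2--)
            touched[t2] = true;
    }
    return touched;
}
```

For example, with history_size = 8 and min_steps = 2, indices 0 through 6 are marked and index 7 is not, which matches the observation in the question.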

222464 commented

Hi, yes, this is intentional, since the reward for sample t actually occurs at t - 1 (an action happens, then a reward is received for that action on the next step).

Hopefully this makes sense!
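Under this explanation, the loop computes a partial discounted return that starts at index t - 1, where the reward for sample t's action is stored, and gives it weight 1. Reading the loop, lower indices appear to hold later steps (index t - 1 gets weight 1, index 0 the smallest weight); that ordering is an inference from the snippet, not something stated in the thread. The sketch below mirrors the loop with a plain vector of rewards; partial_return is a hypothetical helper, and the remaining part, which the original comment says is completed in the kernel, is left out.

```cpp
#include <vector>

// Hypothetical helper mirroring the loop from the issue:
//   r = sum_{k=0}^{t-1} discount^k * rewards[t - 1 - k]
// rewards[t] itself is never read, because the reward for the action taken
// at sample t is stored one step later, at index t - 1.
float partial_return(const std::vector<float>& rewards, int t, float discount) {
    float r = 0.0f;
    float d = 1.0f; // weight for index t - 1
    for (int t2 = t - 1; t2 >= 0; t2--) {
        r += rewards[t2] * d;
        d *= discount; // each step further from t is discounted once more
    }
    return r;
}
```

With rewards {0, 0, 2, 0} and t = 3, only index 2 contributes, with weight 1, giving 2; a reward stored at index 3 (sample t's own slot) would not contribute at all.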

iacore commented

Thanks for answering!

Another question: the code here looks like it should be sum /= max(1, count) * 255; instead of the current line:

sum /= max(1, count * 255);

222464 commented

That's just to avoid a divide-by-zero in rare cases; the two statements are equivalent since count is an integer.

iacore commented

When count is zero, they are not the same.

222464 commented

True, but in the 0 case no value is valid anyway. Also, in those rare cases (only when the hierarchy is strangely configured), sum will also be zero, so the result will be 0 / something, i.e. 0.
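The two divisor forms discussed above can be compared directly. In this sketch, divide_original mirrors the line in the code and divide_suggested mirrors the proposed alternative; both names are hypothetical, and the sketch assumes sum is a float and count an int.

```cpp
#include <algorithm>

// Hypothetical mirror of the current code: sum /= max(1, count * 255);
// The whole divisor is clamped to at least 1.
float divide_original(float sum, int count) {
    return sum / std::max(1, count * 255);
}

// Hypothetical mirror of the suggestion: sum /= max(1, count) * 255;
// Note that /= binds last, so the divisor is max(1, count) * 255.
float divide_suggested(float sum, int count) {
    return sum / (std::max(1, count) * 255);
}
```

For every count >= 1 both forms compute the same integer divisor, count * 255, so the results are identical. At count == 0 the divisors are 1 and 255 respectively, which is the case raised in the thread; when sum is also 0, as noted in the last reply, both return 0.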