Could not find how to handle the intermediate rewards using rollout.

Question

Opened this issue 7 years ago · 0 comments

In the original paper, rollout is used to obtain intermediate rewards during sequence generation. In this code, the generator seems to receive a reward only once the whole sequence has been generated. Could you explain which part of the code corresponds to the rollout?
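
To make the question concrete, here is a minimal sketch of what I understand by rollout: for each partial sequence, the remaining tokens are sampled several times, each completed sequence is scored, and the average score is used as the intermediate reward for that step. All names here (`rollout_rewards`, `sample_next`, `score_sequence`, `n_rollouts`) are hypothetical and are not taken from this repository; the toy sampler and scorer only stand in for a generator and a discriminator.

```python
import random


def rollout_rewards(tokens, sample_next, score_sequence, n_rollouts=16, rng=None):
    """Estimate one reward per time step via Monte Carlo rollout.

    tokens:         the full generated sequence (list of token ids)
    sample_next:    hypothetical callable (prefix, rng) -> next token
    score_sequence: hypothetical callable (full sequence) -> float reward
    """
    rng = rng or random.Random(0)
    seq_len = len(tokens)
    rewards = []
    for t in range(1, seq_len + 1):
        prefix = tokens[:t]
        if t == seq_len:
            # The sequence is complete, so it can be scored directly.
            rewards.append(score_sequence(prefix))
            continue
        total = 0.0
        for _ in range(n_rollouts):
            # Complete the prefix by sampling the remaining tokens.
            completed = list(prefix)
            while len(completed) < seq_len:
                completed.append(sample_next(completed, rng))
            total += score_sequence(completed)
        # The average over rollouts is the intermediate reward for step t.
        rewards.append(total / n_rollouts)
    return rewards


# Toy stand-ins (assumptions for illustration only).
def toy_sample_next(prefix, rng):
    # Pretend generator: picks a token from a two-word vocabulary.
    return rng.choice([0, 1])


def toy_score(sequence):
    # Pretend discriminator: fraction of tokens equal to 1.
    return sum(sequence) / len(sequence)


if __name__ == "__main__":
    print(rollout_rewards([1, 0, 1, 1], toy_sample_next, toy_score, n_rollouts=8))
```

If the code here only scores the finished sequence, that would correspond to the `t == seq_len` branch above, with every earlier step missing its own reward estimate.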