
A naive question about updating parameters in DDPG.

HiddenBeginner opened this issue · 0 comments

Hi, first of all, thanks for your awesome code. This is not a technical issue but a question about the algorithm in the DDPG code.

As far as I know, DDPG can update its parameters online, at every environment step, because it is based on TD learning. In your code, however, the parameters are updated only after an episode is over.

I would like to ask whether there is any theoretical background behind this parameter update interval.
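
To make the question concrete, here is a minimal sketch of the two update schedules I have in mind. It is not taken from the repository: `DummyAgent`, `run`, the toy dynamics, and the choice of one update per collected transition at the end of an episode are all my own assumptions, and the agent only counts updates instead of training networks.

```python
import random


class DummyAgent:
    """Hypothetical stand-in for a DDPG agent; it only counts updates."""

    def __init__(self):
        self.num_updates = 0

    def act(self, state):
        # Placeholder policy: a random action in [-1, 1].
        return random.uniform(-1.0, 1.0)

    def update(self, batch):
        # A real agent would take one critic (TD) step and one actor step here.
        self.num_updates += 1


def run(schedule, episodes=3, steps_per_episode=5, batch_size=4, seed=0):
    """Run a toy loop with schedule 'step' (online) or 'episode' (deferred)."""
    random.seed(seed)
    agent, buffer = DummyAgent(), []
    update_log = []  # (episode, step) at which each update happened

    for ep in range(episodes):
        state = 0.0
        for t in range(steps_per_episode):
            action = agent.act(state)
            next_state = state + action  # toy dynamics, assumed for illustration
            reward = -abs(next_state)
            done = t == steps_per_episode - 1
            buffer.append((state, action, reward, next_state, done))
            state = next_state

            # Online schedule: one update right after every transition.
            if schedule == "step" and len(buffer) >= batch_size:
                agent.update(random.sample(buffer, batch_size))
                update_log.append((ep, t))

        # Deferred schedule: the same number of updates, run after the episode.
        if schedule == "episode" and len(buffer) >= batch_size:
            for _ in range(steps_per_episode):
                agent.update(random.sample(buffer, batch_size))
                update_log.append((ep, steps_per_episode - 1))

    return agent.num_updates, update_log


if __name__ == "__main__":
    for schedule in ("step", "episode"):
        count, log = run(schedule)
        print(schedule, count, log[0])
```

Since both schedules sample minibatches from the same replay buffer, the difference in this sketch is only when the updates happen relative to data collection, which is the interval I am asking about.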

Thank you in advance.