Actor model finetuning code based on reward and policy gradient

Question

parshinsh opened this issue 2 years ago · 1 comment

Thanks for the great work! Could you share the code for the whole RL finetuning framework (the Actor and Critic updates based on the reward defined in the paper) for better reproducibility? For example, the code for updating the Actor network based on the reward and the policy gradient is missing.

Answer 1 · 2022-11-18T16:20:59.000Z

@parshinsh we updated the code for finetuning the actor model with synthetic samples and their return estimates.

Thank you for your patience!
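
For readers who want a rough picture of what an actor update based on reward and policy gradient looks like, here is a minimal REINFORCE-style sketch in plain Python. It is not the repository's code: the function names (`policy_gradient_loss`, `logit_gradients`), the scalar `baseline` standing in for a critic's estimate, and the toy logits are all assumptions made for illustration. The loss for one sampled sequence is the negative advantage times the sum of the log-probabilities of the sampled tokens.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_log_prob(step_logits, sampled_tokens):
    """Sum of log pi(token_t | prefix) over one sampled sequence."""
    return sum(
        math.log(softmax(logits)[tok])
        for logits, tok in zip(step_logits, sampled_tokens)
    )


def policy_gradient_loss(step_logits, sampled_tokens, sample_return, baseline=0.0):
    """REINFORCE loss: -(return - baseline) * log-probability of the sample.

    `baseline` is a hypothetical stand-in for a critic's value estimate.
    """
    advantage = sample_return - baseline
    return -advantage * sequence_log_prob(step_logits, sampled_tokens)


def logit_gradients(step_logits, sampled_tokens, sample_return, baseline=0.0):
    """Analytic gradient of the loss above with respect to every logit.

    For a softmax policy this is advantage * (softmax(logits) - one_hot(token)).
    """
    advantage = sample_return - baseline
    grads = []
    for logits, tok in zip(step_logits, sampled_tokens):
        probs = softmax(logits)
        grads.append(
            [advantage * (p - (1.0 if i == tok else 0.0)) for i, p in enumerate(probs)]
        )
    return grads


# Toy usage: one decoding step, two-token vocabulary, token 0 was sampled.
toy_logits = [[0.0, 0.0]]
loss = policy_gradient_loss(toy_logits, [0], sample_return=1.0)
grads = logit_gradients(toy_logits, [0], sample_return=1.0)
```

With a positive advantage, a gradient-descent step on this loss raises the logit of the sampled token; with a negative advantage it lowers it. A real setup would compute the same quantity with an autodiff framework over batches of generated samples.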