salesforce/CodeRL

Actor model finetuning code based on reward and policy gradient

parshinsh opened this issue · 1 comment

Thanks for the great work! Could you share the code for the full RL finetuning framework (the Actor and Critic updates based on the reward defined in the paper) for better reproducibility? For example, the code that updates the Actor network using the reward and policy gradient is missing.

@parshinsh We have updated the code for finetuning the actor model with synthetic samples and their return estimates.

Thank you for your patience!
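
For readers landing on this thread, here is a minimal sketch of what a return-weighted policy-gradient (REINFORCE-style) loss over sampled programs can look like. It is not the repository's code: the function names `softmax_log_probs` and `policy_gradient_loss` are hypothetical, the return estimates are assumed to be one precomputed scalar per sample, and plain Python lists stand in for the tensors a real actor model would produce.

```python
import math


def softmax_log_probs(logits):
    """Numerically stable log-softmax over one step's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def policy_gradient_loss(batch_logits, batch_tokens, returns):
    """Return-weighted negative log-likelihood, averaged over samples.

    batch_logits: per sample, a list of per-step logit lists (hypothetical layout).
    batch_tokens: per sample, the token ids that were actually sampled.
    returns:      per sample, one scalar return estimate (assumed precomputed).
    """
    total = 0.0
    for logits_seq, tokens, ret in zip(batch_logits, batch_tokens, returns):
        # Log-probability of the whole sampled sequence under the policy.
        seq_log_prob = sum(
            softmax_log_probs(step)[tok] for step, tok in zip(logits_seq, tokens)
        )
        # Minimizing -R * log p raises the likelihood of samples with
        # positive return and lowers it for samples with negative return.
        total += -ret * seq_log_prob
    return total / len(returns)
```

In an actual training loop the same quantity would typically be computed on tensors with an autodiff framework, so that the gradient of this loss with respect to the actor's parameters gives the policy-gradient update.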