How to do evaluation for example on PPO
Closed this issue · 3 comments
qiuruiyu commented
Problem Description
I train PPO_continuous_action.py on my customized env and save the Agent's state_dict every save_freq updates. However, when I load the state_dict afterward and evaluate, the performance (reward) is far worse than during training, almost as if the actions were random.
Could you please provide an example of how to do evaluation? I'm not sure whether the env wrappers are the cause.
vwxyzjn commented
See #310 (comment)
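A common cause of this symptom is observation normalization: ppo_continuous_action.py wraps the env with running-statistics normalizers, so a freshly constructed eval env starts with empty statistics and feeds the trained policy differently-scaled observations. The sketch below illustrates the mechanism with a simplified stand-in for the wrapper's running mean/std logic; the class and numbers are illustrative, not CleanRL's or Gymnasium's actual code.

```python
# Simplified stand-in for the running-statistics logic used by observation
# normalization wrappers (e.g. gym.wrappers.NormalizeObservation).
# Illustrative only: names and update details are assumptions, not library code.

class RunningMeanStd:
    """Tracks a running mean and variance of observed scalar values."""

    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # tiny initial count avoids division by zero

    def update(self, x):
        # Incremental (Welford-style) update for one scalar sample.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count

    def normalize(self, x):
        return (x - self.mean) / (self.var ** 0.5 + 1e-8)


# During training, the wrapper has seen many observations, so its
# statistics reflect the environment's actual scale.
train_stats = RunningMeanStd()
for obs in [10.0, 12.0, 9.0, 11.0, 10.5]:
    train_stats.update(obs)

# A freshly constructed eval wrapper starts from scratch: the same raw
# observation normalizes to a very different value, so the trained policy
# effectively sees out-of-distribution inputs and acts near-randomly.
eval_stats = RunningMeanStd()

raw_obs = 10.0
print(train_stats.normalize(raw_obs))  # small magnitude: near the training mean
print(eval_stats.normalize(raw_obs))   # large magnitude: fresh stats know nothing
```

The practical fix is to either save and restore the wrapper's normalization statistics alongside the model's state_dict, or to reuse the training env (with its accumulated statistics, updates frozen) for evaluation, as discussed in #310.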