vwxyzjn/cleanrl

How to do evaluation, for example on PPO

Closed this issue · 3 comments

Problem Description

I train on my customized env with ppo_continuous_action.py and save the Agent's state_dict every save_freq updates. However, when I load the model and state_dict afterwards, the performance (reward) is far worse than during training, almost as if the actions were random.
Could you please provide an example of how to run evaluation? I'm not sure whether the env wrappers are the cause.
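A minimal evaluation sketch, assuming the Agent class from ppo_continuous_action.py is importable, that the checkpoint is a plain state_dict, and a gymnasium-style API; `YourCustomEnv-v0` and `agent.pt` are hypothetical placeholders. The wrapper stack below mirrors the one in the training script's make_env; the key detail is that `NormalizeObservation` keeps its running mean/variance inside the wrapper, not in the agent's state_dict, so a freshly built eval env normalizes observations differently than the training env did.

```python
import gymnasium as gym
import numpy as np
import torch

from ppo_continuous_action import Agent  # the Agent defined in the training script


def make_eval_env(env_id, gamma):
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.FlattenObservation(env)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        env = gym.wrappers.ClipAction(env)
        # NormalizeObservation's running mean/var live in the wrapper, not in
        # the agent's state_dict. A fresh wrapper restarts those statistics
        # from scratch, so the agent sees differently scaled observations than
        # it trained on -- a common cause of "random-looking" eval rewards.
        env = gym.wrappers.NormalizeObservation(env)
        env = gym.wrappers.TransformObservation(env, lambda obs: np.clip(obs, -10, 10))
        env = gym.wrappers.NormalizeReward(env, gamma=gamma)
        env = gym.wrappers.TransformReward(env, lambda reward: np.clip(reward, -10, 10))
        return env

    return thunk


if __name__ == "__main__":
    # "YourCustomEnv-v0" and "agent.pt" are placeholders for your env id and
    # checkpoint path.
    envs = gym.vector.SyncVectorEnv([make_eval_env("YourCustomEnv-v0", gamma=0.99)])
    agent = Agent(envs)
    agent.load_state_dict(torch.load("agent.pt", map_location="cpu"))
    agent.eval()

    obs, _ = envs.reset(seed=0)
    episodic_returns = []
    while len(episodic_returns) < 10:
        with torch.no_grad():
            # Use the mean of the Gaussian policy for a deterministic rollout;
            # sampling via agent.get_action_and_value(...) also works.
            action = agent.actor_mean(torch.Tensor(obs))
        obs, _, _, _, infos = envs.step(action.numpy())
        # RecordEpisodeStatistics sits below NormalizeReward in the stack, so
        # the reported episodic return is the raw (unnormalized) reward.
        if "final_info" in infos:
            for info in infos["final_info"]:
                if info is not None and "episode" in info:
                    episodic_returns.append(info["episode"]["r"])
    print(f"mean episodic return over {len(episodic_returns)} episodes: {np.mean(episodic_returns):.2f}")
```

If at save time you also persist the `NormalizeObservation` wrapper's running statistics (its `obs_rms` mean and variance) and copy them into the eval wrapper before rolling out, the agent will see observations on the same scale it was trained on; evaluating with a freshly initialized normalizer is likely why the loaded agent looks random here.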