vwxyzjn/cleanrl

How to do evaluation, for example on PPO

Closed this issue · 3 comments

Problem Description

I train on my customized env with ppo_continuous_action.py and save the Agent's state_dict every save_freq updates. However, when I load the model and state_dict afterwards, the performance (reward) is far worse than during training, almost as if the actions were random.
Could you please provide an example of how to run evaluation? I'm not sure whether the env wrappers are the cause.
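A minimal evaluation sketch, assuming the Agent class from ppo_continuous_action.py is importable, that the checkpoint is a plain state_dict, and a gymnasium-style API; `YourCustomEnv-v0` and `agent.pt` are hypothetical placeholders. The wrapper stack below mirrors the one in the training script's make_env; the key detail is that `NormalizeObservation` keeps its running mean/variance inside the wrapper, not in the agent's state_dict, so a freshly built eval env normalizes observations differently than the training env did.

```python
import gymnasium as gym
import numpy as np
import torch

from ppo_continuous_action import Agent  # the Agent defined in the training script


def make_eval_env(env_id, gamma):
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.FlattenObservation(env)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        env = gym.wrappers.ClipAction(env)
        # NormalizeObservation's running mean/var live in the wrapper, not in
        # the agent's state_dict. A fresh wrapper restarts those statistics
        # from scratch, so the agent sees differently scaled observations than
        # it trained on -- a common cause of "random-looking" eval rewards.
        env = gym.wrappers.NormalizeObservation(env)
        env = gym.wrappers.TransformObservation(env, lambda obs: np.clip(obs, -10, 10))
        env = gym.wrappers.NormalizeReward(env, gamma=gamma)
        env = gym.wrappers.TransformReward(env, lambda reward: np.clip(reward, -10, 10))
        return env

    return thunk


if __name__ == "__main__":
    # "YourCustomEnv-v0" and "agent.pt" are placeholders for your env id and
    # checkpoint path.
    envs = gym.vector.SyncVectorEnv([make_eval_env("YourCustomEnv-v0", gamma=0.99)])
    agent = Agent(envs)
    agent.load_state_dict(torch.load("agent.pt", map_location="cpu"))
    agent.eval()

    obs, _ = envs.reset(seed=0)
    episodic_returns = []
    while len(episodic_returns) < 10:
        with torch.no_grad():
            # Use the mean of the Gaussian policy for a deterministic rollout;
            # sampling via agent.get_action_and_value(...) also works.
            action = agent.actor_mean(torch.Tensor(obs))
        obs, _, _, _, infos = envs.step(action.numpy())
        # RecordEpisodeStatistics sits below NormalizeReward in the stack, so
        # the reported episodic return is the raw (unnormalized) reward.
        if "final_info" in infos:
            for info in infos["final_info"]:
                if info is not None and "episode" in info:
                    episodic_returns.append(info["episode"]["r"])
    print(f"mean episodic return over {len(episodic_returns)} episodes: {np.mean(episodic_returns):.2f}")
```

If at save time you also persist the `NormalizeObservation` wrapper's running statistics (its `obs_rms` mean and variance) and copy them into the eval wrapper before rolling out, the agent will see observations on the same scale it was trained on; evaluating with a freshly initialized normalizer is likely why the loaded agent looks random here.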