Asking for help regarding the reward value
Narnarn opened this issue · 1 comment
First of all, thank you for the great work you've done! The code of this reproduction is very clear.
Here's my problem.
I added a `best_model_save_path` parameter to the `EvalCallback` call in `script.py`, so that I can get the best model after training. But when I evaluate that model with `evaluate_policy` from `stable_baselines3.common.evaluation`, I get really confused: the reward from this evaluation is negative, which is far from the `episode_reward` in the logs, and even worse than the first eval result during training. Why is that the case? Looking at the stable-baselines3 docs, `EvalCallback` also uses `evaluate_policy` to compute the reward values, so the results should be close.
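For reference, this is roughly how the callback is wired up in my copy of `script.py` (the paths, `eval_freq`, and `n_eval_episodes` below are placeholders, not my exact values):

```python
from stable_baselines3.common.callbacks import EvalCallback

# Placeholder setup: save the best model found during periodic
# evaluation on a separate eval env. Paths and frequencies here
# are illustrative, not the exact values from my script.
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./testing/evaluation/model",
    log_path="./testing/evaluation/logs",
    eval_freq=10_000,
    n_eval_episodes=5,
    deterministic=True,
)
model.learn(total_timesteps=100_000, callback=eval_callback)
```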
In my test code, I load the env the same way as `script.py` does, and here is my evaluation process:
```python
import stable_baselines3
from stable_baselines3.common.evaluation import evaluate_policy

Agent = getattr(stable_baselines3, args.agent)
model = Agent.load("./testing/evaluation/model/best_model")
print(evaluate_policy(model, env))
```
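For clarity, `evaluate_policy` returns the mean and standard deviation of the episode rewards by default, so the result can also be unpacked explicitly (the episode count below is arbitrary):

```python
# evaluate_policy returns (mean_reward, std_reward) by default;
# n_eval_episodes and deterministic mirror what EvalCallback uses.
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=10, deterministic=True
)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```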
Actually, I found this because I tried to tune the hyperparameters with Optuna: the objective value reported by Optuna is negative, while the `episode_reward` in the logs is positive and pretty large. This result really confuses me.
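For context, my Optuna setup follows the usual pattern of returning the `evaluate_policy` mean reward as the trial value; the sketch below is simplified, and the hyperparameter names and ranges are placeholders rather than my actual search space.

```python
import optuna
import stable_baselines3
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial):
    # Placeholder search space -- not my actual hyperparameters.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)

    # Same agent selection as in my test code; env is loaded as in script.py.
    Agent = getattr(stable_baselines3, args.agent)
    model = Agent("MlpPolicy", env, learning_rate=learning_rate, gamma=gamma, verbose=0)
    model.learn(total_timesteps=50_000)

    # The trial value is the mean reward from evaluate_policy,
    # which is where the negative numbers show up.
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```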
Thanks again!
I found that it may be caused by my modification to the code... Everything is fine when I test the original code from the repo. Sorry to bother!