Plotting timestep vs action function
NC25 opened this issue · 2 comments
Hello,
I implemented and trained my PPO for a discrete action space.
env = gym.make('fishing-v0')
model = PPO2(MlpPolicy, env, verbose=2)
model.learn(total_timesteps=100)
Now I am trying to plot timesteps vs. actions so I can see how my model performs:
def step():
    obs = env.reset()
    for i in range(100):
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        env.render()
        y = []
        y.extend(action)
    return y
step()
x = np.linspace(0, 100, 100)
fig, ax = plt.subplots() # Create a figure and an axes.
ax.plot(x, y, label='linear')
But I get an error saying that the object is not iterable.
I was trying to make y a list of 100 values, one per timestep, so that it could be plotted against x, but the error shows that this is not what I get.
I was wondering if there is another way to plot timestep vs. action.
Hi,
The list y is being recreated inside the for loop on every iteration. You should create it once, just after obs = env.reset().
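The "not iterable" message itself most likely comes from y.extend(action). This is a minimal sketch, assuming model.predict returns a single scalar action for one non-vectorized observation; the literal 3 below is only a stand-in for such an action.

```python
y = []

try:
    y.extend(3)  # extend expects an iterable, so a bare scalar raises TypeError
except TypeError as err:
    message = str(err)  # the text of the "not iterable" error

y.append(3)  # append stores the scalar itself as a single element
```

With append, y grows by exactly one entry per timestep, which is what the plot needs.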
However, I would suggest evaluating your model on several instances of the problem and taking the average, using:
from stable_baselines.common.evaluation import evaluate_policy
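To illustrate what averaging over several episodes involves, here is a rough sketch that does not call evaluate_policy itself. The helper name mean_episode_reward and its n_episodes and max_steps parameters are hypothetical, and it assumes a non-vectorized gym-style env whose step returns (obs, reward, done, info).

```python
def mean_episode_reward(model, env, n_episodes=10, max_steps=1000):
    """Average total reward per episode over n_episodes rollouts (hypothetical helper)."""
    totals = []
    for _ in range(n_episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action, _state = model.predict(obs)
            obs, reward, done, _info = env.step(action)
            total += float(reward)
            if done:
                break  # the episode ended, move on to the next one
        totals.append(total)
    return sum(totals) / len(totals)
```

Averaging over several episodes smooths out the randomness of any single rollout, which is the point of the suggestion above.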
Thank you, I was able to implement it:
def step():
    obs = env.reset()
    y = []
    for i in range(100):
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        env.render()
        y.append(action)
    return y
step()
x = np.linspace(0, 100, 100)
fig, ax = plt.subplots() # Create a figure and an axes.
ax.plot(x, step(), label='linear')
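A possible variant of the above, offered as a sketch rather than the thread's final answer: it runs the rollout only once, resets the environment when an episode ends, and uses integer timesteps on the x-axis. The names collect_actions and plot_actions are hypothetical, and the code assumes a scalar discrete action and a gym-style env whose step returns (obs, reward, done, info).

```python
def collect_actions(model, env, n_steps=100):
    """Roll the policy out for n_steps and return the chosen actions as ints."""
    actions = []
    obs = env.reset()
    for _ in range(n_steps):
        action, _state = model.predict(obs)
        actions.append(int(action))  # assumes a scalar discrete action
        obs, _reward, done, _info = env.step(action)
        if done:
            obs = env.reset()  # start a new episode so the rollout can continue
    return actions


def plot_actions(actions):
    """Draw action against timestep as a step plot."""
    import matplotlib.pyplot as plt  # imported here so collect_actions needs no plotting library

    fig, ax = plt.subplots()
    ax.step(range(len(actions)), actions, where="post", label="action")
    ax.set_xlabel("timestep")
    ax.set_ylabel("action")
    ax.legend()
    return fig, ax
```

Calling plot_actions(collect_actions(model, env)) and then plt.show() would display the figure. A step plot is used here because discrete actions do not vary continuously between timesteps.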