araffin/rl-tutorial-jnrr19

Plotting timestep vs action function

NC25 opened this issue · 2 comments

NC25 commented

Hello,

I implemented and trained my PPO for a discrete action space.

env = gym.make('fishing-v0')
model = PPO2(MlpPolicy, env , verbose=2)
model.learn(total_timesteps=100)

Now I am trying to plot timesteps vs. actions so I can see how my model performs:

def step():
  obs = env.reset()
  for i in range(100):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
    y = []
    y.extend(action)
    return y
    
step()

x = np.linspace(0, 100, 100)
fig, ax = plt.subplots()  # Create a figure and an axes.
ax.plot(x, y, label='linear') 

But I get an error that the object is not iterable.
I was trying to make y a list of 100 action values so that it is iterable, but the error shows that this is not the case.

I was wondering if there is another way to plot timestep vs. action.

Hi,

The list y is being recreated inside the for loop; you should create it once, just after obs = env.reset().
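
For context, here is a minimal sketch of where the "not iterable" message most likely came from. It assumes model.predict returned a single scalar-like action for this discrete space; a plain int stands in for it here. list.extend expects an iterable, while list.append stores one element.

```python
# A plain int stands in for the scalar action (assumption: the real
# value returned by model.predict behaves like a single number).
action = 1

y = []
try:
    y.extend(action)  # extend() tries to iterate over its argument
except TypeError as err:
    error_message = str(err)  # the message reports that the int is not iterable

y.append(action)  # append() stores the value as a single element
```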

However, I would suggest evaluating your model on several instances of the problem and taking the average, using:
from stable_baselines.common.evaluation import evaluate_policy
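
As a rough sketch of what that averaging amounts to, the loop below runs a policy for several full episodes and reports the mean and standard deviation of the episode returns. `CountdownEnv`, `ConstantPolicy`, and `average_return` are hypothetical stand-ins so the snippet runs without gym or stable_baselines. With the real library, the call would look roughly like `mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)`.

```python
import statistics


class CountdownEnv:
    """Hypothetical toy env: an episode lasts `horizon` steps, reward = action."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}


class ConstantPolicy:
    """Hypothetical stand-in for a trained model; always picks the same action."""

    def __init__(self, action):
        self.action = action

    def predict(self, obs):
        return self.action, None  # mirrors the (action, state) return shape


def average_return(model, env, n_episodes=10):
    """Run n_episodes full episodes and return (mean, std) of their returns."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs)
            obs, reward, done, _info = env.step(action)
            total += reward
        returns.append(total)
    return statistics.mean(returns), statistics.pstdev(returns)


mean_reward, std_reward = average_return(ConstantPolicy(2), CountdownEnv(horizon=5))
```

Averaging over whole episodes gives a single number that is less sensitive to one lucky or unlucky rollout than a single 100-step trace.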

NC25 commented

@edbeeching

Thank you, I was able to implement it:

def step():
    obs = env.reset()
    y = []  # created once, before the loop
    for i in range(100):
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        env.render()
        y.append(action)
    return y

y = step()  # run the rollout once and keep the actions

x = np.arange(len(y))  # one x value per timestep
fig, ax = plt.subplots()  # Create a figure and an axes.
ax.plot(x, y, label='action')
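
One thing the loop above does not handle is an episode ending before 100 steps. A possible variant is sketched below with hypothetical stub classes (`ShortEpisodeEnv`, `AlternatingPolicy`) in place of the trained PPO2 model and the fishing environment. It resets the environment whenever `done` is set and records one action per timestep. Plotting is left as a comment because it needs matplotlib.

```python
class ShortEpisodeEnv:
    """Hypothetical env whose episodes end after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.resets = 0

    def reset(self):
        self.t = 0
        self.resets += 1
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= self.horizon, {}


class AlternatingPolicy:
    """Hypothetical model: action 0 on even observations, 1 on odd ones."""

    def predict(self, obs):
        return obs % 2, None


def collect_actions(model, env, n_steps=100):
    """Roll the policy out for n_steps, resetting whenever an episode ends."""
    obs = env.reset()
    actions = []
    for _ in range(n_steps):
        action, _ = model.predict(obs)
        obs, _reward, done, _info = env.step(action)
        actions.append(int(action))  # store a plain int for plotting
        if done:
            obs = env.reset()
    return actions


env = ShortEpisodeEnv(horizon=3)
actions = collect_actions(AlternatingPolicy(), env, n_steps=7)
timesteps = list(range(len(actions)))
# With matplotlib installed: plt.step(timesteps, actions, where="post")
```

A step plot tends to read better than a line plot for discrete actions, since it does not suggest intermediate values between two timesteps.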