Issues
End of rollout: handle the case where the episode finishes before the rollout is done
#37 opened by theovincent - 1
Clip value loss
#49 opened by theovincent - 1
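Issue #49 refers to PPO-style value-loss clipping: the new value prediction is clipped around the old one, and the element-wise maximum of the clipped and unclipped squared errors is taken. A minimal numpy sketch (function name, signature, and the 0.2 default are illustrative, not taken from this repository):

```python
import numpy as np

def clipped_value_loss(values, old_values, returns, clip_coef=0.2):
    """PPO-style clipped value loss (hypothetical helper).

    Clips the new value estimate to stay within clip_coef of the old
    one, then keeps the larger of the clipped/unclipped squared errors.
    """
    unclipped = (values - returns) ** 2
    clipped_values = old_values + np.clip(values - old_values, -clip_coef, clip_coef)
    clipped = (clipped_values - returns) ** 2
    return 0.5 * np.maximum(unclipped, clipped).mean()
```

Taking the maximum makes the loss a pessimistic bound, so clipping can only slow the value update, never make it cheaper.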
Implement the policy network as in cleanrl
#48 opened by theovincent - 0
Set epsilon to 1e-5 instead of 1e-8
#50 opened by theovincent - 0
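Issue #50 concerns the Adam epsilon, which sits in the update's denominator; cleanrl-style PPO code commonly uses 1e-5 rather than the usual 1e-8 default. A self-contained sketch of one Adam step showing where epsilon enters (all names and defaults here are illustrative, not from this repository):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=2.5e-4, b1=0.9, b2=0.999, eps=1e-5):
    """One Adam update for a single parameter (hypothetical helper).

    eps appears in the denominator, so a larger eps (1e-5 vs 1e-8)
    damps the step when the second-moment estimate v is tiny.
    """
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```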
Plots like in the paper
#38 opened by theovincent - 0
Remove observation_tp1
#45 opened by theovincent - 0
Anneal the learning rate
#39 opened by theovincent - 0
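Issue #39 asks for learning-rate annealing; PPO implementations typically decay the rate linearly to zero over training. A minimal sketch (the schedule shape and the 2.5e-4 default are assumptions, not taken from this repository):

```python
def linear_lr(update, num_updates, initial_lr=2.5e-4):
    """Linearly anneal the learning rate to 0 over num_updates updates
    (hypothetical helper; update is 0-indexed)."""
    frac = 1.0 - update / num_updates
    return frac * initial_lr
```

The returned value would be fed to the optimizer before each update, e.g. by rebuilding it or by overwriting its learning-rate field.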
Enable comparing several runs
#41 opened by theovincent - 0
From single obs to batch
#27 opened by theovincent - 0
Clipping loss
#22 opened by paulinesert - 1
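Issue #22 is presumably the clipped surrogate policy objective from the PPO paper: the probability ratio is clipped to [1 - eps, 1 + eps] and the minimum of the clipped and unclipped terms is kept. A minimal numpy sketch (names are illustrative, not from this repository):

```python
import numpy as np

def clipped_surrogate(ratio, advantage, clip_coef=0.2):
    """PPO clipped policy objective, to be maximized (hypothetical
    helper). ratio = pi_new(a|s) / pi_old(a|s)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantage
    return np.minimum(unclipped, clipped).mean()
```

The minimum removes any incentive to push the ratio outside the clip range when that would increase the objective.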
Compute GAE (advantage estimation)
#26 opened by theovincent - 0
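Issue #26 covers Generalized Advantage Estimation. A minimal backward-pass sketch over one rollout; the exact `dones` convention (episode ended at step t) and the gamma/lambda defaults are assumptions, not taken from this repository:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """GAE over a rollout of length T (hypothetical helper).

    values[t] is the critic's estimate at step t; last_value bootstraps
    beyond the final step; dones[t] = 1.0 cuts the bootstrap when the
    episode terminated at step t.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        gae = delta + gamma * lam * next_nonterminal * gae
        advantages[t] = gae
    return advantages
```

This also relates to issue #37: when an episode ends mid-rollout, the `dones` mask is what stops the advantage from bootstrapping across the episode boundary.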
PPO interaction loop
#25 opened by theovincent - 0
Replay Buffer
#23 opened by theovincent - 0
Review popular PPO implementations (acme, stable-baselines, spinning-up RL, etc) and decide the architecture
#3 opened by paulinesert - 0
Create an agent function
#5 opened by nviolante25 - 0
[BUG] Environment is not restarting properly
#10 opened by emasquil - 0
Code ppo agent
#14 opened by theovincent - 0
Delete branch agent
#13 opened by theovincent - 0
Get familiar with the two environments: reacher and inverted pendulum. Summarize them
#2 opened by theovincent - 2
Create an environment function
#1 opened by theovincent