Issues
End of rollout: handle the case where the episode finishes before the rollout is done
#37 opened by theovincent - 1
Clip value loss
#49 opened by theovincent - 1
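Issue #49 refers to PPO-style value-loss clipping: the new value prediction is clipped around the old one, and the element-wise maximum of the clipped and unclipped squared errors is taken. A minimal numpy sketch (function name, signature, and the 0.2 default are illustrative, not taken from this repository):

```python
import numpy as np

def clipped_value_loss(values, old_values, returns, clip_coef=0.2):
    """PPO-style clipped value loss (hypothetical helper).

    Clips the new value estimate to stay within clip_coef of the old
    one, then keeps the larger of the clipped/unclipped squared errors.
    """
    unclipped = (values - returns) ** 2
    clipped_values = old_values + np.clip(values - old_values, -clip_coef, clip_coef)
    clipped = (clipped_values - returns) ** 2
    return 0.5 * np.maximum(unclipped, clipped).mean()
```

Taking the maximum makes the loss a pessimistic bound, so clipping can only slow the value update, never make it cheaper.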
Implement the policy network as in cleanrl
#48 opened by theovincent - 0
Set epsilon to 1e-5 instead of 1e-8
#50 opened by theovincent - 0
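Issue #50 concerns the Adam epsilon, which sits in the update's denominator; cleanrl-style PPO code commonly uses 1e-5 rather than the usual 1e-8 default. A self-contained sketch of one Adam step showing where epsilon enters (all names and defaults here are illustrative, not from this repository):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=2.5e-4, b1=0.9, b2=0.999, eps=1e-5):
    """One Adam update for a single parameter (hypothetical helper).

    eps appears in the denominator, so a larger eps (1e-5 vs 1e-8)
    damps the step when the second-moment estimate v is tiny.
    """
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```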
Plots like in the paper
#38 opened by theovincent - 0
Remove observation_tp1
#45 opened by theovincent - 0
Anneal the learning rate
#39 opened by theovincent - 0
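Issue #39 asks for learning-rate annealing; PPO implementations typically decay the rate linearly to zero over training. A minimal sketch (the schedule shape and the 2.5e-4 default are assumptions, not taken from this repository):

```python
def linear_lr(update, num_updates, initial_lr=2.5e-4):
    """Linearly anneal the learning rate to 0 over num_updates updates
    (hypothetical helper; update is 0-indexed)."""
    frac = 1.0 - update / num_updates
    return frac * initial_lr
```

The returned value would be fed to the optimizer before each update, e.g. by rebuilding it or by overwriting its learning-rate field.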
Enable comparing several runs
#41 opened by theovincent - 0
From single obs to batch
#27 opened by theovincent - 0
Clipping loss
#22 opened by paulinesert - 1
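Issue #22 is presumably the clipped surrogate policy objective from the PPO paper: the probability ratio is clipped to [1 - eps, 1 + eps] and the minimum of the clipped and unclipped terms is kept. A minimal numpy sketch (names are illustrative, not from this repository):

```python
import numpy as np

def clipped_surrogate(ratio, advantage, clip_coef=0.2):
    """PPO clipped policy objective, to be maximized (hypothetical
    helper). ratio = pi_new(a|s) / pi_old(a|s)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantage
    return np.minimum(unclipped, clipped).mean()
```

The minimum removes any incentive to push the ratio outside the clip range when that would increase the objective.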
Compute GAE (advantage estimation)
#26 opened by theovincent - 0
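Issue #26 covers Generalized Advantage Estimation. A minimal backward-pass sketch over one rollout; the exact `dones` convention (episode ended at step t) and the gamma/lambda defaults are assumptions, not taken from this repository:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """GAE over a rollout of length T (hypothetical helper).

    values[t] is the critic's estimate at step t; last_value bootstraps
    beyond the final step; dones[t] = 1.0 cuts the bootstrap when the
    episode terminated at step t.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        gae = delta + gamma * lam * next_nonterminal * gae
        advantages[t] = gae
    return advantages
```

This also relates to issue #37: when an episode ends mid-rollout, the `dones` mask is what stops the advantage from bootstrapping across the episode boundary.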
PPO interaction loop
#25 opened by theovincent - 0
Replay Buffer
#23 opened by theovincent - 0
Review popular PPO implementations (acme, stable-baselines, spinning-up RL, etc) and decide the architecture
#3 opened by paulinesert - 0
Create an agent function
#5 opened by nviolante25 - 0
[BUG] Environment is not restarting properly
#10 opened by emasquil - 0
Code ppo agent
#14 opened by theovincent - 0
Delete branch agent
#13 opened by theovincent - 0
Get familiar with the two environments: reacher and inverted pendulum. Summarize them
#2 opened by theovincent - 2
Create an environment function
#1 opened by theovincent