andrewjong/Deep-Learning-Paper-Surveys

[PPO] Proximal Policy Optimization Algorithms (Month Year Conference)

0. Article Information and Links

1. What do the authors try to accomplish?

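Vanilla policy gradient methods are unstable: a single overly large update can ruin the policy.
TRPO is stable but complicated to implement.
The goal is a method that is both stable and simple.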
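
As a rough illustration of what "stable and simple" can look like, here is a minimal sketch of the clipped surrogate objective the PPO paper proposes. The function name `ppo_clip_objective`, its argument names, and the pure-Python loop are assumptions made for this note, not the authors' code. `clip_eps=0.2` is the clipping value reported in the paper's experiments.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical argument names).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the ratio moves outside the clip range in the direction that would increase the objective, the clipped term takes over and the objective stops rewarding further movement away from the old policy. That takes the place of TRPO's explicit KL constraint and needs only ordinary first-order gradient ascent.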

2. What's great compared to previous research?

3. What are the key elements of the technology and method?

4. How do the authors measure success?

5. How did you verify that it works?

6. Things to discuss? (e.g. weaknesses, potential for future work, relation to other work)

7. Are there any papers to read next?

8. References

We covered this paper in this video