The course website: http://rail.eecs.berkeley.edu/deeprlcourse/
My own solutions for CS294-112
Behavioral Cloning vs DAgger
I was able to get the results below with the given hyperparameters.
Learning Curves
Hopper-v2
Reacher-v2
The agents that improved the most under DAgger also showed a sharply increasing loss in their learning curves.
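To make the difference between the two methods concrete, here is a minimal NumPy sketch of the DAgger loop, with behavioral cloning as its first iteration. It is not the code behind the results above. It assumes a hypothetical toy 3-D linear system in place of the MuJoCo environments, a hypothetical linear "expert" controller `K`, and a learner fit by least-squares regression. All names (`expert_policy`, `fit`, `rollout`, `dagger`) are illustrative only.

```python
import numpy as np

# Hypothetical toy setup (NOT the course environments): a 3-D linear system
# s' = A s + B a + noise, driven by a 1-D action.
STATE_DIM, ACTION_DIM, HORIZON = 3, 1, 50
A = 0.9 * np.eye(STATE_DIM)
B = np.array([[1.0], [0.0], [0.0]])
K = np.array([[-0.5, 0.1, 0.2]])  # hypothetical expert gain, shape (ACTION_DIM, STATE_DIM)

rng = np.random.default_rng(0)


def expert_policy(states):
    """Hypothetical expert: a fixed linear controller a = K s (batched over rows)."""
    return states @ K.T


def make_learner(W):
    """Learner policy a = W^T s, with W obtained by regression."""
    return lambda states: states @ W


def fit(states, actions):
    """Supervised step shared by BC and DAgger: least-squares regression."""
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W


def rollout(policy):
    """Run one episode with `policy` and return the states it visited."""
    s = rng.normal(size=STATE_DIM)
    visited = []
    for _ in range(HORIZON):
        visited.append(s)
        a = policy(s[None, :])[0]
        s = A @ s + B @ a + 0.01 * rng.normal(size=STATE_DIM)
    return np.array(visited)


def dagger(n_iters=5, n_rollouts=5):
    # Iteration 0 is plain behavioral cloning on expert rollouts.
    states = np.concatenate([rollout(expert_policy) for _ in range(n_rollouts)])
    actions = expert_policy(states)
    W = fit(states, actions)
    for _ in range(n_iters):
        # Roll out the *learner* so the data covers the states it actually visits...
        new_states = np.concatenate(
            [rollout(make_learner(W)) for _ in range(n_rollouts)])
        # ...then label those states with the expert's actions and aggregate.
        states = np.concatenate([states, new_states])
        actions = np.concatenate([actions, expert_policy(new_states)])
        W = fit(states, actions)
    return W, len(states)


W, n_samples = dagger()
print(W.ravel(), n_samples)
```

Because the toy expert is linear and its labels are noise-free, the regression recovers the expert gain exactly. What the sketch is meant to show is the data flow. Behavioral cloning trains only on states the expert visits. DAgger keeps adding states from the learner's own rollouts, relabeled by the expert, to one growing dataset.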
Policy Gradient Methods in Discrete and Continuous Action Spaces
- Using reward-to-go improved performance.
- Normalizing the advantages reduced the high variance.
- Adding a baseline reduced the high variance.
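The three techniques in the list above all change the per-timestep weight that multiplies the log-probability of each action in the policy gradient. The following minimal NumPy sketch computes those weights for a single trajectory with discount `gamma`. It is not the repository's code. The function names are hypothetical, and the baseline values are made-up placeholders for a learned value function V(s_t).

```python
import numpy as np


def full_return(rewards, gamma):
    """Without reward-to-go: every timestep gets the whole trajectory's discounted return."""
    total = sum(gamma ** t * r for t, r in enumerate(rewards))
    return np.full(len(rewards), total, dtype=float)


def reward_to_go(rewards, gamma):
    """q_t = sum_{t' >= t} gamma^(t' - t) * r_{t'}: only rewards from step t onward count."""
    q = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        q[t] = running
    return q


def advantages(q, baseline=None, normalize=True, eps=1e-8):
    """Subtract an optional baseline b(s_t), then optionally standardize."""
    adv = q if baseline is None else q - baseline
    if normalize:
        adv = (adv - adv.mean()) / (adv.std() + eps)
    return adv


rewards = [1.0, 1.0, 1.0]
print(full_return(rewards, gamma=0.5))  # every entry is 1.75
q = reward_to_go(rewards, gamma=0.5)    # [1.75, 1.5, 1.0]
# Hypothetical baseline values standing in for a learned V(s_t).
v = np.array([1.6, 1.4, 0.9])
adv = advantages(q, baseline=v)         # zero mean, unit standard deviation
```

Normalization subtracts the batch mean, so a *constant* baseline has no effect once the advantages are normalized. Only a state-dependent baseline such as V(s_t) changes the result.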