acyclics/MPO

PyTorch implementation of "Maximum a Posteriori Policy Optimization" with Retrace for discrete Gym environments

Python

Issues

My 'loss_l' goes to 1.837 and the model never improves
#2 opened 2 years ago by MotorCityCobra
q_ret update not used
#1 opened 3 years ago by mvindiola1