/MPO

Pytorch implementation of "Maximum a Posteriori Policy Optimization" with Retrace for Discrete gym environments

Primary LanguagePython

Issues