PyTorch implementation of "Maximum a Posteriori Policy Optimization" with Retrace for discrete Gym environments
Primary Language: Python