Environment used in the papers
@article{mazoure2019leveraging,
title={Leveraging exploration in off-policy algorithms via normalizing flows},
author={Mazoure, Bogdan and Doan, Thang and Durand, Audrey and Hjelm, R Devon and Pineau, Joelle},
journal={Proceedings of the 3rd Conference on Robot Learning (CoRL 2019)},
year={2019}
}
- Episode max steps: 1000
- The episode terminates if the agent falls down.
- A reward of +1 is granted when the distance of the agent's center of mass (COM) from the origin exceeds a threshold of 0.6.
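
The reward rule can be illustrated with a small standalone function. This is only a sketch, not the environment's actual code: the name `sparse_reward`, the use of a Euclidean distance over all coordinates, and the reward of 0 below the threshold are assumptions made for illustration. Only the threshold of 0.6 and the +1 reward come from the description above.

```python
import math

# Threshold stated in the environment description.
COM_THRESHOLD = 0.6


def sparse_reward(com_position, origin=(0.0, 0.0, 0.0), threshold=COM_THRESHOLD):
    """Return 1.0 if the COM is farther than `threshold` from `origin`, else 0.0.

    Assumption: the distance is Euclidean over all coordinates; the real
    environment may measure displacement differently.
    """
    distance = math.dist(com_position, origin)
    return 1.0 if distance > threshold else 0.0
```

With this sketch, an agent standing at the origin receives no reward, and the reward only appears once the COM has moved more than 0.6 away.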
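import gym

# SparseHumanoid-v2 is not a built-in gym environment; this assumes it has already been registered.
env = gym.make("SparseHumanoid-v2")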
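
A rollout that respects the 1000-step cap and the early termination on falling might look like the following. It is a minimal sketch that assumes the classic gym interface, where `reset()` returns an observation and `step()` returns an `(obs, reward, done, info)` tuple. The helper `run_episode` and the `policy` callable are hypothetical names, not part of this repository.

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return (total_reward, steps_taken).

    Assumes the classic gym API: reset() -> obs and
    step(action) -> (obs, reward, done, info).
    """
    obs = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
        steps += 1
        if done:  # e.g. the agent fell down
            break
    return total_reward, steps
```

For a random baseline, one could call `run_episode(env, lambda obs: env.action_space.sample())`. The explicit `max_steps` argument is kept only to make the cap visible; a registered environment may already enforce it.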