erfanMhi/gym-mellowmax-mdp
An implementation of the 2-state MDP used in the mellowmax paper to show that the Boltzmann softmax operator is not a non-expansion.
Python · MIT
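
For readers unfamiliar with the property, the sketch below illustrates numerically what "not a non-expansion" means. It does not reproduce the repository's MDP or its API: the function names, the two value vectors, and the temperature settings are hypothetical choices made for illustration. An operator is a non-expansion under the max norm if it never moves two inputs further apart than they started; the example shows one pair of vectors for which Boltzmann softmax violates that bound while mellowmax respects it.

```python
import math

def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average: sum(x * e^(beta*x)) / sum(e^(beta*x))."""
    weights = [math.exp(beta * v) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mellowmax(values, omega):
    """Mellowmax: log(mean(e^(omega*x))) / omega."""
    mean_exp = sum(math.exp(omega * v) for v in values) / len(values)
    return math.log(mean_exp) / omega

# Hypothetical action-value vectors, chosen only to expose the violation.
x = [0.0, 2.0]
y = [0.0, 2.1]

# Max-norm distance between the two inputs.
dist = max(abs(a - b) for a, b in zip(x, y))

# Distance between the outputs of each operator (temperature 1.0 assumed).
boltz_gap = abs(boltzmann_softmax(x, 1.0) - boltzmann_softmax(y, 1.0))
mm_gap = abs(mellowmax(x, 1.0) - mellowmax(y, 1.0))

print("input distance:      ", dist)
print("Boltzmann output gap:", boltz_gap, "(expansion)" if boltz_gap > dist else "")
print("mellowmax output gap:", mm_gap, "(expansion)" if mm_gap > dist else "")
```

With these values the Boltzmann gap exceeds the input distance, while the mellowmax gap stays below it. A single counterexample is enough to show an operator is not a non-expansion; it does not by itself prove that mellowmax always is one.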