gym-mellowmax-mdp

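This repository implements the two-state MDP used in the mellowmax paper ("An Alternative Softmax Operator for Reinforcement Learning") to show that the Boltzmann softmax operator is not a non-expansion. In other words, its output can change by more than the largest change in its inputs.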
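The following is a minimal sketch, not code taken from this repository, of the two operators the paper compares. It assumes the standard definitions boltz_beta(X) = sum_i x_i exp(beta x_i) / sum_i exp(beta x_i) and mm_omega(X) = log((1/n) sum_i exp(omega x_i)) / omega. The function names, the input vectors and the values beta = omega = 2.0 are hypothetical, chosen only to exhibit a pair of inputs where the Boltzmann output moves further than the max-norm distance between the inputs while mellowmax does not.

```python
import math
from typing import Sequence


def boltzmann(values: Sequence[float], beta: float) -> float:
    """Boltzmann softmax: average of values weighted by exp(beta * x).

    Assumes beta >= 0. Subtracting the max before exponentiating avoids
    overflow; the common factor cancels in the ratio.
    """
    shift = max(values)
    weights = [math.exp(beta * (x - shift)) for x in values]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def mellowmax(values: Sequence[float], omega: float) -> float:
    """Mellowmax: log of the mean of exp(omega * x), divided by omega.

    Assumes omega > 0. The max is factored out for numerical stability:
    log(mean(exp(omega*x))) / omega = shift + log(mean(exp(omega*(x - shift)))) / omega.
    """
    shift = max(values)
    mean_exp = sum(math.exp(omega * (x - shift)) for x in values) / len(values)
    return shift + math.log(mean_exp) / omega


def max_norm_gap(x: Sequence[float], y: Sequence[float]) -> float:
    """Max-norm distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    # Hypothetical action-value vectors: y raises the best value and
    # lowers the worst one, each by 0.1.
    x = [1.0, 0.0]
    y = [1.1, -0.1]
    print("input gap (max norm):", max_norm_gap(x, y))
    print("boltzmann output gap:", abs(boltzmann(x, 2.0) - boltzmann(y, 2.0)))
    print("mellowmax output gap:", abs(mellowmax(x, 2.0) - mellowmax(y, 2.0)))
```

With these inputs the Boltzmann gap (about 0.119) exceeds the 0.1 input gap, which is exactly the failure of the non-expansion property, while the mellowmax gap stays below 0.1. The perturbation works because shifting values apart also shifts the Boltzmann weights toward the larger value, so the two effects compound.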

Primary language: Python. License: MIT.
