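Hierarchical reinforcement learning (HRL) approach that combines PPO and DDPG through a KL-divergence term, with one shared critic and a split between a high-level policy and a control policy, where the control policy follows a blinding principle.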
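As a rough illustration of what tying PPO to DDPG via KL divergence could look like, the sketch below computes the closed-form KL between two diagonal Gaussians. It is an assumption, not code from this repository: the DDPG actor is deterministic, so its action is treated here as the mean of a Gaussian with a hypothetical fixed standard deviation `ddpg_std`, and the names `gaussian_kl`, `ppo_mean`, `ppo_std`, and `ddpg_action` are all made up for the example.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """Closed-form KL(p || q) for two diagonal Gaussians.

    Each argument is a sequence with one entry per action dimension.
    """
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return kl


# Hypothetical values: a stochastic (PPO-style) policy output and a
# deterministic (DDPG-style) action for the same state.
ppo_mean, ppo_std = [0.2, -0.1], [0.5, 0.5]
ddpg_action = [0.3, 0.0]
ddpg_std = [0.5, 0.5]  # assumed fixed width around the deterministic action

# Penalty that grows as the two policies disagree; how it would be weighted
# in a loss is left open, since the source does not say.
kl_penalty = gaussian_kl(ppo_mean, ppo_std, ddpg_action, ddpg_std)
```

A term like `kl_penalty` would typically be added to one policy's loss with a small coefficient so the two policies stay close; whether the repository does this, and in which direction the KL is taken, is not stated in the description.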
Primary language: Python. License: MIT.