Running the code
Opened this issue · 4 comments
Hi,
This is a meaningful work to combine imitation learning and reinforcement learning in a hierarchical architecture to solve Montezuma’s Revenge.
I successfully run the code to train hybrid_rl_il_agent. When I test the well-trained model, I find the agent makes the same actions every episode. It seems that the agent follows a completely fixed trajectory to play the game without some adaptation. Is this a good strategy for the agent?
And then I want to train the h-DQN agent as a comparison, but I cannot find the right code to do this. Can you give me some advice to start the training?
Thanks.
hi there, regarding fixed trajectory: this is due to the arcade learning environment (ALE) largely being deterministic, and the subgoal policies learned are also deterministic (it is a variant of double deep Q learning for each subgoal). Doesn't mean that it is a bad strategy. Of course you could swap it with some other stochastic policies for the lower-level policies.
Regarding h-DQN baseline comparison: Let me clean up my baseline code and I will put them up as well. The summary is that it mostly doesn't learn anything useful for games like Montezuma's Revenge.
@nanxintin Did you use python 2 to run the code?
Yes I did use python 2.7 to run the code back then