hoangminhle/hierarchical_IL_RL

Running the code


Hi,
This is meaningful work, combining imitation learning and reinforcement learning in a hierarchical architecture to solve Montezuma's Revenge.
I successfully ran the code to train hybrid_rl_il_agent. When I test the trained model, I find that the agent takes the same actions every episode. It seems the agent follows a completely fixed trajectory through the game, with no adaptation. Is this a good strategy for the agent?
I also want to train the h-DQN agent as a comparison, but I cannot find the code for this. Can you give me some advice on how to start that training?
Thanks.

Hi there, regarding the fixed trajectory: this is due to the Arcade Learning Environment (ALE) being largely deterministic, and the learned subgoal policies are also deterministic (each subgoal policy is a variant of double deep Q-learning). That doesn't mean it is a bad strategy. Of course, you could swap in stochastic policies at the lower level.
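
For illustration, here is a minimal sketch of one such swap: replacing the greedy argmax over a subgoal's Q-values with Boltzmann (softmax) sampling. This is not from the repo; the function name, the temperature default, and the NumPy usage are all assumptions.

```python
import numpy as np

def boltzmann_action(q_values, temperature=0.1, rng=np.random):
    """Sample an action from a softmax over Q-values instead of taking the argmax.

    q_values: 1-D array of Q(s, a) for the current subgoal's controller.
    temperature: lower values approach greedy behaviour (hypothetical default).
    """
    # Subtract the max before exponentiating for numerical stability.
    logits = (np.asarray(q_values, dtype=np.float64) - np.max(q_values)) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```

Alternatively, ALE itself can be made stochastic with sticky actions (its `repeat_action_probability` setting), which breaks the fixed trajectory without changing the learned policies.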

Regarding the h-DQN baseline comparison: let me clean up my baseline code and I will put it up as well. The summary is that it mostly doesn't learn anything useful for games like Montezuma's Revenge.

@nanxintin Did you use Python 2 to run the code?

@moonsh I'm sorry, I can't remember anymore.

Yes, I did use Python 2.7 to run the code back then.