Solving a POMDP with an LSTM in the Gym CartPole environment, using PyTorch
- tensorflow (for tensorboard logging)
- pytorch (>=1.0, 1.0.1 used in my experiment)
- gym
The idea of converting CartPole-v0 into a POMDP task comes from HaiyinPiao.
The full observation of CartPole in Gym has 4 dimensions:
- cart position (-4.8, 4.8)
- cart velocity (-inf, inf)
- pole angle (-24°, 24°)
- pole velocity at tip (-inf, inf)
By deleting one or more dimensions from this standard state, the task becomes a partially observable Markov decision process (POMDP).
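As a minimal sketch of what deleting dimensions can look like, the helper below keeps only selected components of the 4-dimensional observation. The function name `mask_observation`, the index constants, and the sample values are hypothetical and chosen for illustration; they are not taken from this repository's code.

```python
from typing import List, Sequence

# Index of each component in the full CartPole observation vector,
# in the order listed above.
CART_POSITION, CART_VELOCITY, POLE_ANGLE, POLE_TIP_VELOCITY = range(4)


def mask_observation(obs: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Return only the components of `obs` whose indices are listed in `keep`."""
    return [float(obs[i]) for i in keep]


# Example: hide both velocities, so the agent sees only position and angle.
full_obs = [0.02, -0.31, 0.04, 0.45]  # made-up values for illustration
partial_obs = mask_observation(full_obs, keep=[CART_POSITION, POLE_ANGLE])
print(partial_obs)  # [0.02, 0.04]
```

In practice this masking would be applied to every observation returned by the environment before it reaches the agent, for example inside an observation wrapper.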
[Figures: two side-by-side comparisons, with LSTM (left) vs. without LSTM (right)]
As the partial observability becomes more severe, the LSTM significantly improves the performance of the RL agent.
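To illustrate how a recurrent agent can carry information across time steps, here is a rough sketch of an LSTM policy in PyTorch. The class name `RecurrentPolicy`, the layer sizes, and the 2-dimensional partial observation are assumptions made for this example; the network actually used in the experiments may differ.

```python
import torch
import torch.nn as nn


class RecurrentPolicy(nn.Module):
    """Hypothetical LSTM policy: partial observations in, action logits out."""

    def __init__(self, obs_dim: int = 2, hidden_dim: int = 64, n_actions: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim)
        # hidden: optional (h, c) tuple carried over from the previous call
        out, hidden = self.lstm(obs_seq, hidden)
        logits = self.head(out)  # (batch, time, n_actions)
        return logits, hidden


policy = RecurrentPolicy()
hidden = None  # the LSTM starts each episode with a zero state

# Feed observations one step at a time, carrying the LSTM state forward
# so that earlier observations can influence later actions.
for _ in range(3):
    obs = torch.zeros(1, 1, 2)  # placeholder partial observation
    logits, hidden = policy(obs, hidden)
    action = torch.distributions.Categorical(logits=logits[:, -1]).sample()
```

Because the hidden state is passed back in at every step, the policy can in principle infer removed quantities such as velocities from the history of positions and angles, which a feed-forward policy seeing a single partial observation cannot do.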