- Solve the Lunar Lander problem from Gymnasium (formerly OpenAI Gym) using Q-learning with experience replay memory [2].
- The implementation is based on fakemonk1 [1], with reference to juliankappler [3].
- A friendly, simple implementation in PyTorch.
- Run: `python lunar_lander_v1.py`
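
The two ingredients named above, experience replay and the Q-learning target, can be sketched in a few lines of plain Python. This is a minimal illustration and not the code in `lunar_lander_v1.py`: the names `ReplayMemory` and `td_target`, the buffer capacity, and the discount factor `gamma=0.99` are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples.

    Hypothetical helper for illustration; the script may organize this differently.
    """

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once the buffer is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement reduces correlation
        # between consecutive transitions in a training batch.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r + gamma * max_a' Q(s', a').

    At the end of an episode there is no next state to bootstrap from,
    so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a PyTorch training loop, `next_q_values` would typically come from a forward pass of the Q-network on `next_state`, and the loss would compare the network's prediction for the taken action against this target.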
- Solve the Lunar Lander problem from Gymnasium (formerly OpenAI Gym) using Q-learning [2].
- Periodically update the q_target network parameters [4].
- Use a softmax policy instead of an epsilon-greedy policy.
- Run multiple training steps from replay memory.
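
The three changes listed above can be sketched as follows. This is a hedged illustration in plain Python, not the repository's code: `softmax_probs`, `softmax_action`, `run_schedule`, the `temperature` parameter, and the `sync_every` period are hypothetical names and values. It also assumes that "multiple training steps" means several updates per environment step, and it uses a plain dict as a stand-in for network weights.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Turn Q-values into action probabilities (Boltzmann distribution)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action index; higher Q-values are chosen more often."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def run_schedule(total_env_steps, train_steps_per_env_step, sync_every):
    """Show when the target parameters get refreshed.

    The dicts below are stand-ins for network weights (an assumption for
    this sketch); the increment is a placeholder for a gradient update.
    """
    q_params = {"w": 0.0}
    target_params = dict(q_params)
    sync_points = []
    train_step = 0
    for _ in range(total_env_steps):
        # Several training steps per environment step.
        for _ in range(train_steps_per_env_step):
            train_step += 1
            q_params["w"] += 0.1  # placeholder for one optimizer step
            if train_step % sync_every == 0:
                # Periodic hard copy of online weights into the target.
                target_params = dict(q_params)
                sync_points.append(train_step)
    return sync_points, q_params, target_params
```

With PyTorch modules, the hard copy would usually be written as `q_target.load_state_dict(q.state_dict())`. Unlike epsilon-greedy, the softmax policy explores in proportion to the estimated values, and a lower `temperature` makes it greedier.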
- [1] https://github.com/fakemonk1/Reinforcement-Learning-Lunar_Lander
- [2] Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
- [3] https://github.com/juliankappler/lunar-lander