Reinforcement Learning
Overview
This is a repository of the reinforcement learning projects that I'm currently working on.
Result
- DQN in Lunar Lander with a discrete action space
- Dueling architecture with Double Q-learning and prioritized experience replay, after training for 4,500 episodes
- Proximal policy optimization in Atari Pong
- Deep Deterministic Policy Gradient (DDPG) in the Lunar Lander continuous environment (failed)
- Deep Deterministic Policy Gradient (DDPG) in the Pendulum environment, with moving average reward
- Asynchronous Advantage Actor Critic (A3C) in CartPole
- Actor Critic in Mountain Car
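The moving average reward mentioned for the Pendulum run can be computed along the following lines. This is a minimal sketch rather than the code used in this repository: the function name `moving_average` and the default window of 100 episodes are assumptions made for illustration.

```python
from collections import deque


def moving_average(rewards, window=100):
    """Return the trailing moving average of per-episode rewards.

    Early entries average over however many episodes exist so far,
    so the output has the same length as the input.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # oldest reward is dropped automatically
    running_sum = 0.0
    averages = []
    for reward in rewards:
        if len(recent) == window:
            running_sum -= recent[0]  # value about to be evicted by append
        recent.append(reward)
        running_sum += reward
        averages.append(running_sum / len(recent))
    return averages
```

Plotting the returned list against the episode index gives a smoother learning curve than the raw per-episode rewards.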
Algorithm
- Deep Q Network (DQN)
- Double Q-learning
- Dueling architecture
- Proximal policy optimization (PPO)
- Asynchronous Advantage Actor Critic (A3C)
- Prioritized experience replay (PER)
- Deep Deterministic Policy Gradient (DDPG)
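Two of the ideas listed above, the dueling architecture and Double Q-learning, come down to short formulas. The sketch below shows them for a single transition in plain Python, without any neural network. It is an illustration under stated assumptions, not this repository's implementation: the function names, the argument names, and the default `gamma=0.99` are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a).

    The mean advantage is subtracted so that V and A stay identifiable.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_q_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double Q-learning target for one transition.

    The online network's values choose the next action, and the
    target network's values evaluate that action.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from action evaluation is what reduces the overestimation that a plain DQN target (a max over the target network's own values) tends to produce.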
Environment
- OpenAI Gym
- Space Invaders
- Lunar Lander (discrete/continuous)
- Pong
- CartPole
- Mountain Car
Tool
- OS: Ubuntu 20.04
- GPU: NVIDIA GeForce RTX 2070
- Laptop: System76 Oryx Pro
- Neural network: TensorFlow and PyTorch
- Google Colab
- AWS EC2: Ubuntu 18.04, g4dn.xlarge (1 GPU)
- AWS Deep Learning AMI (1 GPU)
Studying
- Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto
- Grokking Deep Reinforcement Learning, Miguel Morales
- Coursera Reinforcement Learning Specialization by the University of Alberta (https://www.coursera.org/specializations/reinforcement-learning)
- Udacity Deep Reinforcement Learning Nanodegree (https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893)
- Papers from the OpenAI Spinning Up key papers list (https://spinningup.openai.com/en/latest/spinningup/keypapers.html)