/Reinforcement_Learning

Implementations of basic reinforcement learning algorithms

Primary language: Python

Reinforcement Learning

Reinforcement learning examples applied to a variety of environments (currently being ported to PyTorch)


[Breakout / using DQN(Nature2015)]

1. Q-Learning / SARSA

2. Q-Network (Action-Value Function Approximation)

3. DQN

DQN(NIPS2013) uses an Experience Replay Memory and a CNN.

DQN(Nature2015) uses an Experience Replay Memory, a Target Network, and a CNN.

5. Vanilla Policy Gradient(REINFORCE)

6. Advantage Actor Critic

7. Deep Deterministic Policy Gradient

8. Parallel Advantage Actor Critic(called 'A2C' by OpenAI)

9. C51(Distributional RL)

10. PPO(Proximal Policy Optimization)
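
The two DQN variants listed under item 3 differ mainly in where the bootstrap value of the TD target comes from. Below is a minimal sketch of that idea using only the Python standard library. It is not code from this repository: `Transition`, `ReplayMemory`, and `td_targets` are hypothetical names chosen for illustration, and the CNN is replaced by a plain callable that maps a state to a list of action values.

```python
import random
from collections import deque, namedtuple

# Hypothetical names for illustration only; not taken from this repository.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-size buffer of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive frames.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """Return r + gamma * max_a' Q(s', a') for every transition in the batch.

    target_q is any callable mapping a state to a list of action values.
    Assumption of this sketch: for the NIPS 2013 variant you would pass the
    online Q-function itself, and for the Nature 2015 variant a frozen copy
    that is synced with the online one only every few thousand steps.
    """
    targets = []
    for t in batch:
        # Terminal transitions have no bootstrap term.
        bootstrap = 0.0 if t.done else gamma * max(target_q(t.next_state))
        targets.append(t.reward + bootstrap)
    return targets
```

In a training loop, one would push each environment step into the memory, sample a mini-batch once enough transitions have accumulated, and regress the online Q-values toward the output of `td_targets`.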