REINFORCE == REward Increment = Nonnegative Factor times Offset Reinforcement times Characteristic Eligibility
#Nice #TattooMaterial - source
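For readers who want the acronym as an equation, here is how it is usually written, following Williams (1992). The symbols are the conventional ones from that paper and are not defined anywhere else in this README: $w_{ij}$ is a weight, $\alpha_{ij}$ the nonnegative learning-rate factor, $r$ the received reinforcement, $b_{ij}$ the baseline (the "offset"), and $g_i$ the probability the unit assigns to its chosen output.

```latex
\Delta w_{ij}
  = \underbrace{\alpha_{ij}}_{\text{nonnegative factor}}
    \,\underbrace{(r - b_{ij})}_{\text{offset reinforcement}}
    \,\underbrace{\frac{\partial \ln g_i}{\partial w_{ij}}}_{\text{characteristic eligibility}}
```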
Screen captures of solved simulations:
A collection of reinforcement learning projects I have done in OpenAI Gym and Unity ML-Agents. I learned and implemented reinforcement learning algorithms ranging from basic to complex, from using the Monte Carlo approach for solving puzzles to using the Multi-Agent Deep Deterministic Policy Gradient method for training table tennis players. A detailed description of each project can be found by clicking on the project titles in the table above.
This is one of the most interesting topics I have had a chance to peek into. From my perspective, it involves more mathematical concepts than most other deep learning algorithms, but the fact that it is also one of the hardest challenges for some of the smartest minds on Earth soothes the pain of needing to open 10 Google tabs just to comprehend a single page of a paper.
Monte Carlo Methods - Epsilon-Greedy policies, GLIE, state and action value functions, Bellman Equations
Temporal-Difference Methods - Sarsa, Q-Learning, and Expected Sarsa
Continuous Spaces - Discretization, Tile Coding, and Function Approximation
Value-Based Methods - Implementation of Deep Q-Networks, Double Q-Networks
Policy-Based Methods - Stochastic Policy Search, Hill Climbing Algorithm, REINFORCE, Proximal Policy Optimization, A3C, A2C, N-step bootstrapping, GAE, DDPG, Continuous Control
Multi-Agent Reinforcement Learning (MARL) - Cooperative and Competitive Behaviors, Multi-Agent DDPG, Monte Carlo Tree Search
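As a small illustration of how the three temporal-difference methods listed above differ, here is a minimal tabular sketch. It is not the code from the projects: the function names (`epsilon_greedy_probs`, `td_target`, `td_update`), the NumPy Q-table layout (states as rows, actions as columns), and the default `epsilon` are all assumptions made for this example. The only difference between the methods is the value used to bootstrap from the next state.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state (hypothetical helper)."""
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs


def td_target(Q, reward, next_state, gamma, method, next_action=None, epsilon=0.1):
    """One-step TD target for a non-terminal transition (hypothetical helper)."""
    q_next = Q[next_state]
    if method == "sarsa":
        # On-policy: bootstrap from the action actually taken next.
        bootstrap = q_next[next_action]
    elif method == "q_learning":
        # Off-policy: bootstrap from the greedy action.
        bootstrap = np.max(q_next)
    elif method == "expected_sarsa":
        # Bootstrap from the expectation under the epsilon-greedy policy.
        bootstrap = np.dot(epsilon_greedy_probs(q_next, epsilon), q_next)
    else:
        raise ValueError(f"unknown method: {method}")
    return reward + gamma * bootstrap


def td_update(Q, state, action, target, alpha):
    """Move Q[state, action] a fraction alpha toward the TD target, in place."""
    Q[state, action] += alpha * (target - Q[state, action])
```

A usage sketch: with `Q = np.zeros((n_states, n_actions))`, compute `target = td_target(Q, r, s_next, 0.99, "q_learning")` after each environment step and then call `td_update(Q, s, a, target, alpha=0.1)`. Terminal transitions would need the bootstrap term dropped, which this sketch leaves out.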
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto. An awesome textbook that is not afraid to go in depth into the mathematics of RL.
- Human-Level Control through Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-Learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Prioritized Experience Replay
- Proximal Policy Optimization Algorithms
- Continuous control with deep reinforcement learning
- Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
The Udacity Nanodegree program provided great assistance. It helped a lot with configuring the OpenAI Gym and Unity environments, provided me with a pretty good GPU, and even supplied skeleton code for some of the early projects to pull me through the initial learning curve. However, the course does not seem to be very popular, apparently because of low demand, and it is too unstructured to rely on as a reference.