This example demonstrates a reinforcement learning agent playing a variation of Pong® using Reinforcement Learning Toolbox™. You will follow a command-line workflow to create a DDPG agent in MATLAB®, set up its hyperparameters, and then train and simulate the agent.
This example requires installation of the following software:
- MATLAB R2020b or later
- Deep Learning Toolbox™
- Reinforcement Learning Toolbox
You can download the latest version of MATLAB from the MathWorks website, which also provides installation instructions.
After downloading and installing MATLAB, clone this repository to get the required scripts. The following two scripts train or simulate the agent:
- train_agent.m - script for creating and training a reinforcement learning agent
- play_agent.m - script for playing the game
The following scripts are used to create the environment:
- Environment.m - class for modeling the game
- Visualizer.m - class for animation functions
The environment for the game is a two-dimensional space containing a ball and a paddle. The ball starts with an initial velocity and moves around in the environment. The walls keep the ball inside the environment and also transfer some momentum to it on collision, so the ball's velocity changes slightly whenever it collides with an object. The paddle is located in the bottom half of the environment and can move left and right to prevent the ball from falling below it.
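The wall collision described above can be sketched as a single update step. This is a minimal illustration, not the code in Environment.m: the function name `stepBall`, the `bounds` struct, and the `gain` factor that models the slight velocity change are all assumptions made for the example.

```matlab
function [pos, vel] = stepBall(pos, vel, dt, bounds, gain)
% STEPBALL Hypothetical sketch: advance the ball one step and bounce off walls.
%   pos, vel : 1x2 vectors [x y]
%   dt       : time step
%   bounds   : struct with fields xmin, xmax, ymax (assumed layout)
%   gain     : assumed velocity multiplier applied on a wall collision

    % Move the ball along its current velocity
    pos = pos + vel * dt;

    % Left or right wall: clamp the position, reverse and rescale x velocity
    if pos(1) < bounds.xmin || pos(1) > bounds.xmax
        pos(1) = min(max(pos(1), bounds.xmin), bounds.xmax);
        vel(1) = -gain * vel(1);
    end

    % Top wall: clamp the position, reverse and rescale y velocity
    if pos(2) > bounds.ymax
        pos(2) = bounds.ymax;
        vel(2) = -gain * vel(2);
    end
end
```

For instance, with `bounds = struct('xmin',0,'xmax',10,'ymax',10)`, calling `stepBall([9.5 5], [2 0], 0.5, bounds, 1.1)` places the ball on the right wall and reverses its x velocity with a 10% increase in magnitude. The bottom edge is left open here because that is where the paddle has to intercept the ball.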
A Deep Deterministic Policy Gradient (DDPG) reinforcement learning agent is used in this example. The agent learns to hit the ball by observing the following states in the environment:
- x, y positions of the ball
- x, y velocities of the ball
- x position of the paddle
- x velocity of the paddle
- Action values from the last time step
The agent's action is the force applied to the paddle in the x direction.
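As a rough sketch, the observations and the action listed above could be described to Reinforcement Learning Toolbox with `rlNumericSpec` objects. The variable names, the numeric values, the 7-element layout (which assumes a single scalar action), and the force limits of -1 and 1 are assumptions for illustration; the actual definitions live in Environment.m.

```matlab
% Illustrative state values only; in the example these come from the game.
ballPos    = [0.2; 0.8];    % ball x, y position
ballVel    = [0.5; -0.3];   % ball x, y velocity
paddleX    = 0.1;           % paddle x position
paddleVx   = 0.0;           % paddle x velocity
lastAction = 0.0;           % action from the previous time step

% Stack everything into one 7x1 observation vector
obs = [ballPos; ballVel; paddleX; paddleVx; lastAction];

% Continuous observation specification matching the vector above
obsInfo = rlNumericSpec([7 1]);
obsInfo.Name = 'observations';

% Continuous scalar action: force on the paddle along x (limits are assumed)
actInfo = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);
actInfo.Name = 'paddle force';
```

A continuous action specification like this one is what makes DDPG applicable, since DDPG agents output real-valued actions rather than choosing from a discrete set.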
To create an agent and run the training, open and run the train_agent.m script.
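For orientation, a command-line training workflow of this kind typically looks something like the sketch below. It is not a copy of train_agent.m: it assumes that `Environment` can be constructed without arguments and behaves as a Reinforcement Learning Toolbox environment, and every hyperparameter value shown is a placeholder rather than the setting used in this example.

```matlab
% Assumption: Environment.m defines a toolbox-compatible environment class
% with a no-argument constructor.
env = Environment;
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Agent hyperparameters (placeholder values, not those of train_agent.m)
agentOpts = rlDDPGAgentOptions( ...
    'DiscountFactor', 0.99, ...
    'MiniBatchSize', 128, ...
    'ExperienceBufferLength', 1e6);

% DDPG agent with default actor and critic networks (available from R2020b)
agent = rlDDPGAgent(obsInfo, actInfo, agentOpts);

% Training options (placeholder values)
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 1000, ...
    'MaxStepsPerEpisode', 500, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 100, ...
    'Plots', 'training-progress');

% Run training; returns per-episode statistics
trainingStats = train(agent, env, trainOpts);
```

The default-network form of `rlDDPGAgent` keeps the sketch short; a script that needs control over the network architecture would build the actor and critic explicitly instead.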
To view a pre-trained agent playing the game, use the script play_agent.m.
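Simulating an agent from the command line could look like the following sketch. It assumes that `env` (the game environment) and `agent` (a trained or pre-trained agent) already exist in the workspace; how play_agent.m actually loads the agent is not shown here, and the step limit is a placeholder.

```matlab
% Assumption: 'env' and 'agent' already exist in the workspace.
simOpts = rlSimulationOptions('MaxSteps', 500);   % placeholder step limit

% Run one episode with the agent acting in the environment
experience = sim(env, agent, simOpts);

% Sum the rewards collected during the episode
totalReward = sum(experience.Reward.Data);
fprintf('Total reward: %.2f\n', totalReward);
```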
For additional resources on reinforcement learning, take a look at the following: