My introduction to Reinforcement Learning: a simple Pong game with an RL agent. Written in Python 3; you will need the following Python modules:
- torch
- pygame
My agent and model code draws heavily on the excellent 'Reinforcement Snake' YouTube video series by Python Engineer.
- Player1 has the red paddle, and is on the left side of the Pong 'court'
- Player2 has the blue paddle, and is on the right side.
- A goal is scored if the ball makes it past a paddle.
- A game ends when a total of ten goals have been scored (e.g. final scores of 5-5, 2-8, 9-1, etc.)
Pong can be run in three ways:
- Player1 as a human (you!) versus a simple CPU-controlled Player2
- Player1 as the RL agent, learning to play from scratch against a CPU-controlled Player2
- Player1 as the RL agent, using a previously learned model against a CPU-controlled Player2
To play in these modes, use:
python3 PlayPong.py --human
python3 PlayPong.py --agent
python3 PlayPong.py --model
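As a minimal sketch of how these three flags might be dispatched (only the flag names come from the commands above; the structure and names below are assumptions, not the actual PlayPong.py code):

```python
import argparse

def main():
    # Illustrative only: parse the three mutually exclusive modes.
    parser = argparse.ArgumentParser(description="Pong with an optional RL agent")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--human", action="store_true",
                       help="Player1 is a human versus a CPU-controlled Player2")
    group.add_argument("--agent", action="store_true",
                       help="Player1 is an RL agent learning from scratch")
    group.add_argument("--model", action="store_true",
                       help="Player1 is an RL agent using a previously learned model")
    args = parser.parse_args()

    mode = "human" if args.human else ("agent" if args.agent else "model")
    print(f"Starting Pong in '{mode}' mode")  # the real script would start the game loop here

if __name__ == "__main__":
    main()
```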
The game itself is highly configurable via the constants at the top of pong_rl.py.
The default constants produce a game that a human player has a fair chance of winning.
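To give a flavour of what those settings cover, here is an illustrative sketch; the actual constant names and values in pong_rl.py may differ:

```python
# Illustrative only: the kind of constants you will find at the top of pong_rl.py.
WINDOW_WIDTH = 800       # width of the Pong court in pixels
WINDOW_HEIGHT = 600      # height of the Pong court in pixels
PADDLE_SPEED = 7         # pixels a paddle moves per frame
BALL_SPEED = 8           # pixels the ball moves per frame
WINNING_SCORE = 10       # total goals after which a game ends
FPS = 60                 # frames per second of the pygame loop
```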
When training the agent, you can set the threshold at which learning stops using the LOSS_THRESHOLD variable in agent.py.
This loss is currently the MSE loss between the action predicted by the neural network and the action actually taken by the agent at each
game step. The current value of 0.15 is typically reached after about 15-30 training games (or 'episodes'). At this point, the agent has the skill
of a 'poor' human player: it will score a few points, but may struggle to win a game!
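As a sketch of how this stopping check might work (the function and variable names below are assumptions, not the actual agent.py code; only the use of an MSE loss and a 0.15 threshold comes from the description above):

```python
import torch
import torch.nn as nn

LOSS_THRESHOLD = 0.15  # stop training once the MSE loss falls below this value

criterion = nn.MSELoss()

def should_stop_training(predicted_actions, taken_actions):
    """Return True when the MSE between the network's predicted actions
    and the actions actually taken drops below LOSS_THRESHOLD."""
    loss = criterion(predicted_actions, taken_actions)
    return loss.item() < LOSS_THRESHOLD

# Example with actions encoded as one-hot vectors (illustrative values):
predicted = torch.tensor([[0.4, 0.3, 0.3]])
taken = torch.tensor([[1.0, 0.0, 0.0]])
print(should_stop_training(predicted, taken))  # False: the loss (0.18) is still above 0.15
```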
If you lower the LOSS_THRESHOLD
dramatically, e.g. below 0.1, the agent's model will start to overfit the training data it has
experienced, and do odd things like sitting at the top or bottom of the screen. It will occasionally win points by chance doing
this, but it is not, in general, the best behaviour.
It would be good to add to the game:
- Ability to change the state-action model (e.g. from a neural network to an SVM or decision tree) before training the agent
- Regularization to prevent overfitting (see the sketch after this list)
- An automatic stop to training once the agent has won X games against the CPU-controlled Player2
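For the regularization idea, a simple first step in PyTorch would be weight decay on the optimizer and/or dropout in the network. The sketch below is illustrative only; the layer sizes, dropout rate, and weight_decay value are assumptions, not values from this repository:

```python
import torch.nn as nn
import torch.optim as optim

# Illustrative sketch: a small state-action network with dropout, trained with
# an L2 penalty (weight decay) to discourage overfitting to the replayed games.
class StateActionNet(nn.Module):
    def __init__(self, n_states=5, n_hidden=128, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, n_hidden),
            nn.ReLU(),
            nn.Dropout(p=0.2),   # randomly drops hidden units during training
            nn.Linear(n_hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

model = StateActionNet()
# weight_decay adds an L2 penalty to the weights at every optimizer step
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```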
The screenshot below is from the agent training phase, showing the mean score after each game, as well as the current MSE loss of the agent's state-action neural network.