Deep Reinforcement Learning implementation of Policy Gradient on a simple Grid-World problem using PyTorch.
In main.py, define the environment and the agent, as well as the hyperparameters of the policy gradient network, then run python3 main.py. The script saves a plot of the average rewards during training and validation (see the figure below for an example).
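For readers new to the method, here is a minimal sketch of the policy gradient (REINFORCE) loop on a toy grid. It is not the code in main.py. To stay dependency-free it swaps the PyTorch policy network for a tabular softmax policy in plain Python. The names LineWorld, discounted_returns and train, the reward values, and the hyperparameter defaults are all assumptions made for illustration.

```python
import math
import random


class LineWorld:
    """Hypothetical 1-D grid world: start in cell 0, goal in the last cell."""

    def __init__(self, size=5, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the position.
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)
        self.steps += 1
        at_goal = self.pos == self.size - 1
        done = at_goal or self.steps >= self.max_steps
        reward = 1.0 if at_goal else -0.1  # assumed reward scheme
        return self.pos, reward, done


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def train(episodes=300, lr=0.1, gamma=0.95, seed=0):
    rng = random.Random(seed)
    env = LineWorld()
    # Tabular policy parameters: one pair of action logits per state.
    theta = [[0.0, 0.0] for _ in range(env.size)]
    history = []  # total reward per episode
    for _ in range(episodes):
        state, done = env.reset(), False
        trajectory, rewards = [], []
        while not done:
            probs = softmax(theta[state])
            action = 0 if rng.random() < probs[0] else 1
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, probs))
            rewards.append(reward)
            state = next_state
        # REINFORCE update: the gradient of log softmax with respect to the
        # logits is one_hot(action) - probs, scaled by the return G_t.
        returns = discounted_returns(rewards, gamma)
        for (s, a, probs), g in zip(trajectory, returns):
            for k in range(2):
                indicator = 1.0 if k == a else 0.0
                theta[s][k] += lr * g * (indicator - probs[k])
        history.append(sum(rewards))
    return theta, history
```

Calling train() returns the learned logits and the per-episode total rewards. That list is the kind of series one would average and plot to track training progress.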
Requirements:
pytorch
numpy
opencv
matplotlib