This repository contains an implementation of the n-step Advantage Actor-Critic (A2C) algorithm for the CartPole environment. The project explores both discrete and continuous action spaces, and investigates the effects of various hyperparameters on the learning process.
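At the heart of the method is the n-step return, which sums up to n discounted rewards and then bootstraps from the critic's value estimate. The snippet below is a minimal NumPy sketch of that computation, assuming per-rollout arrays of rewards, value estimates, and done flags; the function and argument names are illustrative and do not necessarily match those used in `train.py`.

```python
import numpy as np

def n_step_targets(rewards, values, bootstrap_value, dones, gamma=0.99, n=5):
    """Illustrative n-step return targets and advantages for one rollout.

    rewards, values, dones: arrays of length T collected during the rollout.
    bootstrap_value: critic estimate V(s_T) used to bootstrap past the rollout end.
    """
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        G, discount = 0.0, 1.0
        terminal = False
        last = t
        for k in range(t, min(t + n, T)):
            G += discount * rewards[k]   # accumulate up to n discounted rewards
            discount *= gamma
            last = k
            if dones[k]:                 # stop at an episode boundary, no bootstrap
                terminal = True
                break
        if not terminal:
            nxt = last + 1
            # bootstrap with V(s_{t+n}) from inside the rollout, or with the
            # extra bootstrap value if the window runs past the collected steps
            G += discount * (values[nxt] if nxt < T else bootstrap_value)
        targets[t] = G
    advantages = targets - values        # A_t = G_t^{(n)} - V(s_t)
    return targets, advantages
```

The resulting advantage, the n-step target minus the critic's estimate `V(s_t)`, is what weights the log-probability term in the actor's policy-gradient update.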
- `K=1-n=1-disc/`: Contains an animation of the CartPole when K=1, n=1 in the discrete environment.
- `imgs/`: Contains plots used in the report.
- `lists/`: Contains data for each agent to reproduce plots without retraining.
- `CS_456_MP2_A2C.pdf`: The project report detailing methodology and results.
- `MP2_A2C.pdf`: The project handout with specifications.
- `train.py`: Implementation of the A2C algorithm and supporting functions.
- `Solution.ipynb`: Jupyter notebook to run the A2C algorithm and generate plots.
- Implementation of n-step A2C for both discrete and continuous action spaces (a network sketch follows this list)
- Support for multiple workers (K) and n-step returns
- Evaluation and logging functionalities
- Visualization of training progress, value functions, and agent performance
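As a rough illustration of the discrete and continuous variants, the sketch below shows one possible shared-trunk actor-critic for CartPole, with a categorical policy head for the discrete case and a Gaussian head for the continuous one. It assumes PyTorch; class and attribute names are illustrative and are not necessarily those defined in `train.py`.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class ActorCritic(nn.Module):
    """Illustrative shared-trunk actor-critic for CartPole (names are not from train.py)."""

    def __init__(self, obs_dim=4, n_actions=2, continuous=False, hidden=64):
        super().__init__()
        self.continuous = continuous
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, hidden), nn.Tanh())
        self.value_head = nn.Linear(hidden, 1)           # critic: V(s)
        if continuous:
            self.mu_head = nn.Linear(hidden, 1)          # mean of a 1-D Gaussian policy
            self.log_std = nn.Parameter(torch.zeros(1))  # learned, state-independent std
        else:
            self.logits_head = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        h = self.trunk(obs)
        value = self.value_head(h).squeeze(-1)
        if self.continuous:
            dist = Normal(self.mu_head(h), self.log_std.exp())
        else:
            dist = Categorical(logits=self.logits_head(h))
        return dist, value
```

A shared trunk keeps the parameter count small for a low-dimensional task like CartPole; separate actor and critic networks are an equally valid design choice.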
- Clone this repository.
- Install the required dependencies; see `requirements.txt`.
- Run the cells in `Solution.ipynb` to train the agent or reproduce plots using pre-saved data.
The project explores various configurations of the A2C algorithm, including:
- Basic A2C version in CartPole
- Stochastic rewards
- Multiple workers (K-workers)
- n-step returns
- K × n batch learning
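With K workers each collecting n environment steps between updates, the transitions can be flattened into a single batch of K·n samples for one combined actor-critic update. The snippet below is a hedged sketch of such an update, reusing the `ActorCritic`-style model and n-step targets from the earlier sketches; function names and loss coefficients are illustrative, not the ones used in this repository.

```python
import torch

def a2c_update(model, optimizer, obs, actions, targets,
               value_coef=0.5, entropy_coef=0.01):
    """One A2C update on a flattened batch of K*n transitions (illustrative, not train.py's API).

    obs:     tensor of shape (K*n, obs_dim)
    actions: tensor of shape (K*n,) for the discrete case, (K*n, 1) for the continuous one
    targets: n-step return targets of shape (K*n,)
    """
    dist, values = model(obs)
    advantages = (targets - values).detach()       # no gradient through the advantage weight
    log_probs = dist.log_prob(actions)
    if log_probs.dim() > 1:                        # sum over action dims in the continuous case
        log_probs = log_probs.sum(-1)

    actor_loss = -(log_probs * advantages).mean()  # policy-gradient term
    critic_loss = (targets - values).pow(2).mean() # value regression to the n-step targets
    entropy_bonus = dist.entropy().mean()          # encourages exploration

    loss = actor_loss + value_coef * critic_loss - entropy_coef * entropy_bonus
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Averaging the losses over the K·n samples reduces the variance of the gradient estimate compared with a single-worker, single-step update.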
Detailed results and analysis can be found in `CS_456_MP2_A2C.pdf`.
To train a new agent or reproduce results:
- Open `Solution.ipynb`.
- Adjust hyperparameters as needed.
- Run the cells to train the agent or generate plots from pre-saved data.
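As a rough picture of what a training cell configures, a call could look like the following; `train_agent` and its keyword arguments are hypothetical placeholders standing in for whatever interface `train.py` actually exposes, and the notebook cells are the authoritative source for the real names and defaults.

```python
# Hypothetical call: the function name and arguments below are placeholders,
# not the actual API of train.py.
from train import train_agent  # assumed entry point

returns = train_agent(
    env_name="CartPole-v1",   # assumed environment id; a continuous variant is also studied
    K=4,                      # number of parallel workers
    n=5,                      # n-step return length
    gamma=0.99,               # discount factor
    lr=3e-4,                  # learning rate
)
```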
This project was completed as part of the EPFL Artificial Neural Networks and Reinforcement Learning course, in collaboration with @eliashornberg.