The main file can be used to generate the results.
Dependencies - Would need PyTorch, Gym, and PyBullet using Bullet gym environments
Usage- From the directory A2C, run python main.py with the following arguments --env --algo (REINFORCE Or A2C) --mem_steps --total_steps
Note- Using same random seeds does not help since we ae dealing with stochastic algorithms. Sometimes, A2C does not give good performance, kindly run 3-4 times for evaluating the learning curves, it takes less than 1 minute for one run on GPU (atleast my system)
Test cases for reproducibility - *Please run from terminal the following commands A2C
-
python main.py --env Pendulum-v1 --algo A2C --mem_steps 32 --learning_steps 5000
-
python main.py --env CartPoleContinuousBulletEnv-v0 --algo A2C --mem_steps 32 --learning_steps 5000
-
python main.py --env InvertedDoublePendulumBulletEnv-v0 --algo A2C --mem_steps 32 --learning_steps 2500
-
python main.py --env MountainCarContinuous-v0 --algo A2C --mem_steps 32 --learning_steps 20000
-
python main.py --env Walker2DBulletEnv-v0 --algo A2C --mem_steps 32 --learning_steps 12500
REINFORCE
-
python main.py --env Pendulum-v1 --algo REINFORCE --mem_steps 64 --learning_steps 10000
-
python main.py --env CartPoleContinuousBulletEnv-v0 --algo REINFORCE --mem_steps 64 --learning_steps 5000
-
python main.py --env InvertedDoublePendulumBulletEnv-v0 --algo REINFORCE --mem_steps 64 --learning_steps 2500
-
python main.py --env MountainCarContinuous-v0 --algo REINFORCE --mem_steps 64 --learning_steps 15000
-
python main.py --env Walker2DBulletEnv-v0 --algo REINFORCE --mem_steps 64 --learning_steps 7500
Evaluation of Learnt Agent For Pendulum-v1, please follow the following steps, run these commands from different terminals
- First run the following command and wait until learning is finished, it will show a plot.
python main.py --env Pendulum-v1 --algo A2C --mem_steps 32 --learning_steps 5000
- Open a new terminal and run the following to give rewards for 250 episodes of the learned policy
python main.py --env Pendulum-v1 --algo A2C --mem_steps 32 --learning_steps 5000 --learn 0 --random 0 #learned policy evaluation
- Open a new terminal and run the following to give rewards for 250 episodes of the random policy
python main.py --env Pendulum-v1 --algo A2C --mem_steps 32 --learning_steps 5000 --learn 0 --random 1 #random policy evaluation