PyTorch implementation of the MiniGrid and DeepSea experiments from the paper "Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces".
clone the repository, create a new virtualenv and install the dependencies:
git clone https://github.com/GuyLor/reinforcement_learning.git
python3 -m venv direct_rl
source direct_rl/bin/activate
cd reinforcement_learning
pip3 install -r requirements.txt
train from scratch:
python run.py --train
let the trained policy "play" after training:
python run.py --train --play
save and/or load the model after training:
python run.py --train --play --save_path my_policy_model_new.pkl --load_path my_policy_model.pkl
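how the flags combine (illustrative sketch only): the parser below is a hypothetical stand-in, not the repository's actual run.py. It assumes --train and --play are on/off switches and that --save_path and --load_path are optional file paths, which is how they are used in the commands above. The function name build_parser and the help strings are made up for this example.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags used in the commands above.
    # The real run.py may define them differently or accept more options.
    parser = argparse.ArgumentParser(description="illustrative stand-in for run.py")
    parser.add_argument("--train", action="store_true",
                        help="train a policy")
    parser.add_argument("--play", action="store_true",
                        help="let the policy play after training")
    parser.add_argument("--save_path", default=None,
                        help="file to write the model to")
    parser.add_argument("--load_path", default=None,
                        help="file to read a model from")
    return parser


# Parse the same arguments as the save/load command above.
args = build_parser().parse_args([
    "--train", "--play",
    "--save_path", "my_policy_model_new.pkl",
    "--load_path", "my_policy_model.pkl",
])
print(args.train, args.play, args.save_path, args.load_path)
```

under these assumptions, omitting --save_path or --load_path simply leaves the corresponding value unset, so the first two commands neither write nor read a model file.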
open TensorBoard to view the training logs:
tensorboard --logdir logs