bentrevett/pytorch-rl

Adding a sample_action method for ActorCritic

lemikhovalex opened this issue · 0 comments

Hello! I've been learning how to code RL from your repo. I replaced the duplicated lines of code in
def train
def update_policy

with a single agent method, self.sample_action(). Since then, the agent seems to solve the CartPole problem about 2x slower (measured in number of episodes), and this happens every time. I have no idea what is going on with torch and haven't found anything on the Internet.
Could you please help me?
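
For reference, here is a minimal sketch of the kind of shared method meant here. It is not the notebook's actual code. The `ActorCritic` wrapper around separate `actor` and `critic` modules, the return values, and the optional `action` argument are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class ActorCritic(nn.Module):
    """Hypothetical wrapper: actor outputs action logits, critic outputs a state value."""

    def __init__(self, actor, critic):
        super().__init__()
        self.actor = actor
        self.critic = critic

    def forward(self, state):
        return self.actor(state), self.critic(state)

    def sample_action(self, state, action=None):
        # Build a categorical distribution over discrete actions from the logits.
        action_logits, value = self(state)
        dist = Categorical(logits=action_logits)
        # Assumption: draw a new action only when none is supplied.
        # If an action is passed in, it is evaluated as-is and no random draw happens.
        if action is None:
            action = dist.sample()
        return action, dist.log_prob(action), value
```

With a method like this, `train` would call `sample_action(state)` to pick an action, while `update_policy` would call `sample_action(states, action=actions)` with the stored actions, because the PPO probability ratio is computed for the actions that were actually taken during the rollout.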

https://github.com/lemikhovalex/pytorch-rl
5_tr - Proximal Policy Optimization (PPO) [CartPole]-Copy1.ipynb