rlcode/reinforcement-learning

A3C algorithm - background

ShaniGam opened this issue · 9 comments

Amazing work!!
Tried running the A3C algorithm for breakout and it works great!
Where did you get the background information in order to write the code? It's a little bit different than what was explained in the "Asynchronous Methods for Deep Reinforcement Learning" paper.
Thanks :)

thank you!!
Yes, it's a little different from the "Asynchronous Methods for Deep Reinforcement Learning" paper.
This code doesn't have local networks and doesn't pass gradients directly, but fortunately it works!!
We will update this code to use local networks.
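
For reference, a minimal, framework-agnostic sketch of the paper's local-network idea (all names here, such as `global_params` and `compute_gradients`, are illustrative placeholders, not code from this repository): each worker syncs a local copy of the parameters, computes gradients against that copy, and applies them to the shared global parameters.

```python
import numpy as np

# Illustrative sketch only; not the repository's implementation.
global_params = [np.zeros((4, 2)), np.zeros(2)]   # shared global weights

def compute_gradients(params, batch):
    # placeholder: a real worker would compute policy/value gradients here
    return [np.ones_like(p) * 0.01 for p in params]

def worker_step(batch, lr=1e-3):
    # 1) synchronise the local copy with the global network
    local_params = [p.copy() for p in global_params]
    # 2) compute gradients using the local copy
    grads = compute_gradients(local_params, batch)
    # 3) apply the gradients directly to the global network
    for p, g in zip(global_params, grads):
        p -= lr * g

worker_step(batch=None)
```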

We mostly referred to this blog and adapted the code to our style.
We also referred to many other codes and papers to find the right hyperparameters.
This code is helpful.

thanks again @ShaniGam

Thanks for the reply. Another question, did you use the epsilon-greedy policy? It seems like the algorithm chooses what the network predicts all the time. @zzing0907

No, we don't use epsilon-greedy; we choose actions stochastically.
The actor's output is the policy, and the policy is a probability distribution (because the last layer is a softmax layer).
The agent chooses an action according to the probability of each action, so the agent can explore.
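
A minimal sketch of what this stochastic selection looks like (the function name and example values are illustrative, not taken from the repository): the softmax output is treated as a probability distribution over actions and the action is sampled from it, so low-probability actions can still be picked and the agent keeps exploring.

```python
import numpy as np

def get_action(policy):
    # policy: 1-D array of softmax outputs, one probability per action
    return np.random.choice(len(policy), p=policy)

policy = np.array([0.7, 0.2, 0.1])   # example actor output for 3 actions
action = get_action(policy)
```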

thanks. @ShaniGam

@ShaniGam
In addition to what @zzing0907 said, take a look at the loss function of the actor network's optimizer.
The entropy part of the loss function encourages the agent to keep exploring.
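
A minimal numpy sketch of an actor loss with an entropy bonus (the `beta=0.01` coefficient and the helper name are illustrative, not necessarily the values used in the repository): subtracting the policy's entropy from the loss rewards more uniform, i.e. more exploratory, action distributions.

```python
import numpy as np

def actor_loss(policy, action_onehot, advantage, beta=0.01):
    # policy-gradient term: log pi(a|s) weighted by the advantage
    log_prob = np.log(np.sum(policy * action_onehot) + 1e-10)
    pg_loss = -log_prob * advantage
    # entropy of the policy; weighting it by beta keeps the
    # distribution from collapsing too early
    entropy = -np.sum(policy * np.log(policy + 1e-10))
    return pg_loss - beta * entropy

loss = actor_loss(np.array([0.7, 0.2, 0.1]),
                  np.array([1.0, 0.0, 0.0]),
                  advantage=0.5)
```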

Hi all,
I also have to give you a big compliment on this great work.
To understand the A3C better could you perhaps provide your grid world example solved by A3C?
With such an easy environment it would be easier to get into it, I guess.
Regards

@Maschwe
Thank you for your compliment!
Now we are providing A3C code for Cartpole and Breakout.
We think Cartpole is easy enough to understand A3C.
Take a look at cartpole_a3c.py, and if you are still confused about A3C, then we will consider making an A3C agent for grid world.

Wow awesome! Thanks :)
My thought about implementing A3C for the grid was that you don't have to get familiar with gym. Your grid implementation is so intuitive that someone new almost instantly knows what's going on there and can concentrate completely on the algorithm itself ;) In addition, you can also learn how much the environment influences the algorithm, since you can change it. Changing, for example, the actual input from the distances to each single object to a complete map giving full information about all objects. This would probably require one or two convolutional layers or a bigger fully connected layer. All those things are easier to explore and understand when you have full control over the environment.
But anyway, thank you very much for the two new examples, I will definitely have a look into your new implementations 👍

I am interested in what you said about adapting a CNN to the grid world. Thank you for your compliment!