rlcode/reinforcement-learning

A3C algorithm - background

ShaniGam opened this issue · 9 comments

Amazing work!!
Tried running the A3C algorithm for breakout and it works great!
Where did you get the background information in order to write the code? It's a little bit different than what was explained in the "Asynchronous Methods for Deep Reinforcement Learning" paper.
Thanks :)

thank you!!
Yes, it's a little different from the "Asynchronous Methods for Deep Reinforcement Learning" paper.
This code doesn't have local networks and doesn't pass gradients directly, but fortunately it works!!
We will update this code to use local networks.
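
For reference, a minimal, framework-agnostic sketch of the paper's local-network idea (all names here, such as `global_params` and `compute_gradients`, are illustrative placeholders, not code from this repository): each worker syncs a local copy of the parameters, computes gradients against that copy, and applies them to the shared global parameters.

```python
import numpy as np

# Illustrative sketch only; not the repository's implementation.
global_params = [np.zeros((4, 2)), np.zeros(2)]   # shared global weights

def compute_gradients(params, batch):
    # placeholder: a real worker would compute policy/value gradients here
    return [np.ones_like(p) * 0.01 for p in params]

def worker_step(batch, lr=1e-3):
    # 1) synchronise the local copy with the global network
    local_params = [p.copy() for p in global_params]
    # 2) compute gradients using the local copy
    grads = compute_gradients(local_params, batch)
    # 3) apply the gradients directly to the global network
    for p, g in zip(global_params, grads):
        p -= lr * g

worker_step(batch=None)
```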

We mostly referred to this blog and adapted the code to our style.
We also referred to many other codes and papers to find the right hyperparameters.
This code is helpful.

thanks again @ShaniGam

Thanks for the reply. Another question, did you use the epsilon-greedy policy? It seems like the algorithm chooses what the network predicts all the time. @zzing0907

No, we don't use epsilon-greedy; we choose actions stochastically.
The actor's output is the policy, and the policy is a probability distribution (because the last layer is a softmax layer).
The agent chooses an action according to the probability of each action, so the agent can explore.
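
A minimal sketch of what this stochastic selection looks like (the function name and example values are illustrative, not taken from the repository): the softmax output is treated as a probability distribution over actions and the action is sampled from it, so low-probability actions can still be picked and the agent keeps exploring.

```python
import numpy as np

def get_action(policy):
    # policy: 1-D array of softmax outputs, one probability per action
    return np.random.choice(len(policy), p=policy)

policy = np.array([0.7, 0.2, 0.1])   # example actor output for 3 actions
action = get_action(policy)
```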

thanks. @ShaniGam

@ShaniGam
In addition to what @zzing0907 said, take a look at the loss function of the actor network's optimizer.
The entropy part of the loss function encourages the agent to keep exploring.
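
A minimal numpy sketch of an actor loss with an entropy bonus (the `beta=0.01` coefficient and the helper name are illustrative, not necessarily the values used in the repository): subtracting the policy's entropy from the loss rewards more uniform, i.e. more exploratory, action distributions.

```python
import numpy as np

def actor_loss(policy, action_onehot, advantage, beta=0.01):
    # policy-gradient term: log pi(a|s) weighted by the advantage
    log_prob = np.log(np.sum(policy * action_onehot) + 1e-10)
    pg_loss = -log_prob * advantage
    # entropy of the policy; weighting it by beta keeps the
    # distribution from collapsing too early
    entropy = -np.sum(policy * np.log(policy + 1e-10))
    return pg_loss - beta * entropy

loss = actor_loss(np.array([0.7, 0.2, 0.1]),
                  np.array([1.0, 0.0, 0.0]),
                  advantage=0.5)
```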

Hi all,
I also have to give you a big compliment on this great work.
To understand the A3C better could you perhaps provide your grid world example solved by A3C?
With such an easy environment it would be easier to get into it, I guess.
Regards

@Maschwe
Thank you for your compliment!
Now we are providing A3C code for Cartpole and Breakout.
We think Cartpole is easy enough to understand A3C.
Take a look at cartpole_a3c.py, and if you are still confused about A3C, then we will consider making an A3C agent for grid world.

Wow awesome! Thanks :)
My thought about implementing A3C for the grid was that you don't have to get familiar with gym. Your grid implementation is so intuitive that someone new almost instantly knows what's going on there and can concentrate completely on the algorithm itself ;) In addition, you can also learn how much the environment influences the algorithm, since you can change it. Changing, for example, the actual input from the distances to each single object to a complete map giving full information about all objects. This would probably require one or two convolutional layers or a bigger fully connected layer. All those things are easier to explore and understand when you have full control over the environment.
But anyway, thank you very much for the two new examples, I will definitely have a look into your new implementations 👍

I am interested in what you said about adapting a CNN to the grid world. Thank you for your compliment!