about using gumbel_distribution to transform discrete space

Question

Closed this issue 4 years ago · 8 comments

In the code you provided, the DDPG algorithm supports both continuous and discrete action spaces by using the Gumbel distribution. MADDPG is an extension of DDPG, so is it also suitable for discrete action spaces when combined with the Gumbel distribution? When I use Gumbel in MADDPG, I cannot obtain reasonable results. I am using TensorFlow 1.14 without the tensorflow_probability module. Could you give me some code examples of Gumbel in TF, or some instructions? Sorry to bother you.

Answer 1 · 2020-07-09T03:22:05.000Z

I'm not sure whether Gumbel is suitable for MADDPG.
Actually, MADDPG in my repo does not work well for now, and I'll fix it next.
You can find something related to the Gumbel distribution here.
I'm not familiar with how to use Gumbel with TF 1.x; maybe you could find related answers in others' repos.

Answer 2 · 2020-07-09T03:46:02.000Z

Thanks for your reply. In the MADDPG code supplied by OpenAI (https://github.com/openai/maddpg/), they use Gumbel sampling in distributions.py as follows:

    def sample(self):
        # Gumbel-max trick: add Gumbel noise -log(-log(u)) to the logits, then take the argmax.
        u = tf.random_uniform(tf.shape(self.logits))
        return U.argmax(self.logits - tf.log(-tf.log(u)), axis=1)

So maybe MADDPG is able to handle discrete action spaces...
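
For readers who want to see what the quoted sample() method computes, here is a minimal framework-free sketch of the same Gumbel-max idea. It is not the OpenAI code: the function names, the `eps` guard, and the fixed seed are assumptions made for illustration. In TF 1.x the same steps would roughly correspond to `tf.random_uniform`, `tf.log`, and `tf.argmax`.

```python
import math
import random


def sample_gumbel(rng, eps=1e-20):
    """Draw one standard Gumbel sample as -log(-log(u)), with u ~ Uniform[0, 1)."""
    u = rng.random()
    # eps guards against log(0); its value is an illustrative assumption.
    return -math.log(-math.log(u + eps) + eps)


def gumbel_max_sample(logits, rng):
    """Return the index of the largest noisy logit (Gumbel-max trick)."""
    noisy = [logit + sample_gumbel(rng) for logit in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def empirical_frequencies(logits, n_samples, seed=0):
    """Estimate how often each action index is drawn by gumbel_max_sample."""
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(n_samples):
        counts[gumbel_max_sample(logits, rng)] += 1
    return [c / n_samples for c in counts]
```

With enough samples, `empirical_frequencies(logits, n)` should come close to `softmax(logits)`, which is why the argmax over noisy logits behaves like sampling from the categorical policy. Note that the argmax itself carries no gradient, so this form is only useful for drawing actions.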

Answer 3 · 2020-07-09T05:25:36.000Z

Yes, Gumbel-softmax is a good trick for letting gradients flow through discrete actions.
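
As a rough illustration of that trick, the sketch below replaces the argmax with a temperature-scaled softmax over the noisy logits, which gives a relaxed (soft) one-hot vector. It is plain Python written for this thread, not code from either repo; `gumbel_softmax`, `to_one_hot`, `temperature`, and `eps` are hypothetical names and parameters. In TF 1.x one would presumably build the same expression from `tf.random_uniform`, `tf.log`, and `tf.nn.softmax`, and a straight-through variant is commonly written as `tf.stop_gradient(y_hard - y) + y`.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng, eps=1e-20):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = [
        (logit - math.log(-math.log(rng.random() + eps) + eps)) / temperature
        for logit in logits
    ]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(probs):
    """Hard action: 1.0 at the largest entry, 0.0 elsewhere."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]
```

A lower `temperature` pushes the output toward a one-hot vector, while a higher one makes it closer to uniform. In an actor-critic setup the soft vector (or its straight-through hard version) would be passed to the critic so that the actor still receives a gradient.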

Since you have found the solution, this issue will now be closed. Feel free to re-open it.

Answer 4 · 2020-07-09T06:52:48.000Z

When you used Gumbel-softmax in DDPG to solve problems with discrete action spaces, did you get the desired outcome?

Answer 5 · 2020-07-09T07:06:04.000Z

@tanxiangtj Yes, it works well with gym.

Answer 6 · 2020-07-09T07:06:56.000Z

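Hello! Are you based in Shanghai? Your work is very good. Would you mind sharing your WeChat contact? I would like to ask you for some advice.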

Answer 7 · 2020-07-09T07:12:55.000Z

If you have any questions, you can contact me by email or in the issues.

Answer 8 · 2020-07-09T07:16:42.000Z

OK, thank you! If you manage to get MADDPG working with discrete action spaces, please give me some guidance.