xuanlinli17/CS285_Fa19_Deep_Reinforcement_Learning

Why retrieve the first element of the action?

Closed this issue · 1 comment

May I know why this statement returns only the first element, instead of using argmax to choose the best action from the output of tf.multinomial?

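I think you can print out the shape of ac just before this statement is reached. If it has a leading batch dimension of size 1, then taking the first element just drops that dimension and returns the whole action vector; it is not picking one action out of several. Also, the environments in hw1 have continuous action spaces, so there is no discrete set of actions to take an argmax over.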
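To make the shape check concrete, here is a minimal sketch using NumPy only. It is not the homework code: `get_action`, `ob_dim`, and `ac_dim` are hypothetical stand-ins, and it assumes the policy adds a batch axis to a single observation and returns one real-valued action vector per batch row.

```python
import numpy as np

def get_action(obs, ac_dim=3):
    """Hypothetical stand-in for a continuous policy's action query."""
    batched_obs = obs[None]  # add a batch axis: (ob_dim,) -> (1, ob_dim)
    rng = np.random.default_rng(0)
    # One real-valued action vector per batch row (assumed behaviour).
    return rng.normal(size=(batched_obs.shape[0], ac_dim))

obs = np.zeros(5)              # a single observation, ob_dim = 5
batched_ac = get_action(obs)
print(batched_ac.shape)        # (1, 3): batch of one action vector

ac = batched_ac[0]             # drop the batch axis, keep the full action
print(ac.shape)                # (3,)

# argmax would return an index into the vector, not a continuous action.
idx = int(np.argmax(ac))
```

Under these assumptions, indexing with `[0]` only removes the batch axis, while `argmax` would reduce the action vector to a single integer index that a continuous environment could not use as an action.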