Why retrieve the first element of the action?
Closed issue · 1 comment
cometta commented
May I know why this statement returns only the first element, instead of using argmax to choose the best action from the output of tf.multinomial?
xuanlinli17 commented
I think you can print out the shape of ac before this statement is reached. Also, the environment is continuous for hw1.
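To illustrate the shape check suggested above, here is a minimal sketch. It assumes the statement in question looks something like ac = ac[0] and that the policy is evaluated on a batch containing a single observation. Neither is confirmed in this thread. The fake_policy function, its weights, and the dimensions are hypothetical stand-ins, written in NumPy rather than the actual homework code.

```python
import numpy as np

def fake_policy(obs_batch, ac_dim=3):
    """Hypothetical stand-in for a policy network.

    Maps a batch of observations, shape (batch, ob_dim),
    to a batch of continuous actions, shape (batch, ac_dim).
    """
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(obs_batch.shape[1], ac_dim))
    return obs_batch @ weights

obs = np.ones(4)              # one observation, shape (4,)
ac = fake_policy(obs[None])   # obs[None] adds a batch dimension: (1, 4)
print(ac.shape)               # batch dimension first, then action dimension

action = ac[0]                # drops the batch dimension only
print(action.shape)           # the full action vector is still there
```

Under these assumptions, indexing with [0] selects the only row of the batch rather than the first component of the action. In a continuous environment the action is a real-valued vector, so there is no set of discrete choices for argmax to pick from.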