why retrieve the first element of action

Line 32 in ba2e8e1

ac = ac[0]

May I know why this statement returns only the first element, instead of using argmax to choose the best action from the output of tf.multinomial?

I think you can print out the shape of ac just before this statement is reached. Also, the environment for hw1 is continuous, so there is no discrete set of actions to take an argmax over.
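
To make the shape check concrete, here is a minimal sketch, not the repo's actual code. It assumes the policy is called on a batch containing a single observation, so its output carries a leading batch dimension; `toy_policy`, `weights`, and the dimensions are hypothetical stand-ins chosen for illustration. Under that assumption, `ac[0]` does not pick one action out of several candidates, it only strips the batch axis from a single continuous action vector.

```python
import numpy as np


def toy_policy(obs_batch, weights):
    # Hypothetical linear policy (an assumption for illustration):
    # maps observations of shape (batch, obs_dim) to continuous
    # actions of shape (batch, act_dim).
    return obs_batch @ weights


obs_dim, act_dim = 4, 2
rng = np.random.default_rng(0)
weights = rng.normal(size=(obs_dim, act_dim))
obs = rng.normal(size=obs_dim)  # a single observation, shape (4,)

# Add a batch axis before calling the policy, as batched models expect.
ac_batched = toy_policy(obs[None, :], weights)
print(ac_batched.shape)  # (1, 2): one action vector, wrapped in a batch

# Indexing with [0] removes the batch axis and keeps the whole action vector.
ac = ac_batched[0]
print(ac.shape)  # (2,)
```

If the shape printed in the real code looks like `(1, act_dim)`, the same reading applies: every component of the action is kept and passed to the environment, and only the batch wrapper is dropped.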