s320168/ComputationalIntelligence

Lab 4 Review by Giorgio Cacopardi (s309685)

Closed this issue · 0 comments

Hi Silvano,
Well done on lab 4, you have done a good job.
I found it really smart that you reward the agent for moves leading to winning states; this really helps the learning process.
Here are some suggestions that I think can improve your results:

  • Regarding the move-choice strategy, in addition to epsilon-greedy I recommend trying further strategies such as upper confidence bound (UCB) and softmax (or Boltzmann) exploration, so you can see whether one of them performs better.
  • I suggest increasing the number of training episodes to at least 200000 in total; this helped me reach better performance.
  • I also suggest training the Q agent as the second player as well, so it learns more states and plays better when moving second.
  • As a final suggestion, I recommend updating your formula for the Q-table values: instead of adding the max value of the next state, put a minus sign in front of it, since the next state belongs to the opponent, so the goal becomes minimizing the opponent's best value (this observation was made by Davide Vitabile (s330509)).
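To make the first suggestion concrete, here is a minimal sketch of the two alternative exploration strategies mentioned above. The function names and the list-based Q-value representation are hypothetical, not taken from your lab code; they only illustrate the idea.

```python
import math
import random

def softmax_choice(q_values, temperature=1.0):
    """Boltzmann exploration: sample an action with probability
    proportional to exp(Q / temperature)."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights, k=1)[0]

def ucb_choice(q_values, counts, t, c=1.4):
    """Upper confidence bound: prefer actions with a high Q value
    or a low visit count (large exploration bonus)."""
    # Any untried action gets priority (its bonus would be infinite).
    for action, n in enumerate(counts):
        if n == 0:
            return action
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

A low temperature makes softmax almost greedy, a high one almost uniform, so you can anneal it over training much like epsilon.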

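The sign-flipped update from the last bullet could look like the sketch below. It assumes a dict-based Q-table keyed by (state, action) pairs; the function name and parameters are hypothetical, not from your code.

```python
def update_q(q_table, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Negamax-style Q update: the next state belongs to the opponent,
    so we subtract (rather than add) the best value available there."""
    best_next = max((q_table.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q_table.get((state, action), 0.0)
    # Standard rule would be: old + alpha * (reward + gamma * best_next - old)
    # Flipping the sign accounts for the opponent moving in next_state.
    q_table[(state, action)] = old + alpha * (reward - gamma * best_next - old)
```

With this rule, a next state that is good for the opponent lowers the value of the move that led there, which matches the minimization goal described above.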
I hope my suggestions help you, and good luck!