maybe a bug here
reversi-alpha-zero/src/reversi_zero/agent/player.py
Lines 406 to 414 in 4ac6e06
Maybe a bug here: p_ is NOT a probability distribution over legal moves until it is normalized later in the code. But in the dirichlet_noise_only_for_legal_moves == True case, the Dirichlet noise is already a probability distribution over legal moves. In other words, you are adding Dirichlet noise to something that is not yet a probability distribution, which I believe is not consistent with the AlphaGo Zero paper.
I happened to find that my implementation had this bug too, and after I fixed it, my AI's strength improved significantly.
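A minimal sketch of the order of operations I mean, not the repository's actual code. The function name mix_dirichlet_noise, its parameters, and the default alpha value are hypothetical placeholders. It uses only the standard library, with normalized Gamma draws standing in for a Dirichlet sample; with NumPy, np.random.dirichlet would be the natural choice. The point is that the prior is renormalized over legal moves first, so both terms of the mix are distributions over the same set of moves.

```python
import random


def mix_dirichlet_noise(policy, legal_moves, eps=0.25, alpha=0.5, rng=random):
    """Return (1 - eps) * prior + eps * noise as a list the same length as policy.

    policy      -- raw network output over all moves (may include illegal ones)
    legal_moves -- indices of the legal moves
    eps, alpha  -- mixing weight and Dirichlet concentration (placeholder values)
    rng         -- the random module or a random.Random instance
    """
    legal = sorted(set(legal_moves))
    if not legal:
        raise ValueError("no legal moves to distribute probability over")

    # 1. Restrict the raw policy to legal moves and renormalize FIRST,
    #    so the prior is a probability distribution over legal moves.
    total = sum(policy[i] for i in legal)
    if total > 0:
        prior = {i: policy[i] / total for i in legal}
    else:
        # Degenerate case: fall back to a uniform prior over legal moves.
        prior = {i: 1.0 / len(legal) for i in legal}

    # 2. Sample Dirichlet(alpha) over the legal moves only,
    #    using normalized Gamma(alpha, 1) draws.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in legal]
    gamma_sum = sum(gammas)
    if gamma_sum > 0:
        noise = {i: g / gamma_sum for i, g in zip(legal, gammas)}
    else:
        noise = {i: 1.0 / len(legal) for i in legal}

    # 3. Mix two distributions over the same support; illegal moves stay at 0.
    mixed = [0.0] * len(policy)
    for i in legal:
        mixed[i] = (1.0 - eps) * prior[i] + eps * noise[i]
    return mixed
```

Because both terms sum to 1 over the legal moves, the result also sums to 1 and needs no further normalization. If the noise is mixed in before normalizing, the effective noise weight depends on how much probability mass the network put on illegal moves.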
Thank you for pointing this out.
It would certainly be better, in the dirichlet_noise_only_for_legal_moves == True case, to add the noise after normalization. And the dirichlet_noise_only_for_legal_moves == False case may not be necessary.
I'll fix it.