maybe a bug here

reversi-alpha-zero/src/reversi_zero/agent/player.py

Lines 406 to 414 in 4ac6e06

    
           if is_root_node and self.play_config.noise_eps > 0:  # Is it correct?? -> (1-e)p + e*Dir(alpha) 
        
               if self.play_config.dirichlet_noise_only_for_legal_moves: 
        
                   noise = dirichlet_noise_of_mask(legal_moves, self.play_config.dirichlet_alpha) 
        
               else: 
        
                   noise = np.random.dirichlet([self.play_config.dirichlet_alpha] * 64) 
        
               p_ = (1 - self.play_config.noise_eps) * p_ + self.play_config.noise_eps * noise 
        
           # re-normalize in legal moves 
        
           p_ = p_ * bit_to_array(legal_moves, 64)

reversi-alpha-zero/src/reversi_zero/agent/player.py

Line 420 in 4ac6e06

p_ = self.normalize(p_, temperature)

Maybe a bug here: p_ here is NOT a probability distribution over legal moves, until you do normalization in codes after. But in the dirichlet_noise_only_for_legal_moves == True case, dirichlet noise is already a probability distribution over legal moves. Saying, you are adding dirichlet noise on a non-probability-distribution, which I believe not consistent with AlphaGoZero paper.

I happen to find my implementation had this bug too, and after I fixed this bug, my AI's strength improves significantly.

@gooooloo

Thank you for your pointed out.
It certainly may be better that in the dirichlet_noise_only_for_legal_moves == True case, the noise is added after normalization.

And the dirichlet_noise_only_for_legal_moves == False case may be not necessary.

I'll fix it.

	if is_root_node and self.play_config.noise_eps > 0: # Is it correct?? -> (1-e)p + e*Dir(alpha)
	if self.play_config.dirichlet_noise_only_for_legal_moves:
	noise = dirichlet_noise_of_mask(legal_moves, self.play_config.dirichlet_alpha)
	else:
	noise = np.random.dirichlet([self.play_config.dirichlet_alpha] * 64)
	p_ = (1 - self.play_config.noise_eps) * p_ + self.play_config.noise_eps * noise

	# re-normalize in legal moves
	p_ = p_ * bit_to_array(legal_moves, 64)