opendilab/LightZero

Confusion between "battle_mode" and "mcts_mode"

marintoro opened this issue · 2 comments

Hello,

I think there is a "bug" in the current version of the alphago code when using "play_with_bot_mode".
Indeed, in both tictactoe_env.py and gomoku_env.py this line is hardcoded:

self.mcts_mode = 'self_play_mode'

So mcts_mode is always set to self_play_mode, no matter what is given in the config.
Moreover, in both the Python tree and the C++ tree of alphago we can find these lines:

self.simulate_env.battle_mode = self.simulate_env.mcts_mode  # In ptree_az.py
simulate_env.attr("battle_mode") = simulate_env.attr("mcts_mode");  // In mcts_alphazero.cpp

That means that no matter what we give in the config for battle_mode, it is overridden by mcts_mode, which is always "self_play_mode".
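A minimal sketch of the assignment described above. `ToyEnv` and `ToyMCTS` are hypothetical stand-ins, not the real LightZero classes, and the use of `copy.deepcopy` to build the simulation env is an assumption made only for illustration:

```python
import copy


class ToyEnv:
    """Hypothetical stand-in for tictactoe_env.py / gomoku_env.py."""

    def __init__(self, cfg):
        self.battle_mode = cfg["battle_mode"]  # taken from the config
        self.mcts_mode = "self_play_mode"      # hardcoded, as in the quoted line


class ToyMCTS:
    """Hypothetical stand-in for the search code in ptree_az.py."""

    def get_simulate_env(self, real_env):
        # Assumption: the search works on a copy of the real env.
        simulate_env = copy.deepcopy(real_env)
        # Same assignment as the quoted ptree_az.py line.
        simulate_env.battle_mode = simulate_env.mcts_mode
        return simulate_env


real_env = ToyEnv({"battle_mode": "play_with_bot_mode"})
simulate_env = ToyMCTS().get_simulate_env(real_env)

print(real_env.battle_mode)      # play_with_bot_mode
print(simulate_env.battle_mode)  # self_play_mode
```

In this sketch only the copy used for the search ends up in "self_play_mode"; the env built from the config keeps the value it was given.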

In conclusion, after quickly reviewing the code, I think that mcts_mode should just be removed and replaced by battle_mode everywhere, because both attributes seem to do exactly the same thing (but I may be wrong).

To reproduce, you can just run the standard tictactoe in "play_with_bot_mode" (by running tictactoe_alphazero_bot_mode_config.py) and check that the MCTS is always using "self_play_mode".

Thank you very much for your thoughtful feedback.

  • We acknowledge your point, but it's indeed necessary to consistently set simulate_env.battle_mode to self_play_mode. This is because regardless of how we interact with the true environment during the data collection phase (i.e., whatever the battle_mode setting in the real environment), we should not give the agent access to the opponent's policy when executing the MCTS search. Therefore, during the MCTS search process, simulate_env.battle_mode is always set to self_play_mode.

  • However, this could potentially lead to some confusion. Our self.mcts_mode might need to be renamed to self.battle_mode_in_simulation_env to more accurately reflect its role in the simulation environment. It is worth noting that we have not hardcoded a fixed value, but instead left this parameter reserved for debugging purposes.

  • For relevant information, you may refer to this issue.
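The reasoning in the points above can be sketched with a toy environment. Everything here is hypothetical (`ToyBoardEnv`, `bot_action`, the fixed bot move), and it assumes that in "play_with_bot_mode" a call to `step` also plays the bot's reply, which is not stated in this thread:

```python
import copy


class ToyBoardEnv:
    """Hypothetical two-player env; not the real LightZero implementation."""

    def __init__(self, battle_mode):
        self.battle_mode = battle_mode
        self.mcts_mode = "self_play_mode"
        self.moves = []            # history of (player, action) pairs
        self.current_player = 1

    def _place(self, action):
        self.moves.append((self.current_player, action))
        self.current_player = 2 if self.current_player == 1 else 1

    def bot_action(self):
        # Assumed opponent policy: a fixed scripted move.
        return 0

    def step(self, action):
        self._place(action)
        if self.battle_mode == "play_with_bot_mode":
            # Assumption: the env itself answers with the bot's move.
            self._place(self.bot_action())


# Real env: one step is the agent's move plus the bot's reply.
real_env = ToyBoardEnv("play_with_bot_mode")
real_env.step(4)

# Simulation env: switched to self-play, so one step is a single move
# and the search never queries the bot's policy.
simulate_env = copy.deepcopy(real_env)
simulate_env.battle_mode = simulate_env.mcts_mode
simulate_env.step(8)

print(len(real_env.moves), len(simulate_env.moves))  # 2 3
```

Under these assumptions, leaving the simulation env in "play_with_bot_mode" would make every simulated step call `bot_action`, which is the access to the opponent's policy that the search is meant to avoid.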

If you have any suggestions for improvement, please feel free to provide them. Best wishes!

Hello,
OK, sorry for my misunderstanding.
Thank you for your explanation. I understand now why you always set battle_mode to self_play_mode in the MCTS.