Depth of MCTS
andreasbinder opened this issue · 3 comments
Hi guys, awesome work!
I am new to the field of reasoning with LLMs and have a more theoretical question: I thought MCTS's main advantage comes into play in scenarios with huge depth, where exhaustive computation is infeasible, as in chess. So I am confused about what max_depth=4 is supposed to mean in that context: is max_depth=T, or does it mean the state at which you start the simulation?
Thank you so much for clarifying!
Thanks for your question. max_depth is the maximum depth of the search tree, i.e., a node at this depth is regarded as a terminal node.
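To make the depth cutoff concrete, here is a minimal sketch of how such a check could look. The `Node` class, the `is_terminal` helper, and the `answered` flag are hypothetical names for illustration, not the repository's actual code; the sketch only assumes that the root sits at depth 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical search-tree node; not the repository's actual class."""
    state: str
    parent: Optional["Node"] = None

    @property
    def depth(self) -> int:
        # The root has depth 0; each step away from it adds 1.
        return 0 if self.parent is None else self.parent.depth + 1


def is_terminal(node: Node, max_depth: int, answered: bool = False) -> bool:
    # A node is terminal if the problem is already answered,
    # or if it sits at the depth limit and may not be expanded further.
    return answered or node.depth >= max_depth


# Build a chain root -> step 1 -> ... -> step 4 and apply the cutoff.
chain = [Node("root")]
for i in range(1, 5):
    chain.append(Node(f"step {i}", parent=chain[-1]))

print([is_terminal(n, max_depth=4) for n in chain])
# [False, False, False, False, True]
```

Under this reading, max_depth bounds how far any single reasoning path can extend; it is not the point at which the simulation starts.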
> I thought MCTS's main advantage comes into play when you have scenarios with huge depth
That's true for traditional applications of MCTS like board games, but in the case of LLM reasoning, it's already infeasible to do an exhaustive search when the depth is 4 given its computational cost, and we have to explore the search space wisely, which motivates the use of MCTS.
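As a rough illustration of why exhaustive search is costly even at depth 4, the sketch below counts the nodes in a full tree. It assumes every non-terminal node is expanded into the same number of children and that each non-root node costs at least one LLM generation; the branching factors 4 and 10 are hypothetical values chosen for the example, not numbers from the project.

```python
def exhaustive_expansions(branching: int, max_depth: int) -> int:
    """Count the non-root nodes of a full tree with the given branching
    factor and depth, i.e. b + b^2 + ... + b^max_depth."""
    return sum(branching ** k for k in range(1, max_depth + 1))


# Hypothetical branching factors; depth 4 as in the question.
for b in (4, 10):
    print(b, exhaustive_expansions(b, 4))
# 4 340
# 10 11110
```

Each of those nodes would need at least one model call, plus any calls used to score it, so the total grows quickly with the number of candidate actions per step.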
Thank you a lot for the clarification! So if I understand you correctly, MCTS is usable because of the cost each LLM call incurs as well as the broad action space (in case the LLM does not follow instructions).
One last question: for my master's thesis I decided to use MCTS, similar to your approach with GSM8k. However, I am unsure what kind of questions (in terms of usefulness, etc.) are created as you go deeper in the tree. Do you have example outputs, or did you experiment with how useful subquestions at, e.g., depth=5 are for the main question? My main concern is that the created questions might be more generic or repetitive! Thanks once again!
> My main concern is that the created questions might be more generic or repetitive!
That's a very reasonable concern, and it motivated us to introduce the self-evaluation reward, which is supposed to penalize those generic or repetitive questions.
You can check some examples on our demo page: link.