Ber666/RAP

Depth of MCTS

andreasbinder opened this issue · 3 comments

Hi guys, awesome work!
I am new to the field of reasoning with LLMs and have a more theoretical question: I thought MCTS's main advantage comes into play in scenarios with such huge depth that exhaustive computation is infeasible, as in chess. So I am confused about what max_depth=4 means in that context: is max_depth=T, or does it denote the state at which you start the simulation?
Thank you so much for clarifying!

Ber666 commented

Thanks for your question. max_depth is the maximum depth of the search tree, i.e., the node at this depth is regarded as a terminal node.
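To make the depth cutoff concrete, here is a minimal sketch of a depth-limited search tree. It is not the repository's code: `Node`, `is_terminal`, `expand`, and `n_actions` are hypothetical names chosen for illustration, and the only behavior taken from the answer above is that a node at `max_depth` is treated as terminal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; only the depth matters for this sketch."""
    depth: int
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def is_terminal(node: Node, max_depth: int) -> bool:
    # A node that has reached max_depth is regarded as terminal.
    return node.depth >= max_depth


def expand(node: Node, max_depth: int, n_actions: int) -> List[Node]:
    # Terminal nodes are never expanded, so the tree cannot grow past max_depth.
    if is_terminal(node, max_depth):
        return []
    node.children = [
        Node(depth=node.depth + 1, parent=node) for _ in range(n_actions)
    ]
    return node.children


# Walk down one branch of a tree with max_depth=4.
node = Node(depth=0)
while not is_terminal(node, max_depth=4):
    node = expand(node, max_depth=4, n_actions=3)[0]
print(node.depth)  # 4
```

With this reading, `max_depth=4` bounds how many reasoning steps a single path can contain; it is not the depth at which a simulation starts.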

I thought MCTS main advantage comes into play when you have scenarios with huge depth

That's true for traditional applications of MCTS like board games, but in the case of LLM reasoning, it's already infeasible to do an exhaustive search when the depth is 4 given its computational cost, and we have to explore the search space wisely, which motivates the use of MCTS.
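As a rough back-of-the-envelope illustration of why even depth 4 is expensive, the sketch below counts how many nodes an exhaustive search would have to generate. The branching factors are assumptions picked for illustration, not numbers from the project, and each generated node is assumed to cost at least one LLM call.

```python
def exhaustive_expansions(branching: int, depth: int) -> int:
    """Number of non-root nodes in a full tree: b + b^2 + ... + b^depth."""
    return sum(branching ** k for k in range(1, depth + 1))


# Assumed branching factors; the real number of candidate actions may differ.
print(exhaustive_expansions(4, 4))   # 340
print(exhaustive_expansions(10, 4))  # 11110
```

Multiplied by the latency and cost of a single LLM call, and repeated for every question in a benchmark, this grows quickly, which is the motivation for a guided search such as MCTS.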

andreasbinder commented

Thanks a lot for the clarification! So if I understand you correctly, MCTS is useful here because of the cost each LLM call incurs, as well as the broad action space (in case the LLM does not follow instructions).

One last question: for my master's thesis I decided to use MCTS, similar to your approach with GSM8K. However, I am unsure what kind of questions (in terms of usefulness etc.) are generated as you go deeper in the tree. Do you have example outputs, or did you experiment with how useful subquestions at e.g. depth=5 are for the main question? My main concern is that the generated questions might become more generic or repetitive! Thanks once again!

Ber666 commented

My main concern is that the created questions might be more generic or repetitive!

That's a very reasonable concern, and it motivated us to introduce the self-evaluation reward, which is supposed to penalize those generic or repetitive questions.
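The sketch below only illustrates the idea of scoring candidate subquestions so that repeated ones rank last. It is not the project's reward: `toy_self_eval` is a hypothetical placeholder heuristic (an exact-repeat check) standing in for a score that would come from the LLM itself, and `rank_candidates` is likewise an illustrative name.

```python
from typing import Callable, List, Tuple


def toy_self_eval(subquestion: str, history: List[str]) -> float:
    """Placeholder for an LLM-based usefulness score in [0, 1].

    Assumption: this stub only detects exact repeats (ignoring case and
    surrounding whitespace) and gives them the lowest score.
    """
    normalized = subquestion.strip().lower()
    seen = {h.strip().lower() for h in history}
    return 0.0 if normalized in seen else 1.0


def rank_candidates(
    candidates: List[str],
    history: List[str],
    self_eval: Callable[[str, List[str]], float] = toy_self_eval,
) -> List[Tuple[float, str]]:
    # Higher reward first; sorted() is stable, so ties keep their input order.
    scored = [(self_eval(c, history), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


history = ["How many apples does Tom have?"]
candidates = [
    "how many apples does tom have?",             # repeat of an earlier step
    "How many apples are left after Tom eats 2?",
]
for reward, question in rank_candidates(candidates, history):
    print(reward, question)
```

In a search, such a reward would feed into the value estimates of the corresponding nodes, so branches that keep asking the same subquestion are visited less often.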

You can check some examples on our demo page: link.