THUNLP-MT/StableToolBench

How do native LLMs perform on this benchmark?

YenFuLin opened this issue · 1 comment

Hi,
I'm wondering why this benchmark doesn't include results for native LLMs (such as llama2 and llama3).
Do you plan to add these results to this work?

Hi, thank you for your question.

We have not tested these open-source models yet, but doing so is on the roadmap.