YenFuLin opened this issue 7 months ago · 1 comments
Hi, I'm wondering why this benchmark doesn't include results for native LLMs (such as llama2, llama3). Do you plan to add these results to this work?
Hi, thank you for your question.
We have not tested these open-source models yet, but doing so is on the roadmap.