zhimin-z opened this issue a year ago · 1 comment
For the evaluation benchmark, I did not see any difference other than the number of tested models. Is that the only difference?
Solved it by checking this sentence: