thu-coai/SafetyBench

What are the differences between the Chinese and Chinese Subset leaderboards?

zhimin-z opened this issue · 1 comment

image
As far as the evaluation benchmark goes, I do not see any difference between the two leaderboards apart from the number of tested models.
Is that the only difference?

Solved it by checking this sentence:
image