hsiehjackson/RULER

Do we have any idea how many tokens are used to run the full benchmark on a model?

daniellefranca96 opened this issue · 1 comments

I would like to run this on Gemini 1.0 Pro and Claude 3 so that we have their scores. Do we have any idea of this benchmark's token usage, so we can calculate the cost for commercial models?

The token usage depends on the sequence length you want to test. For example, if you test sequence length 128K with our default settings, the total is 131072 (length) x 500 (samples per task) x 13 (tasks) = 851,968,000 tokens, or roughly 852M. You can decrease the number of samples per task to stay within your budget.
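
For budgeting, the estimate above can be sketched as a small calculation. This is only a rough sketch: the function names and the `price_per_million_usd` parameter are hypothetical and not part of RULER, it assumes every sample fills the full sequence length, and it counts input tokens only (generated output tokens would add to the bill). The price in the example is a placeholder, so substitute your provider's actual rate.

```python
def estimate_input_tokens(seq_len: int, samples_per_task: int = 500, num_tasks: int = 13) -> int:
    """Upper-bound input tokens: length x samples per task x tasks."""
    return seq_len * samples_per_task * num_tasks


def estimate_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """Cost for a given token count at a per-million-token price (hypothetical helper)."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Default setting from the reply above: 128K length, 500 samples, 13 tasks.
total = estimate_input_tokens(131072)
print(f"{total:,} tokens (~{total / 1e6:.0f}M)")

# Fewer samples per task shrinks the total proportionally.
reduced = estimate_input_tokens(131072, samples_per_task=100)
print(f"{reduced:,} tokens with 100 samples per task")

# Placeholder price, NOT a real quote for any model.
print(f"${estimate_cost_usd(total, price_per_million_usd=1.0):,.2f} at $1.00 per 1M input tokens")
```

Because the total scales linearly with each factor, cutting the samples per task from 500 to 100 cuts the token count (and the input cost) by a factor of five.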