Issues
Add new Mistral Large Instruct 2411 (which supposed to be an improvement compared to 2407 currently in the table)
#74 opened by Lissanro - 0
Rough runtime benchmarks across tasks and context-lengths on any hardware setups
#76 opened by girishbalaji - 1
requirements.txt
#75 opened by wangyu-ustc - 19
Evaluate on Jamba-1.5-Mini
#69 opened by coranholmes - 1
DOCKER_BUILDKIT=1 docker build -f Dockerfile -t cphsieh/ruler:0.2.0 . excute wrong
#73 opened by yuanhang110 - 2
Detailed scores of Phi-3-mini-128k
#71 opened by huangyuxiang03 - 1
Qwen2 and DeepSeek-V2 results?
#33 opened by hijkzzz - 1
Request for permissions
#61 opened by ChenAlmagor - 1
Issue with installation: huggingface-hub
#64 opened by SimJeg - 3
OOM issue during evaluation
#66 opened by mengniwang95 - 1
Unable to reproduce result for Llama3.1(8B)
#70 opened by muhangao - 2
lost in the middle problem
#24 opened by vkaul11 - 3
About Mistral-Small-Instruct-2409
#65 opened by showgood163 - 3
gpt-4o results?
#12 opened by the21st - 3
GPT-4-1106-preview
#63 opened by yxgcsq - 2
datasets where
#58 opened by yxgcsq - 1
Gemini flash 1.5 results
#43 opened by augusto-rehfeldt - 2
About InterLM2.5
#47 opened by showgood163 - 2
hope add qwen2-7b-chat result
#46 opened by Chandler-Bing - 1
RULER with Mamba
#41 opened by Andron00e - 1
The one-shot example of CWE task
#38 opened by guanzhchen - 1
questions about ICL code for variable tracking
#27 opened by vkaul11 - 1
What is the need for is_icl parameter?
#25 opened by vkaul11 - 4
prediction evaluation statistics
#22 opened by vkaul11 - 2
No Generated Output and JSON Serialization Error when calling llm directly in VLLMClient
#11 opened by yaswanth-iitkgp - 1
128K sequence length means 131072 or 128000
#34 opened by syp1997 - 1
Error in hugging face links in README
#35 opened by etienneasln - 3
pre_sample in qa code
#29 opened by vkaul11 - 3
Base vs Chat prompt question.
#31 opened by karansaxena - 2
request for evaluating GLM4-9B-chat(-1M)
#28 opened by yucc-leon - 5
Prediction format during evals
#30 opened by karansaxena - 10
what was the reason to use nltk in NIAK task here
#19 opened by vkaul11 - 2
dataset argument for qa.py not specified
#18 opened by vkaul11 - 1
Why do you use partial match max metric for QA
#15 opened by vkaul11 - 0
Question about files nouns.list and verbs.list
#16 opened by vkaul11 - 2
Tempate for Yi?
#13 opened by liyucheng09