NVIDIA/RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python · Apache-2.0

Issues

Add new Mistral Large Instruct 2411 (which is supposed to be an improvement over 2407, currently in the table)
#74 opened a month ago by Lissanro
1
Rough runtime benchmarks across tasks and context-lengths on any hardware setups
#76 opened a month ago by girishbalaji
0
requirements.txt
#75 opened a month ago by wangyu-ustc
1
Evaluate on Jamba-1.5-Mini
#69 opened 3 months ago by coranholmes
19
`DOCKER_BUILDKIT=1 docker build -f Dockerfile -t cphsieh/ruler:0.2.0 .` executes incorrectly
#73 opened a month ago by yuanhang110
1
Detailed scores of Phi-3-mini-128k
#71 opened 3 months ago by huangyuxiang03
2
Qwen2 and DeepSeek-V2 results?
#33 opened 3 months ago by hijkzzz
1
Test results for the sneaky June update of the Phi 3 models?
#49 opened 3 months ago by bhugueney
1
Request for permissions
#61 opened 3 months ago by ChenAlmagor
4
Issue with installation: huggingface-hub
#64 opened 3 months ago by SimJeg
1
OOM issue during evaluation
#66 opened 3 months ago by mengniwang95
3
Unable to reproduce result for Llama3.1(8B)
#70 opened 3 months ago by muhangao
1
lost in the middle problem
#24 opened 6 months ago by vkaul11
2
About Mistral-Small-Instruct-2409
#65 opened 4 months ago by showgood163
3
gpt-4o results?
#12 opened 8 months ago by the21st
3
GPT-4-1106-preview
#63 opened 4 months ago by yxgcsq
3
New Command R 08-2024 and Command R+ 08-2024 models
#60 opened 4 months ago by jukofyork
2
Where are the datasets?
#58 opened 4 months ago by yxgcsq
1
Gemini flash 1.5 results
#43 opened 5 months ago by augusto-rehfeldt
1
Performance Differences in Qwen2-72B-Instruct-131k
#56 opened 4 months ago by lwang2070
2
Performance discrepancy of Llama3.1-8b-instruct
#54 opened 4 months ago by zhenyuhe00
8
Can't reproduce results of meta-llama/Meta-Llama-3.1-8B-Instruct
#53 opened 5 months ago by PiotrNawrot
4
Any chance of testing `Mistral-Large-Instruct-2407`?
#52 opened 5 months ago by jukofyork
2
A mistral long context - MegaBeam-Mistral-512K
#48 opened 5 months ago by chenwuperth
2
About InternLM2.5
#47 opened 5 months ago by showgood163
2
Hoping to see qwen2-7b-chat results added
#46 opened 5 months ago by Chandler-Bing
2
RULER with Mamba
#41 opened 5 months ago by Andron00e
1
Is there a particular reason to not support batch processing?
#39 opened 5 months ago by ViktorooReps
1
The one-shot example of CWE task
#38 opened 6 months ago by guanzhchen
0
Is there any issue in extending context length to 1 million using your script
#26 opened 6 months ago by vkaul11
1
questions about ICL code for variable tracking
#27 opened 6 months ago by vkaul11
1
Why do you need to separate the last batch of the output
#21 opened 6 months ago by vkaul11
1
how do you take care of the presence of 'and' in the output in the evaluation
#23 opened 6 months ago by vkaul11
1
What is the need for is_icl parameter?
#25 opened 6 months ago by vkaul11
1
prediction evaluation statistics
#22 opened 6 months ago by vkaul11
4
Add answer_predfix to prevent model from refusing to answer typo?
#20 opened 6 months ago by vkaul11
2
No Generated Output and JSON Serialization Error when calling llm directly in VLLMClient
#11 opened 6 months ago by yaswanth-iitkgp
2
128K sequence length means 131072 or 128000
#34 opened 6 months ago by syp1997
1
Error in hugging face links in README
#35 opened 6 months ago by etienneasln
1
Reproducing results 4k (LLaMA-2 7B chat, Mistral 7B Instruct v0.2)
#36 opened 6 months ago by ThomasSURF
3
pre_sample in qa code
#29 opened 7 months ago by vkaul11
1
Base vs Chat prompt question.
#31 opened 7 months ago by karansaxena
3
request for evaluating GLM4-9B-chat(-1M)
#28 opened 7 months ago by yucc-leon
2
Prediction format during evals
#30 opened 7 months ago by karansaxena
5
How to test models with a context length larger than 128K?
#14 opened 7 months ago by yaswanth-iitkgp
10
what was the reason to use nltk in the NIAH task here
#19 opened 7 months ago by vkaul11
3
dataset argument for qa.py not specified
#18 opened 7 months ago by vkaul11
2
Why do you use partial match max metric for QA
#15 opened 7 months ago by vkaul11
1
Question about files nouns.list and verbs.list
#16 opened 7 months ago by vkaul11
0
Template for Yi?
#13 opened 7 months ago by liyucheng09
2