declare-lab/instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
Python · Apache-2.0 license
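As a rough illustration of what held-out evaluation of an instruction-tuned model involves, the sketch below runs a zero-shot exact-match loop over a couple of hand-written prompts with a Flan-T5 checkpoint via Hugging Face `transformers`. The example prompts, model size, and scoring here are placeholders and do not reflect the repository's actual benchmarks or command-line interface.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/flan-t5-base"  # any instruction-tuned seq2seq checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical held-out examples: (prompt, reference answer) pairs.
examples = [
    ("Answer with yes or no: Is the Pacific Ocean larger than the Atlantic Ocean?", "yes"),
    ("What is the capital of France? Answer with one word.", "Paris"),
]

correct = 0
for prompt, reference in examples:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=16)
    prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
    correct += int(prediction.lower() == reference.lower())

print(f"Exact-match accuracy: {correct / len(examples):.2f}")
```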
Issues

- Multi GPU Support is required (#33, opened by chintan-ushur)
- Evaluate EncoderDecoderModels (#32, opened by Bachstelze)
- Colab notebook (#31, opened by Bachstelze)
- CRASS (#30, opened by Bachstelze)
- Evaluate on a single 24GB/32GB GPU (#29, opened by lemyx)
- How to submit own model to leaderboard? (#28, opened by timothylimyl)
- What are the metrics for the evaluation results? (#26, opened by zhimin-z)
- Can not reproduce results on the table (#3, opened by simplelifetime)
- Support for larger batch_size (#18, opened by soumyasanyal)
- HHH Benchmark evaluation question: why using base prompt and (A - A_base) > (B - B_base)? (#20, opened by t170815518; see the sketch after this list)
- What to do about broken Evals? (#21, opened by damhack)
- Fail to Evaluate Model on human_eval (#22, opened by yjw1029)
- C-Eval (#17, opened by duanqiyuan)
- Could support for the Baichuan large model be added? (#16, opened by linghongli)
- add multiple gpu support (#15, opened by lxy444)
- [Feature Request] Saving Prediction Results (#14, opened by guanqun-yang)
- Is there any parallel processing methods? (#13, opened by wwngh1233)
- Add config to save eval results (#12, opened by arthurtobler)
- Future directions (#11, opened by tju01)
- Error raised during execution, details below (#9, opened by linghongli)
- Regarding the comparison to lm-evaluation-harness (#10, opened by gakada)
- Integrate the evaluation in the Transformers trainer with transformers.TrainerCallback (#7, opened by BaohaoLiao)
- Add License (#5, opened by passaglia)
- Add zero-shot evaluation results (#4, opened by LeeShiyang)
- Prompt format for LLaMa (#2, opened by LeeShiyang)
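Regarding issue #20 above: the comparison it asks about reads like a standard calibration trick, where each candidate answer's log-likelihood under the full prompt has its log-likelihood under a minimal base prompt subtracted, so that answers that are generically probable regardless of the question do not win by default. The sketch below illustrates that reading with a small causal LM; the model, prompts, and the `answer_logprob` helper are placeholders, not the repository's actual HHH evaluation code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def answer_logprob(context: str, answer: str) -> float:
    """Sum of log-probabilities of the answer tokens, conditioned on the context."""
    context_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predictions for tokens 1..N-1
    targets = full_ids[0, 1:]
    # Only score the positions that belong to the answer continuation.
    return sum(
        log_probs[i, targets[i]].item()
        for i in range(context_len - 1, full_ids.shape[1] - 1)
    )


prompt = "Human: My neighbor's package was delivered to me by mistake. What should I do?\nAssistant:"
base_prompt = "Assistant:"
answer_a = " Return it to your neighbor and let them know it arrived."
answer_b = " Keep it; they will never find out."

# Calibrated scores: (A - A_base) and (B - B_base), as in the issue title.
score_a = answer_logprob(prompt, answer_a) - answer_logprob(base_prompt, answer_a)
score_b = answer_logprob(prompt, answer_b) - answer_logprob(base_prompt, answer_b)
print("Model prefers A" if score_a > score_b else "Model prefers B")
```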