huggingface/evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Python · Apache-2.0
Issues
Is perplexity correctly computed?
#560 opened by halixness - 3
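For reference, a minimal sketch of how the perplexity module is usually invoked; the model id and input sentence are assumptions chosen purely for illustration.

```python
import evaluate

# Load the perplexity measurement; it runs the model itself, so it needs
# a cached or downloadable causal LM such as gpt2.
perplexity = evaluate.load("perplexity", module_type="metric")

results = perplexity.compute(
    model_id="gpt2",
    predictions=["The quick brown fox jumps over the lazy dog."],
    add_start_token=True,  # prepend the BOS token so the first word is scored too
)
print(results["mean_perplexity"], results["perplexities"])
```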
ImportError: To be able to use evaluate-metric/glue, you need to install the following dependencies['scipy', 'scikit-learn'] using 'pip install scipy sklearn' for instance'
#642 opened by JINO-ROHIT - 2
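A sketch of the load that triggers this error once `scipy` and `scikit-learn` are installed; note the pip hint in the message names the obsolete `sklearn` shim package, and the `mrpc` config below is an assumption.

```python
import evaluate

# Requires `pip install scipy scikit-learn` beforehand; the error's suggested
# "pip install scipy sklearn" points at the deprecated sklearn package name.
glue_metric = evaluate.load("glue", "mrpc")
print(glue_metric.compute(predictions=[0, 1, 1], references=[0, 1, 0]))
```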
Evaluate fails to load all metrics.
#638 opened by filbeofITK - 0
Optionally stop data deletion after compute
#641 opened by sizhky - 0
Perplexity for Left Padded Models
#636 opened by Lawhy - 2
Gradio dependency issue
#602 opened by bnaman50 - 0
Support nltk>=3.9 to fix vulnerability
#628 opened by albertvillanova - 0
Main documentation building is not triggered
#634 opened by albertvillanova - 0
Evaluate uses deprecated use_auth_token and will break with datasets-3.0
#620 opened by albertvillanova - 0
`list_evaluation_modules` returns empty list
#616 opened by MohamedAliRashad - 1
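For context, a sketch of the call in question; filtering by module type is an assumption about typical usage.

```python
import evaluate

# Should return the metric module names hosted on the Hub;
# the issue reports this coming back as an empty list.
metrics = evaluate.list_evaluation_modules(module_type="metric")
print(len(metrics), metrics[:5])
```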
Benchmark evaluation for language models.
#615 opened by mina58 - 0
How to customize my own evaluator and metrics?
#611 opened by Kami-chanw - 1
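A rough sketch of the subclassing pattern the bundled metrics follow; the metric name and its logic here are hypothetical.

```python
import datasets
import evaluate


class ExactMatchCount(evaluate.Metric):  # hypothetical example metric
    def _info(self):
        return evaluate.MetricInfo(
            description="Counts exact matches between predictions and references.",
            citation="",
            inputs_description="Lists of prediction and reference strings.",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("string"),
                    "references": datasets.Value("string"),
                }
            ),
        )

    def _compute(self, predictions, references):
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"exact_match_count": matches}


metric = ExactMatchCount()
print(metric.compute(predictions=["a", "b"], references=["a", "c"]))
```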
METEOR has no option to return unaggregated results
#572 opened by ashtonomy - 2
Unable to compute F1 score - ValueError when trying to convert a non-English string label to an integer
#610 opened by alans3321 - 0
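A common workaround, sketched below: the `f1` module expects numeric class labels, so string labels (in any language) are mapped to integers before calling `compute`; the label names are made up.

```python
import evaluate

f1 = evaluate.load("f1")

# Hypothetical non-English string labels mapped to integer ids.
label2id = {"положительный": 0, "отрицательный": 1}
predictions = ["положительный", "отрицательный", "положительный"]
references = ["положительный", "положительный", "отрицательный"]

print(
    f1.compute(
        predictions=[label2id[p] for p in predictions],
        references=[label2id[r] for r in references],
        average="macro",
    )
)
```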
[Metrics] ValueError: Expected to find locked file from process x but it doesn't exist.
#607 opened by raghavm1 - 2
AttributeError: 'CombinedEvaluations' object has no attribute 'evaluation_modules'
#603 opened by shunk031 - 3
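For context, a sketch of the `evaluate.combine` usage that the traceback points at.

```python
import evaluate

# Bundle several classification metrics into one object and compute them together.
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1]))
```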
Can't use BLEU offline.
#565 opened by Zhuxing01 - 0
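One offline workaround, sketched under the assumption that the metric script has been copied to a local folder (the path below is illustrative) so that loading does not hit the Hub.

```python
import evaluate

# Load BLEU from a local copy of the metric script instead of resolving it
# on the Hub; "./metrics/bleu" is an assumed path to a local clone.
bleu = evaluate.load("./metrics/bleu")

result = bleu.compute(
    predictions=["the cat sat on the mat"],
    references=[["the cat is on the mat"]],
)
print(result["bleu"])
```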
LocalModuleTest.test_load_metric_code_eval fails with "The "code_eval" metric executes untrusted model-generated code in Python."
#597 opened by jpodivin - 0
Execution of example from the "Using the evaluator" docs fails due to unspecified tokenizer
#594 opened by jpodivin - 3
SyntaxError: closing parenthesis '}'
#592 opened by wangxiuwen - 0
Can't load existing dataset for evaluation
#589 opened by IsmaelMousa - 12
Problems when running the initial step
#590 opened by simplelifetime - 0
Unable to run pip install evaluate[template]
#576 opened by saicharan2804 - 0
The difference between your bleu and sacrebleu
#558 opened by cooper12121 - 0
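A side-by-side sketch on the same sentence pair; the usual source of differences is that `sacrebleu` applies its own standardized tokenization while `bleu` depends on how its inputs are tokenized, and the two report on different scales.

```python
import evaluate

preds = ["the cat sat on the mat"]
refs = [["the cat is on the mat"]]

bleu = evaluate.load("bleu")
sacrebleu = evaluate.load("sacrebleu")

# Note: "bleu" reports a 0-1 score, "sacrebleu" a 0-100 score.
print("bleu:", bleu.compute(predictions=preds, references=refs)["bleu"])
print("sacrebleu:", sacrebleu.compute(predictions=preds, references=refs)["score"])
```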
[FR] Confidence intervals for metrics
#581 opened by NightMachinery - 1
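The Evaluator classes already expose a bootstrap strategy that yields confidence intervals for pipeline-level evaluation; a sketch, with model and dataset names chosen purely for illustration.

```python
import evaluate
from datasets import load_dataset

task_evaluator = evaluate.evaluator("text-classification")
data = load_dataset("imdb", split="test[:100]")

results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
    data=data,
    metric="accuracy",
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
    strategy="bootstrap",  # resample predictions to estimate a confidence interval
    n_resamples=200,
)
print(results)
```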
Shouldn't perplexity range be [1, inf)?
#566 opened by ivanmkc - 0
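For what it's worth, a two-line check of the lower bound: perplexity is the exponential of the average negative log-likelihood, which is non-negative, so the value can never drop below 1.

```python
import math

avg_neg_log_likelihood = 0.0  # best case: the model assigns probability 1 to every token
print(math.exp(avg_neg_log_likelihood))  # 1.0 -- perplexity is bounded below by 1
```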
Cannot use it offline!
#567 opened by SirryChen - 1
ImportError: To be able to use evaluate-metric/rouge, you need to install the following dependencies['nltk'] using 'pip install # Here to have a nice missing dependency error message early on' for instance'
#562 opened by BAEK26 - 1
It seems like evaluate.load doesn't use
#561 opened by anhq-nguyen - 0
evaluate consuming memory and slowing down the process
#559 opened by Redix8 - 0
After fine-tuning Gemma and wanting to evaluate performance: AttributeError: module 'keras._tf_keras.keras' has no attribute '__internal__'
#555 opened by XinyueZ - 0
Add Precision@k and Recall@k metrics
#554 opened by Andron00e
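For reference, a dependency-free sketch of what the requested metrics compute; this is not an existing `evaluate` module, just an illustration of Precision@k and Recall@k.

```python
def precision_at_k(ranked_predictions, relevant_items, k):
    """Fraction of the top-k predictions that are relevant."""
    top_k = ranked_predictions[:k]
    return sum(item in relevant_items for item in top_k) / k


def recall_at_k(ranked_predictions, relevant_items, k):
    """Fraction of all relevant items that appear in the top-k predictions."""
    top_k = ranked_predictions[:k]
    return sum(item in relevant_items for item in top_k) / len(relevant_items)


# Hypothetical ranked retrieval output and ground-truth relevance set.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

print(precision_at_k(ranked, relevant, k=3))  # 1/3
print(recall_at_k(ranked, relevant, k=3))     # 1/3
```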