huggingface/evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Python · Apache-2.0
Issues
Is perplexity correctly computed?
#560 opened by halixness - 3
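For reference, a minimal sketch of how the perplexity module is usually invoked; the model id and input sentence are assumptions chosen purely for illustration.

```python
import evaluate

# Load the perplexity measurement; it runs the model itself, so it needs
# a cached or downloadable causal LM such as gpt2.
perplexity = evaluate.load("perplexity", module_type="metric")

results = perplexity.compute(
    model_id="gpt2",
    predictions=["The quick brown fox jumps over the lazy dog."],
    add_start_token=True,  # prepend the BOS token so the first word is scored too
)
print(results["mean_perplexity"], results["perplexities"])
```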
ImportError: To be able to use evaluate-metric/glue, you need to install the following dependencies['scipy', 'scikit-learn'] using 'pip install scipy sklearn' for instance'
#642 opened by JINO-ROHIT - 2
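A sketch of the load that triggers this error once `scipy` and `scikit-learn` are installed; note the pip hint in the message names the obsolete `sklearn` shim package, and the `mrpc` config below is an assumption.

```python
import evaluate

# Requires `pip install scipy scikit-learn` beforehand; the error's suggested
# "pip install scipy sklearn" points at the deprecated sklearn package name.
glue_metric = evaluate.load("glue", "mrpc")
print(glue_metric.compute(predictions=[0, 1, 1], references=[0, 1, 0]))
```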
Evaluate fails to load all metrics.
#638 opened by filbeofITK - 0
Optionally stop data deletion after compute
#641 opened by sizhky - 0
Perplexity for Left Padded Models
#636 opened by Lawhy - 2
Gradio dependency issue
#602 opened by bnaman50 - 0
Support nltk>=3.9 to fix vulnerability
#628 opened by albertvillanova - 0
Main documentation building is not triggered
#634 opened by albertvillanova - 0
Evaluate uses deprecated use_auth_token and will break with datasets-3.0
#620 opened by albertvillanova - 0
`list_evaluation_modules` returns empty list
#616 opened by MohamedAliRashad - 1
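For context, a sketch of the call in question; filtering by module type is an assumption about typical usage.

```python
import evaluate

# Should return the metric module names hosted on the Hub;
# the issue reports this coming back as an empty list.
metrics = evaluate.list_evaluation_modules(module_type="metric")
print(len(metrics), metrics[:5])
```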
Benchmark evaluation for language models.
#615 opened by mina58 - 0
How to customize my own evaluator and metrics?
#611 opened by Kami-chanw - 1
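A rough sketch of the subclassing pattern the bundled metrics follow; the metric name and its logic here are hypothetical.

```python
import datasets
import evaluate


class ExactMatchCount(evaluate.Metric):  # hypothetical example metric
    def _info(self):
        return evaluate.MetricInfo(
            description="Counts exact matches between predictions and references.",
            citation="",
            inputs_description="Lists of prediction and reference strings.",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("string"),
                    "references": datasets.Value("string"),
                }
            ),
        )

    def _compute(self, predictions, references):
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"exact_match_count": matches}


metric = ExactMatchCount()
print(metric.compute(predictions=["a", "b"], references=["a", "c"]))
```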
METEOR has no option to return unaggregated results
#572 opened by ashtonomy - 2
Unable to compute F1 score - ValueError when trying to convert a non-English string label to an integer
#610 opened by alans3321 - 0
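A common workaround, sketched below: the `f1` module expects numeric class labels, so string labels (in any language) are mapped to integers before calling `compute`; the label names are made up.

```python
import evaluate

f1 = evaluate.load("f1")

# Hypothetical non-English string labels mapped to integer ids.
label2id = {"положительный": 0, "отрицательный": 1}
predictions = ["положительный", "отрицательный", "положительный"]
references = ["положительный", "положительный", "отрицательный"]

print(
    f1.compute(
        predictions=[label2id[p] for p in predictions],
        references=[label2id[r] for r in references],
        average="macro",
    )
)
```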
[Metrics] ValueError: Expected to find locked file from process x but it doesn't exist.
#607 opened by raghavm1 - 2
AttributeError: 'CombinedEvaluations' object has no attribute 'evaluation_modules'
#603 opened by shunk031 - 3
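For context, a sketch of the `evaluate.combine` usage that the traceback points at.

```python
import evaluate

# Bundle several classification metrics into one object and compute them together.
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1]))
```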
Can't use BLEU offline.
#565 opened by Zhuxing01 - 0
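One offline workaround, sketched under the assumption that the metric script has been copied to a local folder (the path below is illustrative) so that loading does not hit the Hub.

```python
import evaluate

# Load BLEU from a local copy of the metric script instead of resolving it
# on the Hub; "./metrics/bleu" is an assumed path to a local clone.
bleu = evaluate.load("./metrics/bleu")

result = bleu.compute(
    predictions=["the cat sat on the mat"],
    references=[["the cat is on the mat"]],
)
print(result["bleu"])
```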
LocalModuleTest.test_load_metric_code_eval fails with "The "code_eval" metric executes untrusted model-generated code in Python."
#597 opened by jpodivin - 0
Execution of example from the "Using the evaluator" docs fails due to unspecified tokenizer
#594 opened by jpodivin - 3
SyntaxError: closing parenthesis '}'
#592 opened by wangxiuwen - 0
Can't load existing dataset for evaluation
#589 opened by IsmaelMousa - 12
Problems when running the initial step
#590 opened by simplelifetime - 0
Unable to run pip install evaluate[template]
#576 opened by saicharan2804 - 0
The difference between your bleu and sacrebleu
#558 opened by cooper12121 - 0
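A side-by-side sketch on the same sentence pair; the usual source of differences is that `sacrebleu` applies its own standardized tokenization while `bleu` depends on how its inputs are tokenized, and the two report on different scales.

```python
import evaluate

preds = ["the cat sat on the mat"]
refs = [["the cat is on the mat"]]

bleu = evaluate.load("bleu")
sacrebleu = evaluate.load("sacrebleu")

# Note: "bleu" reports a 0-1 score, "sacrebleu" a 0-100 score.
print("bleu:", bleu.compute(predictions=preds, references=refs)["bleu"])
print("sacrebleu:", sacrebleu.compute(predictions=preds, references=refs)["score"])
```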
[FR] Confidence intervals for metrics
#581 opened by NightMachinery - 1
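The Evaluator classes already expose a bootstrap strategy that yields confidence intervals for pipeline-level evaluation; a sketch, with model and dataset names chosen purely for illustration.

```python
import evaluate
from datasets import load_dataset

task_evaluator = evaluate.evaluator("text-classification")
data = load_dataset("imdb", split="test[:100]")

results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
    data=data,
    metric="accuracy",
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
    strategy="bootstrap",  # resample predictions to estimate a confidence interval
    n_resamples=200,
)
print(results)
```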
Shouldn't perplexity range be [1, inf)?
#566 opened by ivanmkc - 0
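For what it's worth, a two-line check of the lower bound: perplexity is the exponential of the average negative log-likelihood, which is non-negative, so the value can never drop below 1.

```python
import math

avg_neg_log_likelihood = 0.0  # best case: the model assigns probability 1 to every token
print(math.exp(avg_neg_log_likelihood))  # 1.0 -- perplexity is bounded below by 1
```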
Cannot use it offline!
#567 opened by SirryChen - 1
ImportError: To be able to use evaluate-metric/rouge, you need to install the following dependencies['nltk'] using 'pip install # Here to have a nice missing dependency error message early on' for instance'
#562 opened by BAEK26 - 1
It seems like evaluate.load doesn't use
#561 opened by anhq-nguyen - 0
evaluate consuming memory and slowing down the process
#559 opened by Redix8 - 0
After fine-tuning Gemma and wanting to evaluate performance: AttributeError: module 'keras._tf_keras.keras' has no attribute '__internal__'
#555 opened by XinyueZ - 0
Add Precision@k and Recall@k metrics
#554 opened by Andron00e
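For reference, a dependency-free sketch of what the requested metrics compute; this is not an existing `evaluate` module, just an illustration of Precision@k and Recall@k.

```python
def precision_at_k(ranked_predictions, relevant_items, k):
    """Fraction of the top-k predictions that are relevant."""
    top_k = ranked_predictions[:k]
    return sum(item in relevant_items for item in top_k) / k


def recall_at_k(ranked_predictions, relevant_items, k):
    """Fraction of all relevant items that appear in the top-k predictions."""
    top_k = ranked_predictions[:k]
    return sum(item in relevant_items for item in top_k) / len(relevant_items)


# Hypothetical ranked retrieval output and ground-truth relevance set.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

print(precision_at_k(ranked, relevant, k=3))  # 1/3
print(recall_at_k(ranked, relevant, k=3))     # 1/3
```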