Execution of example from the Using the evaluator docs fails due to unspecified tokenizer

Question

jpodivin opened this issue 6 months ago · 0 comments

Instead of calculating metrics, the first example in the evaluator docs [1] fails because the tokenizer is neither provided nor inferred:

Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.

To replicate, run the following:

from datasets import load_dataset
from evaluate import evaluator
from transformers import AutoModelForSequenceClassification, pipeline

data = load_dataset("imdb", split="test").shuffle(seed=42).select(range(1000))
task_evaluator = evaluator("text-classification")

# 1. Pass a model name or path
eval_results = task_evaluator.compute(
    model_or_pipeline="lvwerra/distilbert-imdb",
    data=data,
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1}
)

# 2. Pass an instantiated model
model = AutoModelForSequenceClassification.from_pretrained("lvwerra/distilbert-imdb")

eval_results = task_evaluator.compute(
    model_or_pipeline=model,
    data=data,
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1}
)

Version: evaluate==0.4.1
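A possible workaround, not confirmed anywhere in this issue: since the exception asks for a tokenizer, one could be loaded with AutoTokenizer and passed through the `tokenizer` argument of `compute`. The sketch below assumes the tokenizer published under the same model identifier is the right one. The helper names `compute_kwargs` and `run_evaluation` are made up for illustration and are not part of evaluate or transformers.

```python
MODEL_ID = "lvwerra/distilbert-imdb"
LABEL_MAPPING = {"NEGATIVE": 0, "POSITIVE": 1}


def compute_kwargs(model, tokenizer, data):
    """Hypothetical helper: arguments for compute() with an explicit tokenizer."""
    return {
        "model_or_pipeline": model,
        "tokenizer": tokenizer,  # passed explicitly instead of being guessed
        "data": data,
        "label_mapping": LABEL_MAPPING,
    }


def run_evaluation():
    """Hypothetical helper: downloads the dataset and model, then evaluates."""
    # Imported here so the helper above can be used without these packages.
    from datasets import load_dataset
    from evaluate import evaluator
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    data = load_dataset("imdb", split="test").shuffle(seed=42).select(range(1000))
    task_evaluator = evaluator("text-classification")

    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    # Assumption: the matching tokenizer lives under the same identifier.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    return task_evaluator.compute(**compute_kwargs(model, tokenizer, data))
```

Calling `run_evaluation()` needs network access to fetch the dataset and model. Whether this avoids the exception on evaluate 0.4.1 has not been checked here.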

[1] https://huggingface.co/docs/evaluate/base_evaluator