[MIEB] image file is truncated
Muennighoff opened this issue · 1 comment
Muennighoff commented
I'm on the latest `mieb` branch, trying to run `python mteb/scripts/run_mieb.py` with only CLIP, and I'm getting the error below 🤔
```
/env/lib/conda/gritkto4/lib/python3.10/site-packages/PIL/TiffImagePlugin.py:935: UserWarning: Truncated File Read
  warnings.warn(str(msg))
ERROR:mteb.evaluation.MTEB:Error while evaluating Birdsnap: image file is truncated (45 bytes not processed)
Traceback (most recent call last):
  File "/data/niklas/mieb/mteb/scripts/run_mieb.py", line 23, in <module>
    results = evaluation.run(model, output_folder="results-mieb-final")
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 422, in run
    raise e
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 383, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mieb/mteb/mteb/evaluation/MTEB.py", line 260, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mieb/mteb/mteb/abstasks/Image/AbsTaskImageClassification.py", line 99, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/data/niklas/mieb/mteb/mteb/abstasks/Image/AbsTaskImageClassification.py", line 135, in _evaluate_subset
    X_sampled, y_sampled, idxs = self._undersample_data(
  File "/data/niklas/mieb/mteb/mteb/abstasks/Image/AbsTaskImageClassification.py", line 202, in _undersample_data
    label = dataset_split[i][label_column_name]
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2762, in __getitem__
    return self._getitem(key)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2747, in _getitem
    formatted_output = format_table(
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 639, in format_table
    return formatter(pa_table, query_type=query_type)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 403, in __call__
    return self.format_row(pa_table)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 444, in format_row
    row = self.python_features_decoder.decode_row(row)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 222, in decode_row
    return self.features.decode_example(row) if self.features else row
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/features/features.py", line 2041, in decode_example
    return {
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/features/features.py", line 2042, in <dictcomp>
    column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/features/features.py", line 1403, in decode_nested_example
    return schema.decode_example(obj, token_per_repo_id=token_per_repo_id)
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/datasets/features/image.py", line 188, in decode_example
    image.load()  # to avoid "Too many open files" errors
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/PIL/ImageFile.py", line 297, in load
    raise OSError(msg)
OSError: image file is truncated (45 bytes not processed)
```
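Reading the traceback, the failure starts at `dataset_split[i][label_column_name]` in `_undersample_data`: fetching row `i` makes `datasets` decode every feature in that row, so the image is opened and `image.load()` is called even though only the label is needed, and one truncated file aborts the whole Birdsnap task. To find which rows are affected, a minimal diagnostic sketch could look like the one below. `find_undecodable` and its `load` callback are hypothetical helpers, not part of mteb or `datasets`; `load(i)` is assumed to fully decode item `i` and to raise `OSError` on a bad file, which is what PIL does for truncated images unless `ImageFile.LOAD_TRUNCATED_IMAGES` is set.

```python
from typing import Callable, List, Tuple


def find_undecodable(num_items: int, load: Callable[[int], object]) -> List[Tuple[int, str]]:
    """Return (index, error message) for every item whose loader raises OSError.

    `load` is a hypothetical callback that fully decodes item i; any other
    exception type is not caught and will propagate to the caller.
    """
    failures: List[Tuple[int, str]] = []
    for i in range(num_items):
        try:
            load(i)
        except OSError as exc:  # e.g. "image file is truncated (45 bytes not processed)"
            failures.append((i, str(exc)))
    return failures
```

With PIL, `load` might be something like `lambda i: Image.open(paths[i]).load()`, where `paths` is a hypothetical list of the split's image files. Collecting all failures in one pass, instead of stopping at the first, shows whether this is a single bad file or a wider problem with the split.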
isaac-chung commented
Going to upload a downsampled version of the train split with about 32 images for each of the 500 bird species. cc @gowitheflow-1998
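For illustration only (this is not the script used for the upload), a capped per-species subset that also drops undecodable files could be built roughly as follows. The sketch assumes the labels can be read without decoding images (for a `datasets` split, selecting the column, as in `dataset_split[label_column_name]`, should not touch the image column) and uses a hypothetical `is_decodable(i)` check that returns `False` when image `i` fails to load. With 500 species and a cap of 32, the result has at most 500 × 32 = 16,000 examples.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence


def build_capped_subset(
    labels: Sequence[Hashable],
    is_decodable: Callable[[int], bool],
    per_label: int = 32,
    seed: int = 0,
) -> List[int]:
    """Pick up to `per_label` indices per label, skipping items that fail to decode.

    `is_decodable` is a hypothetical check; it is only called for labels that
    still need examples, so full labels cost no further decoding.
    """
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)  # random but reproducible choice within each label
    kept = defaultdict(int)  # label -> number of examples selected so far
    selected: List[int] = []
    for i in order:
        label = labels[i]
        if kept[label] >= per_label:
            continue  # this label is already full
        if not is_decodable(i):
            continue  # drop truncated/corrupt files instead of failing later
        kept[label] += 1
        selected.append(i)
    return sorted(selected)
```

A species with fewer than `per_label` decodable images simply keeps all of them, so the subset may be slightly under 16,000 rather than padded.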