Error when running testing

Question

Error when running testing

Closed this issue 2 months ago · 5 comments

I run the command "python3 run_chunked_eval.py --task-name TRECCOVIDChunked".
Got the below errors, any suggestions?

ERROR:mteb.evaluation.MTEB:Error while evaluating TRECCOVIDChunked: Currently truncation is only implemented for documents without titles
Traceback (most recent call last):
File "/raid/tanyu/late-chunking/run_chunked_eval.py", line 167, in
main()
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/raid/tanyu/late-chunking/run_chunked_eval.py", line 129, in main
evaluation.run(
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 422, in run
raise e
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 383, in run
results, tick, tock = self._run_eval(
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 260, in _run_eval
results = task.evaluate(
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 95, in evaluate
scores[hf_subset] = self._evaluate_monolingual(
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 162, in _evaluate_monolingual
corpus = self._truncate_documents(corpus)
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 110, in _truncate_documents
raise NotImplementedError(
NotImplementedError: Currently truncation is only implemented for documents without titles

Answer 1 · 2024-10-07T08:21:17.000Z

Sorry the last commit enabled truncation by default, however, the current code doesn't support truncation for benchmark datasets (like TRECCOVID) where the documents have a title and a text. For the experiments in the paper, we run TRECCOID without truncation. Nevertheless, it should work when you disable truncation by running:

python3 run_chunked_eval.py --task-name TRECCOVIDChunked --truncate-max-length 0

I will also make a PR ready to fix it.

Answer 2 · 2024-10-07T09:53:50.000Z

Ok we merged this one:
#19
Now it should also work without the additional truncate-max-length argument

Answer 3 · 2024-10-08T00:20:09.000Z

Thanks for your quick response! It works now @guenthermi

Answer 4 · 2024-10-08T03:32:48.000Z

Another issue comes when I run "python3 run_chunked_eval.py --task-name SciFactChunked --truncate-max-length 0"

" File "/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 254, in _evaluate_monolingual
max_k = int(max(k_values) / max_chunks)
~~~~~~~~~~~~~~^~~~~~~~~~~~
ZeroDivisionError: division by zero"

@guenthermi could you help check it? Thanks

Answer 5 · 2024-10-08T07:31:14.000Z

Ok without --truncate-max-length 0 is should have worked fine on the last version, now I made a small commit to make truncate_max_length=0 equivalent to truncate_max_length=None. It seems like in some cases it really truncated to 0 tokens instead of disabling it.