jina-ai/late-chunking

Error when running testing

Closed this issue · 5 comments

I run the command "python3 run_chunked_eval.py --task-name TRECCOVIDChunked".
Got the below errors, any suggestions?

ERROR:mteb.evaluation.MTEB:Error while evaluating TRECCOVIDChunked: Currently truncation is only implemented for documents without titles
Traceback (most recent call last):
File "/raid/tanyu/late-chunking/run_chunked_eval.py", line 167, in
main()
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/tanyu/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/raid/tanyu/late-chunking/run_chunked_eval.py", line 129, in main
evaluation.run(
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 422, in run
raise e
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 383, in run
results, tick, tock = self._run_eval(
File "/home/tanyu/.local/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 260, in _run_eval
results = task.evaluate(
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 95, in evaluate
scores[hf_subset] = self._evaluate_monolingual(
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 162, in _evaluate_monolingual
corpus = self._truncate_documents(corpus)
File "/raid/tanyu/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 110, in _truncate_documents
raise NotImplementedError(
NotImplementedError: Currently truncation is only implemented for documents without titles

Sorry the last commit enabled truncation by default, however, the current code doesn't support truncation for benchmark datasets (like TRECCOVID) where the documents have a title and a text. For the experiments in the paper, we run TRECCOID without truncation. Nevertheless, it should work when you disable truncation by running:

python3 run_chunked_eval.py --task-name TRECCOVIDChunked --truncate-max-length 0

I will also make a PR ready to fix it.

Ok we merged this one:
#19
Now it should also work without the additional truncate-max-length argument

Thanks for your quick response! It works now @guenthermi

Another issue comes when I run "python3 run_chunked_eval.py --task-name SciFactChunked --truncate-max-length 0"

" File "/late-chunking/chunked_pooling/mteb_chunked_eval.py", line 254, in _evaluate_monolingual
max_k = int(max(k_values) / max_chunks)
~~~~~~~~~~~~~~^~~~~~~~~~~~
ZeroDivisionError: division by zero"

@guenthermi could you help check it? Thanks

Ok without --truncate-max-length 0 is should have worked fine on the last version, now I made a small commit to make truncate_max_length=0 equivalent to truncate_max_length=None. It seems like in some cases it really truncated to 0 tokens instead of disabling it.