AkariAsai/self-rag

FactScore Inference Fails with KeyError: 'original_splitted_sentences'

hideaki-j opened this issue · 2 comments

Hello, thanks for your amazing work!

I'd like to ask about a KeyError: 'original_splitted_sentences' error that I encountered when trying to generate results for FactScore.

Error

When I run run_long_form_static.py for FactScore following the command shown in "Run inference using pre-retrieved passages" in README.md, I encounter:

KeyError: 'original_splitted_sentences'

The error originates from the following line:

"cat": item["cat"], "intermediate": intermediate["original_splitted_sentences"][0]})

This error seems to be the same as issue #76. However, since that issue was retracted, I am reposting it here.

Culprit?

The error occurs when do_retrieve == False, and the culprit seems to be:

    if do_retrieve is False:
        ...
        prediction_tree = {}
        return preds[0], prediction_tree

here, since this path always returns prediction_tree = {}, which later triggers the KeyError when intermediate["original_splitted_sentences"] is accessed.
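A minimal defensive sketch of what the call site could do instead of indexing the key directly (this is an illustrative workaround, not the repository's actual code; the helper name extract_intermediate_sentences is hypothetical):

```python
def extract_intermediate_sentences(intermediate):
    """Return the first group of split sentences, or an empty list when the
    no-retrieval path returned an empty prediction_tree (key missing)."""
    groups = intermediate.get("original_splitted_sentences", [[]])
    return groups[0] if groups else []

# Retrieval path: the key is present.
retrieval_result = {"original_splitted_sentences": [["Sentence one.", "Sentence two."]]}
print(extract_intermediate_sentences(retrieval_result))  # ['Sentence one.', 'Sentence two.']

# No-retrieval path: prediction_tree == {}, so the key is missing.
print(extract_intermediate_sentences({}))  # []
```

This only papers over the symptom; the underlying question of why do_retrieve is always False remains.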

Another issue: always no retrieval

While investigating this error, I also found that no retrieval ever occurs unless --mode always_retrieve is used (i.e., do_retrieve is always False even with adaptive_retrieval or default). As a result, when I run run_long_form_static.py with the flags specified in README.md, execution always takes the if do_retrieve is False path, causing the error above.
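For context, a hedged sketch of how a Self-RAG-style adaptive-retrieval decision is typically made: the probability of the special retrieval token is compared against a threshold. The function name and signature below are illustrative assumptions, not the repository's actual API; it only shows how a never-populated token probability would make do_retrieve always False:

```python
def decide_retrieve(mode, retrieve_token_prob=0.0, threshold=0.5):
    """Illustrative retrieval decision (hypothetical helper, not repo code)."""
    if mode == "always_retrieve":
        return True
    if mode == "no_retrieval":
        return False
    # adaptive_retrieval / default: retrieve only when the model's retrieval
    # token probability exceeds the threshold. If that probability is never
    # populated (stays at 0.0), this is always False, matching the behavior
    # reported above.
    return retrieve_token_prob > threshold

print(decide_retrieve("always_retrieve"))                               # True
print(decide_retrieve("adaptive_retrieval", retrieve_token_prob=0.0))   # False
print(decide_retrieve("adaptive_retrieval", retrieve_token_prob=0.9))   # True
```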

Adding the --mode always_retrieve flag solves the error, but I'm not sure if it was accidentally omitted from the instruction command.

Also, I am not sure whether do_retrieve always being False is the expected behaviour here; it seems not to be.

Questions

Q1. Is the --mode always_retrieve flag missing from the command instructions for FactScore, or is the command correct and the cause of the error lies elsewhere?

Q2. With mode == "adaptive_retrieval" and mode == "default", execution appears to always take the do_retrieve == False path. Is this expected behavior?

Thanks!

Adding the --mode always_retrieve flag solves the error, but I'm not sure if it was accidentally omitted from the instruction command.

I believe this is the case. 🍻

answer1:
I ran into numerous difficulties when evaluating FactScore as well. The Self-RAG repository only provides scripts for the always-retrieval mode. I evaluated always retrieval, adaptive retrieval, and no retrieval myself, and found that adaptive retrieval and no retrieval produced the same errors you describe. I spent some time resolving this issue; you can refer to my FactScore evaluation scripts:
https://github.com/fate-ubw/RAGLAB/blob/main/run/Factscore/2-eval_fact-raglab-selfrag-selfrag_8B-adaptive_retrieval-GPT.sh
answer2:
I encountered the same problem: the long-form logic diverges from the Self-RAG paper, and the code around the construction of prediction_tree = {} is hard to follow. I have rewritten the three Self-RAG long-form modes (always retrieval, adaptive retrieval, no retrieval) more clearly; you can refer to it to understand the Self-RAG long-form reasoning process:
https://github.com/fate-ubw/RAGLAB/blob/main/raglab/rag/infer_alg/self_rag_reproduction/selfrag_reproduction.py#L216