zetaalphavector/RAGElo

new line in answer breaks the json parsing

Closed this issue · 5 comments

I've tried using gpt-4o to evaluate an answer, and the model has a tendency to output formatted JSON that contains newlines. For example:
{\n "relevance": 0,\n "reasoning": "The document incorrectly states that Rio de Janeiro is the capital of Brazil, which is factually incorrect. The correct capital of Brazil is Brasília."\n}

I get this error:

('Failed to evaluate qid: <no_qid> did: <no_did>', 'Exception: Answer does not contain all necessary keys\nExpected [\'relevance\', \'reasoning\'], found set().\nFull Answer:\n{\n  "relevance": 0,\n  "reasoning": "The document incorrectly states that Rio de Janeiro is the capital of Brazil, which is false. The correct answer is Brasília."\n}')

This breaks the __parse_json function even when the JSON itself is valid, because the answer is parsed line by line:

```python
for line in answer.strip().split("\n"):
```
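
For reference, a newline-tolerant approach would be to hand the whole answer to `json.loads` instead of splitting on newlines. This is only a rough sketch of the idea (the function name and signature are made up for illustration), not RAGElo's actual `__parse_json` implementation:

```python
import json


def parse_json_answer(answer: str, required_keys: list[str]) -> dict:
    """Sketch of a newline-tolerant parser; not the library's actual code."""
    # Strip markdown code fences that the model sometimes wraps around the JSON.
    cleaned = answer.strip().removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    try:
        # json.loads handles beautified, multi-line JSON just as well as single-line JSON.
        parsed = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Answer is not valid JSON:\n{answer}") from exc
    missing = set(required_keys) - set(parsed)
    if missing:
        raise ValueError(
            f"Answer does not contain all necessary keys. Missing: {missing}\nFull answer:\n{answer}"
        )
    return parsed
```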

I guess this could be treated as a prompting issue: adding (NON-BEAUTIFIED) to the example prompt works around it. What confused me is that the error message doesn't make clear what is actually wrong with the JSON.

I have the same issue as above. I am not sure what exactly is meant by adding '(NON-BEAUTIFIED)' to the prompt. Could someone please advise and maybe update the repo?

```python
>>> from ragelo import get_retrieval_evaluator
>>> evaluator = get_retrieval_evaluator("RDNAM", llm_provider="openai")
>>> raw_answer, processed_answer = evaluator.evaluate(query="What is the capital of France?", document='Lyon is the second largest city in France.')
Failed to PARSE answer for qid: <no_qid> document id: {document.did}
Raw answer: {"O": 0}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/Ubuntu/miniconda3/envs/ragelo/lib/python3.11/site-packages/ragelo/evaluators/retrieval_evaluators/base_retrieval_evaluator.py", line 200, in evaluate
    raise ValueError(
ValueError: ('Failed to evaluate qid: <no_qid> did: <no_did>', 'Exception: Failed to parse answer: {{"O": 0}')
```

@fahdmirza I'm not sure this is the same issue, because I was using a custom prompt instead of RDNAM. But here is my code in case it helps:

```python
import os  # needed for reading the API key below

from ragelo import get_retrieval_evaluator, get_llm_provider
from ragelo.types.configurations import OpenAIConfiguration

api_key = os.environ.get('OPENAI_API_KEY')
openai_config = OpenAIConfiguration(api_key=api_key, model="gpt-4o")
llm_provider = get_llm_provider(name="openai", config=openai_config)
prompt = """You are a helpful assistant for evaluating the relevance of a retrieved document to a user query.

The answer should be evaluated according to its relevance to the user query.

User query: {q}

Retrieved document: {d}

WRITE YOUR ANSWER ON A SINGLE LINE AS A JSON OBJECT (NON-BEAUTIFIED) WITH THE FOLLOWING KEYS:
- "relevance": A score from 0 to 3 indicating the relevance of the document to the query.
    - 3 Perfectly relevant: The passage is dedicated to the query and contains the exact answer.
    - 2 Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information.
    - 1 Related: The passage seems related to the query but does not answer it.
    - 0 Irrelevant: The passage has nothing to do with the query.
- "reasoning": A short explanation of why you think the document is relevant and why you gave the specific score.
"""

evaluator = get_retrieval_evaluator(
    "custom_prompt", # name of the retrieval evaluator
    llm_provider=llm_provider, # Which LLM provider to use
    prompt=prompt, # your custom prompt
    query_placeholder="q", # the placeholder for the query in the prompt
    document_placeholder="d", # the placeholder for the document in the prompt
    answer_format="multi_field_json", # The format of the answer. In this case, a JSON object with multiple fields
    scoring_keys=["relevance", "reasoning"], # Which keys to extract from the answer
)

raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?", # The user query
    document="Rio de Janeiro is the capital of Brazil.", # The retrieved document
    query_metadata=None,
    doc_metadata=None
)
```

Without (NON-BEAUTIFIED) in the prompt, I get the error reported in this issue.

I've just merged #32, which should help with the RDNAM evaluator. Let me know if the issue persists!
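
Assuming the fix from #32 is in the installed version, re-running the RDNAM snippet from earlier in this thread should now return a parsed score instead of raising:

```python
from ragelo import get_retrieval_evaluator

# Same call that failed above; with the #32 fix the single-key {"O": ...}
# answer should be parsed instead of raising a ValueError.
evaluator = get_retrieval_evaluator("RDNAM", llm_provider="openai")
raw_answer, processed_answer = evaluator.evaluate(
    query="What is the capital of France?",
    document="Lyon is the second largest city in France.",
)
print(processed_answer)
```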

Thanks guys, I just reviewed it on my channel and also credited the help here:

https://youtu.be/b_l5aeLLxSE