Newline in answer breaks the JSON parsing
Closed this issue · 5 comments
I've tried using gpt-4o to evaluate an answer, and the model has a tendency to output formatted JSON that contains newlines. For example:
{\n "relevance": 0,\n "reasoning": "The document incorrectly states that Rio de Janeiro is the capital of Brazil, which is factually incorrect. The correct capital of Brazil is Brasília."\n}
I get this error:
('Failed to evaluate qid: <no_qid> did: <no_did>', 'Exception: Answer does not contain all necessary keys\nExpected [\'relevance\', \'reasoning\'], found set().\nFull Answer:\n{\n "relevance": 0,\n "reasoning": "The document incorrectly states that Rio de Janeiro is the capital of Brazil, which is false. The correct answer is Brasília."\n}')
This breaks the `__parse_json` function even when the JSON is valid.
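For what it's worth, the standard library parses pretty-printed JSON with newlines just fine, so the problem seems to be in how the answer string is handled rather than in the JSON itself. A minimal check, independent of ragelo:

```python
import json

# The exact kind of multi-line answer that trips up the parser
answer = (
    '{\n'
    '    "relevance": 0,\n'
    '    "reasoning": "The document incorrectly states that '
    'Rio de Janeiro is the capital of Brazil."\n'
    '}'
)

parsed = json.loads(answer)  # succeeds: json.loads is newline-agnostic
print(sorted(parsed.keys()))  # ['reasoning', 'relevance']
```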
I guess this could be a prompting issue. I added `(NON-BEAUTIFIED)` to the example prompt and it works. I was confused because the error message doesn't make it clear what is actually wrong with the JSON.
I have the same issue as above. I am not sure what exactly is meant by adding `(NON-BEAUTIFIED)` to the prompt. Could someone please advise, and maybe update the repo?
```python
>>> from ragelo import get_retrieval_evaluator
>>> evaluator = get_retrieval_evaluator("RDNAM", llm_provider="openai")
>>> raw_answer, processed_answer = evaluator.evaluate(
...     query="What is the capital of France?",
...     document="Lyon is the second largest city in France.",
... )
Failed to PARSE answer for qid: <no_qid> document id: {document.did}
Raw answer: {"O": 0}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/Ubuntu/miniconda3/envs/ragelo/lib/python3.11/site-packages/ragelo/evaluators/retrieval_evaluators/base_retrieval_evaluator.py", line 200, in evaluate
    raise ValueError(
ValueError: ('Failed to evaluate qid: <no_qid> did: <no_did>', 'Exception: Failed to parse answer: {{"O": 0}')
```
@fahdmirza I'm not sure this is the same issue, because I was using a custom prompt instead of RDNAM. But here is my code, if that can help:
```python
import os

from ragelo import get_retrieval_evaluator, get_llm_provider
from ragelo.types.configurations import OpenAIConfiguration

api_key = os.environ.get("OPENAI_API_KEY")
openai_config = OpenAIConfiguration(api_key=api_key, model="gpt-4o")
llm_provider = get_llm_provider(name="openai", config=openai_config)

prompt = """You are a helpful assistant for evaluating the relevance of a retrieved document to a user query.
The answer should be evaluated according to its relevance to the user query.
User query: {q}
Retrieved document: {d}
WRITE YOUR ANSWER ON A SINGLE LINE AS A JSON OBJECT (NON-BEAUTIFIED) WITH THE FOLLOWING KEYS:
- "relevance": A score from 0 to 3 indicating the relevance of the document to the query.
    - 3 Perfectly relevant: The passage is dedicated to the query and contains the exact answer.
    - 2 Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information.
    - 1 Related: The passage seems related to the query but does not answer it.
    - 0 Irrelevant: The passage has nothing to do with the query.
- "reasoning": A short explanation of why you think the document is relevant and why you gave the specific score.
"""

evaluator = get_retrieval_evaluator(
    "custom_prompt",  # name of the retrieval evaluator
    llm_provider=llm_provider,  # which LLM provider to use
    prompt=prompt,  # your custom prompt
    query_placeholder="q",  # the placeholder for the query in the prompt
    document_placeholder="d",  # the placeholder for the document in the prompt
    answer_format="multi_field_json",  # the answer format: a JSON object with multiple fields
    scoring_keys=["relevance", "reasoning"],  # which keys to extract from the answer
)

raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?",  # the user query
    document="Rio de Janeiro is the capital of Brazil.",  # the retrieved document
    query_metadata=None,
    doc_metadata=None,
)
```
Without `(NON-BEAUTIFIED)` in the prompt, I get the error reported in this issue.
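As a stopgap on my side, I've been extracting the first JSON object from the raw answer before parsing it. This is just a sketch (the `extract_json` helper and its regex are mine, not part of ragelo), but it tolerates both beautified JSON and surrounding prose in the model output:

```python
import json
import re


def extract_json(raw_answer: str) -> dict:
    """Pull the first {...} block out of an LLM answer and parse it.

    Hypothetical helper, not part of ragelo: re.DOTALL lets the
    pattern span the newlines in beautified JSON.
    """
    match = re.search(r"\{.*\}", raw_answer, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in answer: {raw_answer!r}")
    return json.loads(match.group(0))


raw = 'Sure! Here is the evaluation:\n{\n  "relevance": 0,\n  "reasoning": "Wrong capital."\n}'
print(extract_json(raw))  # {'relevance': 0, 'reasoning': 'Wrong capital.'}
```

Note the greedy `.*` assumes a single JSON object per answer; multiple objects in one answer would need a stricter parser.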
I've just merged #32 that should help with the RDNAM evaluator. Let me know if the issue persists!
Thanks guys, I just reviewed it on my channel and also credited the help here: