Future-House/paper-qa

Pydantic error when parsing context

Closed this issue · 4 comments

Hello, sometimes I get this error while using paper-qa:

   File "C:\Users\XXXXXXXXXXXXX\Desktop\PaperQA\paper_qa_question_sync.py", line 129, in <module>

    my_query()
  File "C:\Users\XXXXXXXXXXXXX\Desktop\PaperQA\paper_qa_question_sync.py", line 107, in my_query
    answer = docs.query(
             ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 665, in query
    return get_loop().run_until_complete(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 705, in aquery
    session = await self.aget_evidence(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 632, in aget_evidence
    results = await gather_with_concurrency(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\utils.py", line 114, in gather_with_concurrency
    return await asyncio.gather(*(sem_coro(c) for c in coros))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\utils.py", line 112, in sem_coro
    return await coro
           ^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\core.py", line 106, in map_fxn_summary
    Context(
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\pydantic\main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Context
score
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='7.5', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/int_parsing

It seems to be related to Pydantic validation while parsing the context.

How can I fix this?

@dosu.

The error you're encountering is due to Pydantic expecting an integer for the score field in the Context model, but it is receiving a string that represents a float ('7.5'). The score field is defined as an integer in the Context model [1].
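For illustration, here is a minimal standalone reproduction of the failure; the Context class below is just a sketch that mirrors the integer score field, not PaperQA's actual model:

```python
from pydantic import BaseModel, ValidationError

class Context(BaseModel):
    score: int  # mirrors the integer score field described above

try:
    # Pydantic v2 will not coerce a float-like string into an int field
    Context(score="7.5")
except ValidationError as exc:
    print(exc)  # reports the same int_parsing error as in the traceback
```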

To fix this issue, you have a couple of options:

  1. Ensure Integer Values: Make sure that the score values being passed to the Context model are integers. If the score is calculated or retrieved from a source that might produce a float, you should convert it to an integer before passing it to the Context model. For example, you can use int(float_value) to convert a float to an integer.

  2. Modify the Model (if appropriate): If it makes sense for your application to handle float scores, you could modify the Context model to accept floats instead of integers. This would involve changing the type of the score field in the model definition from int to float. However, this change should be carefully considered, as it might affect other parts of the application that rely on the score being an integer.

If you choose the first option, ensure that any part of your code that assigns a score to the Context model performs the necessary conversion to an integer. If you need further assistance with implementing these changes, feel free to ask!
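As a sketch of the first option, a small helper (the name coerce_score is hypothetical) could round and clamp a float-like score before it reaches the model; in practice, the conversion would have to happen wherever the Context is constructed:

```python
def coerce_score(raw) -> int:
    """Hypothetical helper: turn a raw score ('7.5', 7.5, '7') into an int in 1-10."""
    value = float(raw)  # accept strings and floats alike
    return int(max(1, min(10, round(value))))  # clamp to the expected 1-10 range

print(coerce_score("7.5"))  # -> 8
```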


How can I ensure my score values are integers?

Should I modify the paperqa code?

@dosu.

To ensure your score values are integers without modifying the PaperQA code, you can use the extract_score function provided in the paperqa/utils.py file. This function is designed to extract and return integer score values from a given text. Here's a brief overview of how it works:

The function parses the text to find score values and converts them to integers, ensuring that the scores are within a reasonable range (e.g., converting scores out of 100 to scores out of 10). You can use this function directly in your code to process text and extract integer scores [1].
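For example, assuming extract_score is importable from paperqa/utils.py as described above, it can be applied to raw LLM output before you build a score yourself; this is a sketch, not something PaperQA runs on your behalf:

```python
# Sketch: assumes extract_score lives in paperqa.utils as described above.
from paperqa.utils import extract_score

llm_text = "The passage strongly supports the claim. Score: 8"
score = extract_score(llm_text)
print(score)  # an integer score parsed out of the text
```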


Hello @Snikch63200, this is a case of an LLM not listening to us. It's supposed to give an integer score from 1 to 10, but here it gave a score of 7.5.

I took two actions here:

  1. Adjusted the internal LLM prompt to try to better avoid this in #748
  2. Opened #749 for a longer-term fix of re-prompting the LLM
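Until those changes land, a defensive pattern along the lines of the re-prompting idea in #749 might look like the sketch below; ask_llm and the reminder text are hypothetical placeholders, not PaperQA's internals:

```python
import re

def get_integer_score(ask_llm, prompt: str, retries: int = 2) -> int:
    """Hypothetical re-prompting loop: ask again until the LLM returns an integer 1-10."""
    last_value = None
    for _ in range(retries + 1):
        reply = ask_llm(prompt)
        match = re.search(r"\d+(?:\.\d+)?", reply)
        if match:
            last_value = float(match.group())
            if last_value.is_integer() and 1 <= last_value <= 10:
                return int(last_value)
        # remind the model of the required format and try again
        prompt += "\n\nReminder: reply with a single integer from 1 to 10."
    # give up on re-prompting: round and clamp whatever was seen last
    return max(1, min(10, round(last_value))) if last_value is not None else 5
```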