Future-House/paper-qa

Parsing JSON error when increasing evidence_k with mixtral:8x7b

Closed this issue · 6 comments

Hello,

PaperQA returns a JSON parsing error when I increase evidence_k while using mixtral:8x7b as the LLM model.

Works fine with these parameters:

settings.answer.answer_max_sources = 5 
settings.answer.evidence_k = 10  

Doesn't work with these parameters:

settings.answer.answer_max_sources = 5 
settings.answer.evidence_k = 20  

Returns this error:

    return json.loads(ptext)
           ^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Unterminated string starting at: line 2 column 14 (char 15)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\XXXXXXXXXX\Desktop\PaperQA\paper_qa_question_sync.py", line 126, in <module>
    my_query()
  File "C:\Users\XXXXXXXXXX\Desktop\PaperQA\paper_qa_question_sync.py", line 104, in my_query
    answer = docs.query(
             ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 665, in query
    return get_loop().run_until_complete(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 705, in aquery
    session = await self.aget_evidence(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 632, in aget_evidence
    results = await gather_with_concurrency(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\utils.py", line 114, in gather_with_concurrency
    return await asyncio.gather(*(sem_coro(c) for c in coros))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\utils.py", line 112, in sem_coro
    return await coro
           ^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\core.py", line 78, in map_fxn_summary
    result_data = parser(context) if parser else {}
                  ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\core.py", line 31, in llm_parse_json
    raise ValueError(
ValueError: Failed to parse JSON from text ' {\n  "summary": "The excerpt provides information about a study of silver coins from the '. Your model may not be capable of supporting JSON output or our parsing technique could use some work. Try a different model or specify `Settings(prompts={'use_json': False})`

Works fine with llama3.1:70b with these parameters:

settings.answer.answer_max_sources = 5 
settings.answer.evidence_k = 20  

Does anyone have an idea about this issue?

@dosu.

The JSON parsing error you're encountering when increasing evidence_k with the mixtral:8x7b model is likely due to the model's output not being properly formatted as JSON. This can happen when the model generates incomplete or malformed JSON, which becomes more likely as a higher evidence_k value asks it to summarize more excerpts.

Here are a few things you can try to address this issue:

  1. Check JSON Formatting: Ensure that the JSON output from the model is complete and properly formatted. The error message suggests an "Unterminated string," which means the JSON string might be cut off or improperly closed.

  2. Adjust evidence_k: Since the issue arises with a higher evidence_k, try incrementally increasing the value to find a threshold that works without causing errors. This might help identify a limit where the model's output becomes problematic.

  3. Use a Different Model: As you noted, the llama3.1:70b model works fine with the same parameters. If possible, consider using this model or another one that handles JSON output more reliably.

  4. Modify JSON Parsing: You might need to adjust the JSON parsing logic to handle incomplete or malformed JSON more gracefully. This could involve adding error handling or pre-processing steps to clean up the model's output before attempting to parse it.

  5. Disable JSON Output: If JSON output is not strictly necessary, you can disable it by setting Settings(prompts={'use_json': False}), as suggested in the error message. This will bypass JSON parsing and might help avoid the issue (see the sketch below this list).

These steps should help mitigate the JSON parsing errors you're experiencing with the mixtral:8x7b model when using a higher evidence_k value.
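
If you want to try options 2 and 5 concretely, here is a minimal sketch. It assumes the same Settings object already used in the script above; the prompts={'use_json': False} override is the one quoted in the error message, and the commented-out docs.query call is only an assumption about how the reporter's script passes the settings object.

from paperqa import Settings

# Option 5: disable JSON-formatted evidence (the override quoted in the error message),
# while keeping the parameters that previously failed.
settings = Settings(prompts={"use_json": False})
settings.answer.answer_max_sources = 5
settings.answer.evidence_k = 20

# Option 2 would instead keep JSON output and step evidence_k up gradually
# (e.g. 12, 15, 18) to find where mixtral:8x7b starts truncating.

# Then query as in the existing script, e.g.:
# answer = docs.query("your question", settings=settings)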


Hi @Snikch63200, looking at the stack:

ValueError: Failed to parse JSON from text ' {\n "summary": "The excerpt provides information about a study of silver coins from the '. Your model may not be capable of supporting JSON output or our parsing technique could use some work. Try a different model or specify Settings(prompts={'use_json': False})

Specifically:

from text ' {\n "summary": "The excerpt provides information about a study of silver coins from the '

You can see the model seems to have truncated the JSON; there is no closing brace }. Perhaps your max_tokens was too low, or mixtral:8x7b isn't able to produce JSON reliably.
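
For illustration, a best-effort repair of the kind of truncated output shown above is possible. The helper below is hypothetical (it is not paperqa's llm_parse_json) and only hides the truncation, since it cannot recover text the model never generated:

import json

def parse_possibly_truncated_json(text: str) -> dict:
    # Hypothetical best-effort parser: try a normal parse first, then close
    # an unterminated string and any dangling braces before retrying.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        repaired = text.strip()
        if repaired.count('"') % 2 == 1:  # unterminated string, as in the traceback above
            repaired += '"'
        repaired += "}" * (repaired.count("{") - repaired.count("}"))
        return json.loads(repaired)

truncated = ' {\n  "summary": "The excerpt provides information about a study of silver coins from the '
print(parse_possibly_truncated_json(truncated))
# {'summary': 'The excerpt provides information about a study of silver coins from the'}

The real fix is still to raise the generation cap or switch models, since the missing half of the summary is gone.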

Hi @jamesbraza,

Thanks for your answer. I guess you're right; it's a problem related to the max_tokens limit. In theory, from what I've read, mixtral models are able to produce correct JSON.

My LLM model is set up as follows:

local_llm_config = dict(
    model_list=[
        dict(
            model_name="ollama/mixtral:8x7b",
            litellm_params=dict(
                model="ollama/mixtral:8x7b",
                api_type="ollama",
                api_base="http://192.168.1.15:11434",
                cache=True,
                mirostat=0,
                mirostat_eta=0.1,
                mirostat_tau=5.0,
                num_ctx=32768,  # Default: 2048
                repeat_last_n=64,  # How far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
                repeat_penalty=1.1,
                temperature=0.7,
                seed=0,
                tfs_z=1,
                num_predict=-1,  # Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)
                top_k=40,
                top_p=0.9,
                min_p=0.0,
            ),
        ),
    ]
)

num_predict seems to be the argument that sets max_tokens in the Ollama API. I also tried 4096 (the max value for mixtral:8x7b), 2048, and 1024 without success...

Do you have an idea about this?

Best regards.

@Snikch63200 yeah if you look at the error message:

Try a different model or specify Settings(prompts={'use_json': False})

You can try this out; basically, it moves "evidence" from JSON to just text blobs.

Hello,

I tried to specify Settings(prompts={'use_json': False}) but it gives poor results.

But I solved this issue (it's not a paperqa issue).

It seems llama3.1 accepts -1 or -2 as a num_predict value (-1 = infinite token generation, -2 = fill the context length) while mixtral:8x7b doesn't. So I set num_predict to 1E6 and it works fine.
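
For reference, here is the same local_llm_config as above, trimmed to the parameters relevant here, with only the num_predict line changed:

# Same setup as the config posted earlier, with num_predict switched from -1
# to an explicit large positive cap, which is what resolved the truncated JSON.
local_llm_config = dict(
    model_list=[
        dict(
            model_name="ollama/mixtral:8x7b",
            litellm_params=dict(
                model="ollama/mixtral:8x7b",
                api_base="http://192.168.1.15:11434",
                num_ctx=32768,
                num_predict=1_000_000,  # was -1; mixtral:8x7b doesn't seem to honor -1/-2
            ),
        ),
    ]
)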

Thanks for your help.

PS: Maybe it's a good idea to open a discussion about optimal LLM configuration settings for PaperQA. I'll try to find time to write a short synthesis about my own experience with several models.

Thanks for sharing your resolution, appreciated.

PS: Maybe it's a good idea to open a discussion about optimal LLM configuration settings for PaperQA. I'll try to find time to write a short synthesis about my own experience with several models.

We do have a couple of pre-made settings in paperqa/configs; feel free to open an issue requesting changes to one or more.

Also, I expanded #749 to get retrying to cover this case as well. I am going to close this out; thanks for the report.