Sinaptik-AI/pandas-ai

Unable To Generate Schema: InvalidLLMOutputType (Response validation failed!)

mayankagarwal19911 opened this issue · 4 comments

System Info

pandasai 2.2.8
Python 3.9.6

๐Ÿ› Describe the bug

Sample code

import pandas as pd
from pandasai.ee.agents.semantic_agent import SemanticAgent  # pandasai 2.x import path

df = pd.DataFrame(columns=["column1"], data=[["value1"]])
df.head()
semantic_agent = SemanticAgent(dfs=df)
print(semantic_agent._schema)

Error

in BaseAgent.call_llm_with_prompt(self, prompt)
                return result
            else:
                raise InvalidLLMOutputType("Response validation failed!")
        except Exception:
            if (
                not self.context.config.use_error_correction_framework
                or retry_count >= self.context.config.max_retries - 1
            ):

Hello, @mayankagarwal19911! I'm here to assist you with any bugs, questions, or contributions you may have. Let's tackle this issue together!

To resolve the InvalidLLMOutputType: Response validation failed! error and successfully generate the schema using pandasai 2.2.8 and Python 3.9.6, follow these steps:

  1. Ensure the Template Correctly Specifies the Expected Output Type:
    Modify the template in pandasai/prompts/templates/correct_output_type_error_prompt.tmpl so that it explicitly specifies the expected output type.

    {% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}
    
    The user asked the following question:
    {{context.memory.get_conversation()}}
    
    You generated this python code:
    {{code}}
    
    Fix the python code above and return the new python code but the result type should be: {{output_type}}
    
  2. Set the Correct Output Type in Your Context:
    Ensure that the output_type variable is correctly set to the expected type in your context. For example, if you expect a DataFrame, output_type should be set to DataFrame.

    from jinja2 import Template
    
    template_str = """
    {% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}
    
    The user asked the following question:
    {{context.memory.get_conversation()}}
    
    You generated this python code:
    {{code}}
    
    Fix the python code above and return the new python code but the result type should be: {{output_type}}
    """
    
    context = {
        'dfs': [df],  # Your DataFrame
        'memory': memory,  # Your conversation memory
        'code': generated_code,  # The generated Python code
        'output_type': 'DataFrame'  # The expected output type
    }
    
    template = Template(template_str)
    corrected_code = template.render(context=context)
    
    print(corrected_code)
  3. Handle Validation in call_llm_with_prompt Method:
    Ensure that the call_llm_with_prompt method in your BaseAgent class correctly handles the validation of the output type. The method should retry the prompt if the validation fails.

    def call_llm_with_prompt(self, prompt: BasePrompt):
        retry_count = 0
        while retry_count < self.context.config.max_retries:
            try:
                result: str = self.context.config.llm.call(prompt)
                if prompt.validate(result):
                    return result
                else:
                    raise InvalidLLMOutputType("Response validation failed!")
            except Exception:
                if (
                    not self.context.config.use_error_correction_framework
                    or retry_count >= self.context.config.max_retries - 1
                ):
                    raise
                retry_count += 1
  4. Generate and Validate the Schema:
    Ensure that the schema generation and validation are correctly implemented in the SemanticAgent class.

    def _create_schema(self):
        if self._schema:
            self.logger.log(f"using user provided schema: {self._schema}")
            return
    
        key = self._get_schema_cache_key()
        if self.config.enable_cache:
            value = self._schema_cache.get(key)
            if value is not None:
                self._schema = json.loads(value)
                self.logger.log(f"using schema: {self._schema}")
                return
    
        prompt = GenerateDFSchemaPrompt(context=self.context)
        result = self.call_llm_with_prompt(prompt)
        self.logger.log(f"Initializing Schema:  {result}")
        self._schema = result.replace("# SAMPLE SCHEMA", "")
        schema_data = extract_json_from_json_str(result.replace("# SAMPLE SCHEMA", ""))
        if isinstance(schema_data, dict):
            schema_data = [schema_data]
    
        self._schema = schema_data
        if self.config.enable_cache:
            self._schema_cache.set(key, json.dumps(self._schema))
    
        self.logger.log(f"using schema: {self._schema}")
  5. Ensure Correct Output Type Validation:
    Ensure that your generated code produces a result dictionary whose type and value fields match the expected output type; a minimal example follows the class below.

    class DataFrameOutputType(BaseOutputType):
        @property
        def template_hint(self):
            return """type (must be "dataframe"), value must be pd.DataFrame or pd.Series. Example: { "type": "dataframe", "value": pd.DataFrame({...}) }"""
    
        @property
        def name(self):
            return "dataframe"
    
        def _validate_value(self, actual_value: Any) -> bool:
            return bool(df_type(actual_value))
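
    For reference, a result dictionary shaped like the template_hint above would pass this validation. This is a minimal sketch; the DataFrame content is purely illustrative:

    import pandas as pd

    # Minimal sketch of the result the generated code should return so that the
    # "dataframe" output type validation succeeds: "type" set to "dataframe" and
    # "value" holding a pd.DataFrame (or pd.Series).
    result = {
        "type": "dataframe",
        "value": pd.DataFrame({"column1": ["value1"]}),
    }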

By following these steps, you should be able to resolve the InvalidLLMOutputType: Response validation failed! error and successfully generate the schema [1][2][3][4].

To continue talking to Dosu, mention @dosu.

#1265 I would like to solve this bug.

@pratik9409 sure, thanks a lot for your availability, I've assigned it to you! :D

Testing with a simple DataFrame

import pandas as pd
from pandasai.ee.agents.semantic_agent import SemanticAgent  # pandasai 2.x import path
from pandasai.exceptions import InvalidLLMOutputType

df = pd.DataFrame(columns=["Empdata"], data=[[1], [2]])

df.head()

try:
    # Create an instance of the SemanticAgent with the provided dataframe
    semantic_agent = SemanticAgent(dfs=df)
    # Print the generated schema
    print(semantic_agent._schema)
except InvalidLLMOutputType as e:
    # If the LLM fails to generate a valid schema, catch the InvalidLLMOutputType exception
    print(f"Error: {e}")  # Print the error message
    print("Using fallback schema...")  # Inform the user that a fallback schema will be used

[screenshot: semantic agent output]