Sinaptik-AI/pandas-ai

Unable To Generate Schema: InvalidLLMOutputType (Response validation failed!)

mayankagarwal19911 opened this issue · 4 comments

System Info

pandasai 2.2.8
Python 3.9.6

๐Ÿ› Describe the bug

Sample code

import pandas as pd
from pandasai.ee.agents.semantic_agent import SemanticAgent  # pandasai 2.x import path

df = pd.DataFrame(columns=["column1"], data=[["value1"]])
df.head()
semantic_agent = SemanticAgent(dfs=df)
print(semantic_agent._schema)

Error

in BaseAgent.call_llm_with_prompt(self, prompt)
                return result
            else:
                raise InvalidLLMOutputType("Response validation failed!")
        except Exception:
            if (
                not self.context.config.use_error_correction_framework
                or retry_count >= self.context.config.max_retries - 1
            ):

Hello, @mayankagarwal19911! I'm here to assist you with any bugs, questions, or contributions you may have. Let's tackle this issue together!

To resolve the InvalidLLMOutputType: Response validation failed! error and successfully generate the schema using pandasai 2.2.8 and Python 3.9.6, follow these steps:

  1. Ensure the Template Correctly Specifies the Expected Output Type:
    Modify the template in pandasai/prompts/templates/correct_output_type_error_prompt.tmpl so that it explicitly specifies the expected output type.

    {% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}
    
    The user asked the following question:
    {{context.memory.get_conversation()}}
    
    You generated this python code:
    {{code}}
    
    Fix the python code above and return the new python code but the result type should be: {{output_type}}
    
  2. Set the Correct Output Type in Your Context:
    Ensure that the output_type variable is correctly set to the expected type in your context. For example, if you expect a DataFrame, output_type should be set to DataFrame.

    from jinja2 import Template
    
    template_str = """
    {% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}
    
    The user asked the following question:
    {{context.memory.get_conversation()}}
    
    You generated this python code:
    {{code}}
    
    Fix the python code above and return the new python code but the result type should be: {{output_type}}
    """
    
    context = {
        'dfs': [df],  # Your DataFrame
        'memory': memory,  # Your conversation memory
        'code': generated_code,  # The generated Python code
        'output_type': 'DataFrame'  # The expected output type
    }
    
    template = Template(template_str)
    corrected_code = template.render(context=context)
    
    print(corrected_code)
  3. Handle Validation in call_llm_with_prompt Method:
    Ensure that the call_llm_with_prompt method in your BaseAgent class correctly handles the validation of the output type. The method should retry the prompt if the validation fails.

    def call_llm_with_prompt(self, prompt: BasePrompt):
        retry_count = 0
        while retry_count < self.context.config.max_retries:
            try:
                result: str = self.context.config.llm.call(prompt)
                if prompt.validate(result):
                    return result
                else:
                    raise InvalidLLMOutputType("Response validation failed!")
            except Exception:
                if (
                    not self.context.config.use_error_correction_framework
                    or retry_count >= self.context.config.max_retries - 1
                ):
                    raise
                retry_count += 1
  4. Generate and Validate the Schema:
    Ensure that the schema generation and validation are correctly implemented in the SemanticAgent class.

    def _create_schema(self):
        if self._schema:
            self.logger.log(f"using user provided schema: {self._schema}")
            return
    
        key = self._get_schema_cache_key()
        if self.config.enable_cache:
            value = self._schema_cache.get(key)
            if value is not None:
                self._schema = json.loads(value)
                self.logger.log(f"using schema: {self._schema}")
                return
    
        prompt = GenerateDFSchemaPrompt(context=self.context)
        result = self.call_llm_with_prompt(prompt)
        self.logger.log(f"Initializing Schema:  {result}")
        self._schema = result.replace("# SAMPLE SCHEMA", "")
        schema_data = extract_json_from_json_str(result.replace("# SAMPLE SCHEMA", ""))
        if isinstance(schema_data, dict):
            schema_data = [schema_data]
    
        self._schema = schema_data
        if self.config.enable_cache:
            self._schema_cache.set(key, json.dumps(self._schema))
    
        self.logger.log(f"using schema: {self._schema}")
  5. Ensure Correct Output Type Validation:
    Ensure that your generated code produces a result dictionary whose type and value fields match the expected output type; a minimal example follows the class below.

    class DataFrameOutputType(BaseOutputType):
        @property
        def template_hint(self):
            return """type (must be "dataframe"), value must be pd.DataFrame or pd.Series. Example: { "type": "dataframe", "value": pd.DataFrame({...}) }"""
    
        @property
        def name(self):
            return "dataframe"
    
        def _validate_value(self, actual_value: Any) -> bool:
            return bool(df_type(actual_value))
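
    For reference, a result dictionary shaped like the template_hint above would pass this validation. This is a minimal sketch; the DataFrame content is purely illustrative:

    import pandas as pd

    # Minimal sketch of the result the generated code should return so that the
    # "dataframe" output type validation succeeds: "type" set to "dataframe" and
    # "value" holding a pd.DataFrame (or pd.Series).
    result = {
        "type": "dataframe",
        "value": pd.DataFrame({"column1": ["value1"]}),
    }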

By following these steps, you should be able to resolve the InvalidLLMOutputType: Response validation failed! error and successfully generate the schema [1][2][3][4].

To continue talking to Dosu, mention @dosu.

#1265 I would like to solve this bug.

@pratik9409 sure, thanks a lot for your availability, I've assigned it to you! :D

Testing with a simple DataFrame

import pandas as pd
from pandasai.ee.agents.semantic_agent import SemanticAgent  # pandasai 2.x import path
from pandasai.exceptions import InvalidLLMOutputType

df = pd.DataFrame(columns=["Empdata"], data=[[1], [2]])

df.head()

try:
    # Create an instance of the SemanticAgent with the provided dataframe
    semantic_agent = SemanticAgent(dfs=df)
    # Print the generated schema
    print(semantic_agent._schema)
except InvalidLLMOutputType as e:
    # If the LLM fails to generate a valid schema, catch the InvalidLLMOutputType exception
    print(f"Error: {e}")  # Print the error message
    print("Using fallback schema...")  # Inform the user that a fallback schema will be used

[screenshot: semantic agent output]