Sinaptik-AI/pandas-ai

Support for Figure object output type

Opened this issue · 2 comments

🚀 The feature

I would love to be able to set the output type to a plotly or matplotlib figure, instead of saving plots to PNG and returning file paths, which is of quite limited use.

Motivation, pitch

I recently started using pandasai to build custom data analysis apps and I like it quite a lot so far. I was wondering why it is limited to the four output data types and, even more, why the (cumbersome) approach of saving images to disk and returning a file path was chosen. Maybe there is a security-related reason, or it is due to the client-server architecture of pandasai? Returning objects instead (as is already done for the dataframe) would open up much more potential, especially for plots and figures.

I have already tinkered with modifying output_type_template.tmpl and output_validator.py to make pandasai return figure objects. However, I do not really know the potential problems or implications of this, so I am proposing it here as a feature request, since my "hacky" implementation is probably not how it should be done.

Here you can see the resulting prompt used and the generated code, which works fine for now in a standalone app.
Prompt used:

<dataframe>
dfs[0]:150x5
Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class
7.2,3.4,6.4,0.3,Iris-setosa
4.5,4.1,3.5,1.7,Iris-virginica
6.0,2.6,5.9,2.5,Iris-versicolor
</dataframe>


Update this initial code:
"""python
# TODO: import the required dependencies
import pandas as pd

# Write code here

# Declare result var:
type (must be "figure"), value must be a matplotlib.figure or plotly.graph_objects.Figure. Example: { "type": "figure", "value": go.Figure(...) }   

"""



### QUERY
 Plot the sepal length and width of the data and color points by class

Variable `dfs: list[pd.DataFrame]` is already declared.

At the end, declare "result" variable as a dictionary of type and value.

If you are asked to plot a chart, use "plotly" for charts, save as png.


Generate python code and return full updated code:

Resulting Code:

import plotly.express as px

# dfs is already declared by pandasai (see the prompt above)
df = dfs[0]
fig = px.scatter(df, x='Sepal_Length', y='Sepal_Width', color='Class', title='Sepal Length vs Sepal Width', labels={'Sepal_Length': 'Sepal Length', 'Sepal_Width': 'Sepal Width'})
result = {'type': 'figure', 'value': fig}

Alternatives

I know it's also possible to convert plotly figures to and from JSON. So another option could be to return (or potentially also save) the figure as JSON instead.

Additional context

Final Result in Chatbot App:
[screenshot]

@Blubbaa May I ask how you modified output_type_template.tmpl and output_validator.py to make pandasai return figure objects?

@at-eez-jedi Yes, sure. I modified output_type_template.tmpl like this:

{% if not output_type %}
type (possible values "string", "number", "dataframe", "plot", "figure"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" } or { "type": "figure", "value": go.Figure(...) }
{% elif output_type == "number" %}
type (must be "number"), value must be int. Example: { "type": "number", "value": 125 }
{% elif output_type == "string" %}
type (must be "string"), value must be string. Example: { "type": "string", "value": f"The highest salary is {highest_salary}." }
{% elif output_type == "dataframe" %}
type (must be "dataframe"), value must be pd.DataFrame or pd.Series. Example: { "type": "dataframe", "value": pd.DataFrame({...}) }
{% elif output_type == "plot" %}
type (must be "plot"), value must be string. Example: { "type": "plot", "value": "temp_chart.png" }
{% elif output_type == "figure" %}
type (must be "figure"), value must be a matplotlib.figure or plotly.graph_objects.Figure. Example: { "type": "figure", "value": go.Figure(...) }
{% endif %}

I also inserted this part in generate_python_code.tmpl to handle the save-as-PNG instruction:

At the end, declare "result" variable as a dictionary of type and value.
{% if viz_lib %}
If you are asked to plot a chart, use "{{viz_lib}}" for charts.
{% endif %}
{% if output_type == "plot" %}
Save charts as PNG.
{% endif %}
{% if output_type == "figure" %}
Do not save the figure to file.
{% endif %}
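For anyone who wants to check which instructions end up in the prompt for a given combination, the branching above can be mirrored in plain Python. This is only a sketch of the template logic, not how pandasai renders it; the function name `chart_instructions` is made up for illustration.

```python
from typing import Optional


def chart_instructions(viz_lib: Optional[str], output_type: Optional[str]) -> str:
    """Mirror the conditional blocks of the modified template snippet."""
    lines = ['At the end, declare "result" variable as a dictionary of type and value.']
    if viz_lib:
        lines.append(f'If you are asked to plot a chart, use "{viz_lib}" for charts.')
    if output_type == "plot":
        lines.append("Save charts as PNG.")
    if output_type == "figure":
        lines.append("Do not save the figure to file.")
    return "\n".join(lines)


print(chart_instructions("plotly", "figure"))
```

One consequence of this branching is that a request with no output_type set gets neither the "save" nor the "do not save" instruction.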

And output_validator.py:

    def validate_value(self, expected_type: str) -> bool:
        if not expected_type:
            return True
        elif expected_type == "number":
            return isinstance(self, (int, float))
        elif expected_type == "string":
            return isinstance(self, str)
        elif expected_type == "dataframe":
            return isinstance(self, (pd.DataFrame, pd.Series))
        elif expected_type == "plot":
            if not isinstance(self, (str, dict)):
                return False

            if isinstance(self, dict):
                return True

            path_to_plot_pattern = r"^(\/[\w.-]+)+(/[\w.-]+)*$|^[^\s/]+(/[\w.-]+)*$"
            return bool(re.match(path_to_plot_pattern, self))
        elif expected_type == "figure":
            return "plotly.graph_objs._figure.Figure" in repr(type(self)) or "matplotlib.figure.Figure" in repr(type(self))

    @staticmethod
    def validate_result(result: dict) -> bool:
        if not isinstance(result, dict) or "type" not in result:
            raise InvalidOutputValueMismatch(
                "Result must be in the format of dictionary of type and value"
            )

        if not result["type"]:
            return False

        elif result["type"] == "number":
            return isinstance(result["value"], (int, float, np.int64))
        elif result["type"] == "string":
            return isinstance(result["value"], str)
        elif result["type"] == "dataframe":
            return isinstance(result["value"], (pd.DataFrame, pd.Series))
        elif result["type"] == "plot":
            if "plotly" in repr(type(result["value"])):
                return True

            if not isinstance(result["value"], (str, dict)):
                return False

            if isinstance(result["value"], dict) or (
                isinstance(result["value"], str)
                and "data:image/png;base64" in result["value"]
            ):
                return True

            path_to_plot_pattern = r"^(\/[\w.-]+)+(/[\w.-]+)*$|^[^\s/]+(/[\w.-]+)*$"
            return bool(re.match(path_to_plot_pattern, result["value"]))
        elif result["type"] == "figure":
            return "plotly.graph_objs._figure.Figure" in repr(type(result["value"])) or "matplotlib.figure.Figure" in repr(type(result["value"]))
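One possible refinement of the "figure" check: the substring match on `repr(type(...))` only recognizes the exact class, so a subclass of either Figure class would be rejected. A minimal sketch of an alternative walks the MRO and compares module and class names, which still avoids importing plotly or matplotlib in the validator. The helper name `is_figure` and the `FIGURE_CLASS_PATHS` constant are assumptions for illustration only.

```python
# (module, qualified class name) pairs taken from the repr strings used above.
FIGURE_CLASS_PATHS = {
    ("plotly.graph_objs._figure", "Figure"),
    ("matplotlib.figure", "Figure"),
}


def is_figure(value: object) -> bool:
    """Return True if value is an instance of a known Figure class or a subclass."""
    # type(value).__mro__ lists the class itself followed by all of its bases,
    # so subclasses of Figure match as well.
    return any(
        (cls.__module__, cls.__qualname__) in FIGURE_CLASS_PATHS
        for cls in type(value).__mro__
    )
```

Both branches could then be written as `return is_figure(self)` and `return is_figure(result["value"])`. This also rules out the unlikely case of an unrelated class whose repr happens to contain one of the strings.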