Skill name is not defined
WojtAcht opened this issue · 1 comment
System Info
OS version: macOS 14.5
Python version: Python 3.10.7
The current version of pandasai being used: 2.2.12
Describe the bug
Bug: Skill Calculations Fail in PandasAI
Issue Description
Skills that perform calculations fail with the error NameError: name '<skill>' is not defined. This occurs because the _extract_fix_dataframe_redeclarations method executes code in an environment that lacks skill definitions.
Root Cause
The _extract_fix_dataframe_redeclarations method uses an environment created by get_environment(), which does not include skill definitions:
def _extract_fix_dataframe_redeclarations(
    self, node: ast.AST, code_lines: list[str]
) -> ast.AST:
    # ...
    code = "\n".join(code_lines)
    env = get_environment(self._additional_dependencies)
    env["dfs"] = copy.deepcopy(self._get_originals(self._dfs))
    exec(code, env)
    # ...
The get_environment() function returns a dictionary with pandas, matplotlib, numpy, and some whitelisted builtins, but no skills:
def get_environment(additional_deps: List[dict]) -> dict:
    return {
        "pd": pd,
        "plt": plt,
        "np": np,
        # Additional dependencies and whitelisted builtins...
    }
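The failure mode can be reproduced without PandasAI. The following is a minimal sketch, not the library's code: `run` is a hypothetical helper and `calculate_salary_betas` here is a simplified stand-in for a skill. It shows that `exec` only resolves names present in the dictionary it is given (plus builtins), so a function missing from that dictionary raises NameError.

```python
def run(code: str, env: dict) -> str:
    """Execute code in env and report whether a NameError occurred."""
    try:
        exec(code, env)
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"


def calculate_salary_betas(salaries):
    # Simplified stand-in for a skill; the real one computes percentiles.
    return sorted(salaries)


code = "result = calculate_salary_betas([6000, 5000, 4500])"

# Environment without the skill: the name cannot be resolved.
without_skill = run(code, {})

# Environment with the skill registered under its function name.
env_with_skill = {"calculate_salary_betas": calculate_salary_betas}
with_skill = run(code, env_with_skill)

print(without_skill)
print(with_skill)
```

The second call succeeds only because the function was placed in the dictionary passed to `exec`, which is the step the code-cleaning path skips.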
Contrast with Correct Implementation
In contrast, the execute_code method in the CodeExecution class correctly adds skills to the environment:
def execute_code(self, code: str, context: ExecutionContext):
    # ...
    if context.skills_manager.used_skills:
        for skill_func_name in context.skills_manager.used_skills:
            skill = context.skills_manager.get_skill_by_func_name(skill_func_name)
            environment[skill_func_name] = skill
    # ...
Proposed Solution
To fix this issue, the _extract_fix_dataframe_redeclarations method should be updated to include skill definitions in its execution environment, similar to the execute_code method.
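One way to picture the proposed change is a small helper that applies the same loop as execute_code to any environment dictionary. This is a sketch under assumptions: `StubSkillsManager`, `add_used_skills`, and `total_salary` are hypothetical names introduced for illustration, and the stub only mimics the two members (`used_skills` and `get_skill_by_func_name`) that appear in the execute_code snippet above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StubSkillsManager:
    """Hypothetical stand-in exposing only the members used by execute_code."""

    used_skills: list = field(default_factory=list)
    registry: dict = field(default_factory=dict)

    def get_skill_by_func_name(self, name: str) -> Callable:
        return self.registry[name]


def add_used_skills(env: dict, skills_manager) -> dict:
    """Copy every used skill into env under its function name."""
    for skill_func_name in skills_manager.used_skills:
        env[skill_func_name] = skills_manager.get_skill_by_func_name(
            skill_func_name
        )
    return env


def total_salary(salaries):
    # Hypothetical skill used only for this illustration.
    return sum(salaries)


manager = StubSkillsManager(
    used_skills=["total_salary"],
    registry={"total_salary": total_salary},
)
env = add_used_skills({"salaries": [5000, 6000, 4500, 7000, 5500]}, manager)
exec("result = total_salary(salaries)", env)
print(env["result"])
```

Sharing one helper between the cleaning and execution paths would keep the two environments from drifting apart, although whether the code-cleaning class has access to the skills manager at that point is not shown in the issue.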
Example
import os
import pandas as pd
from pandasai import Agent
from pandasai.skills import skill
from pandasai.llm import OpenAI
employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}
salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}
employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)
# Add function docstring to give more context to model
@skill
def plot_salaries(names: list[str], salaries: list[int]):
    """
    Displays the bar chart having name on x-axis and salaries on y-axis using matplotlib
    Args:
        names (list[str]): Employees' names
        salaries (list[int]): Salaries
    """
    import matplotlib.pyplot as plt

    plt.bar(names, salaries)
    plt.xlabel("Employee Name")
    plt.ylabel("Salary")
    plt.title("Employee Salaries")
    plt.xticks(rotation=45)

@skill
def calculate_salary_betas(salaries: list[int]) -> list[float]:
    """
    Calculates the betas (25th, 50th and 75th percentiles) of salaries.
    Args:
        salaries (list[int]): List of employee salaries
    Returns:
        list[float]: A list containing the 25th, 50th, and 75th percentiles
    """
    import numpy as np

    percentiles = np.percentile(salaries, [25, 50, 75])
    return percentiles.tolist()
# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)
llm = OpenAI(
    api_token=os.getenv("OPENAI_API_KEY"), temperature=0, seed=26, model="gpt-4o"
)
agent = Agent(
    [employees_df, salaries_df],
    config={"llm": llm, "enforce_privacy": True},
    memory_size=10,
)
agent.add_skills(plot_salaries, calculate_salary_betas)
# Chat with the agent
response = agent.chat("Create a table with salary betas")
Error:
Traceback (most recent call last):
  File "pandas-ai/pandasai/pipelines/chat/code_cleaning.py", line 95, in execute
    code_to_run = self.get_code_to_run(input, code_context)
  File "pandas-ai/pandasai/pipelines/chat/code_cleaning.py", line 152, in get_code_to_run
    code_to_run = self._clean_code(code, context)
  File "pandas-ai/pandasai/pipelines/chat/code_cleaning.py", line 515, in _clean_code
    self._extract_fix_dataframe_redeclarations(node, clean_code_lines)
  File "pandas-ai/pandasai/pipelines/chat/code_cleaning.py", line 420, in _extract_fix_dataframe_redeclarations
    exec(code, env)
  File "<string>", line 5, in <module>
NameError: name 'calculate_salary_betas' is not defined
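When debugging a traceback like this, it can help to list which names the generated code reads that the environment does not provide. The sketch below is a rough diagnostic, not part of PandasAI: `missing_names` is a hypothetical helper, and it ignores scoping rules (for example, function parameters), so it should be treated as a heuristic only.

```python
import ast
import builtins


def missing_names(code: str, env: dict) -> set:
    """Return names the code loads that are neither defined in it nor in env."""
    loaded, stored = set(), set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                stored.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            stored.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                stored.add((alias.asname or alias.name).split(".")[0])
    return loaded - stored - set(env) - set(dir(builtins))


# Illustrative stand-in for LLM-generated code; not taken from the issue.
generated = (
    "salaries = dfs[1]['Salary'].tolist()\n"
    "betas = calculate_salary_betas(salaries)"
)
unresolved = missing_names(generated, {"dfs": []})
print(unresolved)
```

Running such a check against the dictionary returned by get_environment() would flag the skill name before `exec` is reached.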
Hey @WojtAcht! I'm here to help you out with any issues you're facing. Let's tackle this bug together and get it resolved!
To resolve the NameError: name 'calculate_salary_betas' is not defined error, update the _extract_fix_dataframe_redeclarations method to include skill definitions in its execution environment. Here is the updated method:
def _extract_fix_dataframe_redeclarations(
    self, node: ast.AST, code_lines: list[str]
) -> ast.AST:
    if isinstance(node, ast.Assign):
        target_names, is_slice, target = self._get_target_names(node.targets)
        if target_names and self._check_is_df_declaration(node):
            # Construct dataframe from node
            code = "\n".join(code_lines)
            env = get_environment(self._additional_dependencies)
            env["dfs"] = copy.deepcopy(self._get_originals(self._dfs))
            # Include skills in the environment.
            # Assumption: self.skills_manager is reachable here and .skills is a
            # name -> function mapping. execute_code above instead iterates
            # used_skills and calls get_skill_by_func_name, so verify before use.
            for skill_name, skill_func in self.skills_manager.skills.items():
                env[skill_name] = skill_func
            exec(code, env)
            df_generated = (
                env[target_names[0]][target.slice.value]
                if is_slice
                else env[target_names[0]]
            )
            # check if exists in provided dfs
            for index, df in enumerate(self._dfs):
                head = df.get_head()
                if head.shape == df_generated.shape and head.columns.equals(
                    df_generated.columns
                ):
                    target_var = (
                        ast.Subscript(
                            value=ast.Name(id=target_names[0], ctx=ast.Load()),
                            slice=target.slice,
                            ctx=ast.Store(),
                        )
                        if is_slice
                        else ast.Name(id=target_names[0], ctx=ast.Store())
                    )
                    return ast.Assign(
                        targets=[target_var],
                        value=ast.Subscript(
                            value=ast.Name(id="dfs", ctx=ast.Load()),
                            slice=ast.Index(value=ast.Num(n=index)),
                            ctx=ast.Load(),
                        ),
                    )
    return None
This modification ensures that the skills managed by SkillsManager are included in the execution environment, thus avoiding the NameError [1].