LLM-Powered Adaptive Data Analysis Agent

Overview

This project implements an LLM-powered agent that performs multi-step data analysis across multiple data sources. Through natural-language interaction, the agent queries databases, generates visualizations, and writes data-driven narratives, adapting its approach to the user's intent and the available data.

Key Features

LLM-driven reasoning for task planning and execution
Dynamic generation of SQL and Elasticsearch queries
Autonomous data visualization selection and creation
AI-powered data storytelling and insight generation
Multi-step, self-correcting workflow with explicit reasoning
Automatic selection of the data source (MySQL or Elasticsearch) based on query context
Interactive refinement of queries and outputs
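
To illustrate what a multi-step, self-correcting workflow can look like, here is a minimal sketch of a retry loop that feeds an execution error back into the next generation attempt. It is not the project's actual implementation: run_with_self_correction, fake_generate, and fake_execute are hypothetical names, and the two stand-in functions replace the LLM call and the database call.

```python
from typing import Callable, Optional, Tuple


def run_with_self_correction(
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], object],
    request: str,
    max_attempts: int = 3,
) -> Tuple[object, int]:
    """Generate a query, run it, and pass any error text to the next attempt.

    Returns the execution result and the number of attempts used.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        query = generate(request, feedback)
        try:
            return execute(query), attempt
        except Exception as exc:
            # Keep the error so the generator can correct itself next time.
            feedback = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"gave up after {max_attempts} attempts; last error: {feedback}")


# Hypothetical stand-in for the LLM: produces a typo until it sees feedback.
def fake_generate(request: str, feedback: Optional[str]) -> str:
    return "SELECT total FROM sales" if feedback else "SELEC total FROM sales"


# Hypothetical stand-in for the database: rejects anything that is not a SELECT.
def fake_execute(query: str) -> object:
    if not query.startswith("SELECT"):
        raise ValueError("syntax error near 'SELEC'")
    return [(42,)]


result, attempts = run_with_self_correction(fake_generate, fake_execute, "total sales")
```

In this sketch the first attempt fails, the error text becomes feedback, and the second attempt succeeds. Capping the number of attempts keeps a persistently failing request from looping forever.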

Technical Stack

LangChain & LangGraph: Agent orchestration and reasoning
OpenAI GPT-4: Core language model for decision-making and content generation
FastAPI: Backend API for agent interactions
MySQL & Elasticsearch: Supported data sources
Langfuse: Agent tracing and performance monitoring
React & D3.js/Plotly: Frontend for user interaction and data visualization

Agent Architecture

Intent Classifier: Determines the high-level goal of the user's request
Query Analyzer: Distinguishes between data retrieval, visualization, and storytelling tasks
Task Planner: Breaks down the goal into a series of actionable steps
Context Manager: Maintains and updates the agent's understanding of the current state
Schema Retriever: Fetches database schemas and Elasticsearch mappings
Query Generator: Creates SQL and Elasticsearch queries based on user intent
Query Validator: Ensures generated queries are valid and safe to execute
Tool Selector: Chooses appropriate tools (e.g., SQL query, visualization) for each step
Execution Engine: Runs selected tools and processes their outputs
Reasoning Engine: Evaluates results, makes decisions, and plans next steps
Output Generator: Formulates human-readable responses and visualizations
Visualization Generator: Creates appropriate data visualizations
Storyline Creator: Generates narrative insights from data analysis
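
As one illustration of the Query Validator's role, the sketch below applies a few conservative, regex-based checks to a generated SQL string before execution. It is an assumption about how such a check could work, not the project's validator: validate_sql and FORBIDDEN are hypothetical names, and a production validator would more likely use a real SQL parser and a read-only database account.

```python
import re
from typing import List, Set

# Keywords that indicate a write or schema change (a heuristic, not a parser).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


def validate_sql(query: str, allowed_tables: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    stripped = query.strip().rstrip(";").strip()

    if ";" in stripped:
        problems.append("multiple statements are not allowed")
    if not re.match(r"(?i)^(select|with)\b", stripped):
        problems.append("only SELECT statements are allowed")
    if FORBIDDEN.search(stripped):
        problems.append("query contains a write or DDL keyword")

    # Collect table names that directly follow FROM or JOIN.
    referenced = {
        name.lower()
        for name in re.findall(r"(?i)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped)
    }
    unknown = referenced - {t.lower() for t in allowed_tables}
    if unknown:
        problems.append(f"unknown tables: {sorted(unknown)}")
    return problems


ok = validate_sql("SELECT product, amount FROM sales;", {"sales"})
bad = validate_sql("SELECT 1; DELETE FROM sales", {"sales"})
```

The keyword check is deliberately strict and can reject a harmless query whose string literal contains a word such as "update". For a validator that guards a live database, rejecting too much is usually the safer failure mode.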

Example Interaction

User: "Analyze our Q2 sales performance and visualize the top-performing products."

Agent:

Classifies intent as a multi-step analysis task
Analyzes query to determine data retrieval and visualization needs
Plans steps: retrieve Q2 sales data, identify top products, generate visualization
Retrieves relevant database schema
Generates and validates SQL query to fetch Q2 sales data
Executes query and processes results to identify top-performing products
Selects and generates appropriate visualization (e.g., bar chart)
Creates a data-driven narrative summarizing key insights
Presents visualization, summary, and storyline to user
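
The retrieval and summary steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the sales table and its columns (product, sale_date, amount), the sample rows, the year 2024, and Q2 defined as April 1 through June 30. The sketch uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server.

```python
import sqlite3

# Parameterized query: total revenue per product within a date range.
TOP_PRODUCTS_SQL = """
SELECT product, SUM(amount) AS revenue
FROM sales
WHERE sale_date >= ? AND sale_date < ?
GROUP BY product
ORDER BY revenue DESC
LIMIT ?
"""


def top_products(conn, start, end, limit=3):
    """Return (product, revenue) rows for sales in [start, end)."""
    return conn.execute(TOP_PRODUCTS_SQL, (start, end, limit)).fetchall()


def summarize(rows):
    """Build a one-sentence narrative from the ranked rows."""
    if not rows:
        return "No sales found for the period."
    total = sum(revenue for _, revenue in rows)
    name, revenue = rows[0]
    return (
        f"{name} led the period with {revenue:.2f} in revenue, "
        f"{revenue / total:.0%} of the top-{len(rows)} total."
    )


# Hypothetical sample data standing in for the real sales database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Widget", "2024-04-15", 120.0),
        ("Widget", "2024-05-02", 80.0),
        ("Gadget", "2024-06-20", 150.0),
        ("Gizmo", "2024-03-30", 500.0),  # falls in Q1, so it is excluded
        ("Gizmo", "2024-06-01", 50.0),
    ],
)

rows = top_products(conn, "2024-04-01", "2024-07-01")
summary = summarize(rows)
```

The ranked rows are already in the shape a bar chart needs (one label and one value per bar), so the same result can feed both the visualization step and the narrative step.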

Setup and Usage

[Setup instructions here]

Extending the Agent

To add new capabilities:

Implement a new Tool class (e.g., NewDataSourceTool, AdvancedVisualizationTool)
Update the Tool Selector to consider the new tool
Enhance the Task Planner and Query Analyzer to incorporate the new capability
Add relevant prompts and few-shot examples for the LLM
Extend the Schema Retriever and Query Generator if adding a new data source
Update the Visualization Generator for new chart types or data representations
Enhance the Storyline Creator to incorporate new types of insights
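
As a rough sketch of the first two steps, the example below defines a tool as a named callable with a description and keeps tools in a registry that a selector could consult. The Tool and ToolRegistry classes and the csv_row_count tool are hypothetical and written in plain Python; the project's real Tool base class and its LangChain integration may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Tool:
    name: str
    description: str  # Text the LLM sees when choosing a tool.
    run: Callable[..., Any]


@dataclass
class ToolRegistry:
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        if tool.name in self.tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self.tools[tool.name] = tool

    def describe(self) -> str:
        """Render the tool list for inclusion in a tool-selection prompt."""
        return "\n".join(f"- {t.name}: {t.description}" for t in self.tools.values())

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name].run(**kwargs)


# Hypothetical new capability: count data rows in CSV text.
def csv_row_count(text: str) -> int:
    lines = [line for line in text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)  # Exclude the header row.


registry = ToolRegistry()
registry.register(
    Tool("csv_row_count", "Count data rows in CSV text (header excluded).", csv_row_count)
)

count = registry.call("csv_row_count", text="product,amount\nWidget,10\nGadget,20\n")
```

Keeping the description next to the callable means that registering a tool also updates the text a selector prompt is built from, so the two cannot drift apart.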

Contributing

We welcome contributions! See our Contributing Guide for details.

License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

fx2y/DataNarrate