This project implements an LLM-powered agent that performs multi-step data analysis across multiple data sources. Through natural-language interaction, the agent queries databases, generates visualizations, and writes data-driven narratives, adapting its approach to the user's intent and the available data.
- LLM-driven reasoning for task planning and execution
- Dynamic generation of SQL and Elasticsearch queries
- Autonomous data visualization selection and creation
- AI-powered data storytelling and insight generation
- Multi-step, self-correcting workflow with explicit reasoning
- Automatic switching between data sources based on query context
- Interactive refinement of queries and outputs
- LangChain & LangGraph: Agent orchestration and reasoning
- OpenAI GPT-4: Core language model for decision-making and content generation
- FastAPI: Backend API for agent interactions
- MySQL & Elasticsearch: Supported data sources
- Langfuse: Agent tracing and performance monitoring
- React & D3.js/Plotly: Frontend for user interaction and data visualization
- Intent Classifier: Determines the high-level goal of the user's request
- Query Analyzer: Distinguishes between data retrieval, visualization, and storytelling tasks
- Task Planner: Breaks down the goal into a series of actionable steps
- Context Manager: Maintains and updates the agent's understanding of the current state
- Schema Retriever: Fetches database schemas and Elasticsearch mappings
- Query Generator: Creates SQL and Elasticsearch queries based on user intent
- Query Validator: Ensures generated queries are valid and safe to execute
- Tool Selector: Chooses appropriate tools (e.g., SQL query, visualization) for each step
- Execution Engine: Runs selected tools and processes their outputs
- Reasoning Engine: Evaluates results, makes decisions, and plans next steps
- Output Generator: Formulates human-readable responses and visualizations
- Visualization Generator: Creates appropriate data visualizations
- Storyline Creator: Generates narrative insights from data analysis
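
As an illustration of the Query Validator component listed above, here is a minimal sketch of a read-only check for generated SQL. The function name `validate_sql` and the keyword deny-list are assumptions made for this example, not the project's actual implementation. A production validator would more likely use a real SQL parser, since this regex approach can reject harmless queries that mention a forbidden word inside a string literal.

```python
import re

# Hypothetical deny-list of write/DDL keywords (an assumption for this sketch).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


def validate_sql(query: str) -> tuple[bool, str]:
    """Return (is_valid, reason) for a generated SQL string."""
    # Tolerate surrounding whitespace and a trailing semicolon.
    stripped = query.strip().rstrip(";").strip()
    if not stripped:
        return False, "empty query"
    # Any remaining semicolon means more than one statement was submitted.
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    # Only read-only statements may reach the database.
    if not re.match(r"(?i)^(select|with)\b", stripped):
        return False, "only SELECT statements are allowed"
    match = FORBIDDEN.search(stripped)
    if match:
        return False, f"forbidden keyword: {match.group(0).lower()}"
    return True, "ok"
```

The Execution Engine could call a check like this before running any query and pass the returned reason back to the Reasoning Engine, so the agent can regenerate a rejected query.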
User: "Analyze our Q2 sales performance and visualize the top-performing products."
Agent:
- Classifies intent as a multi-step analysis task
- Analyzes query to determine data retrieval and visualization needs
- Plans steps: retrieve Q2 sales data, identify top products, generate visualization
- Retrieves relevant database schema
- Generates and validates SQL query to fetch Q2 sales data
- Executes query and processes results to identify top-performing products
- Selects and generates appropriate visualization (e.g., bar chart)
- Creates a data-driven narrative summarizing key insights
- Presents visualization, summary, and storyline to user
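
The retrieval and visualization steps of the workflow above can be sketched end to end with the standard library. Everything here is an assumption for illustration: the `sales` table, its columns, the sample rows, and the 2024 date range are hypothetical, an in-memory SQLite database stands in for MySQL, and a text bar chart stands in for the D3.js/Plotly frontend.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("Widget", "2024-04-15", 120.0),
            ("Widget", "2024-05-02", 80.0),
            ("Gadget", "2024-06-20", 150.0),
            ("Gizmo", "2024-05-11", 40.0),
            ("Gadget", "2024-02-01", 999.0),  # Q1 row, excluded by the date filter
        ],
    )
    return conn


# The kind of query the Query Generator might produce for "top Q2 products".
TOP_PRODUCTS_SQL = """
SELECT product, SUM(amount) AS revenue
FROM sales
WHERE sale_date BETWEEN ? AND ?
GROUP BY product
ORDER BY revenue DESC
LIMIT ?
"""


def top_products(conn: sqlite3.Connection, start: str, end: str, limit: int = 3):
    """Return (product, revenue) rows for the period, highest revenue first."""
    return conn.execute(TOP_PRODUCTS_SQL, (start, end, limit)).fetchall()


def text_bar_chart(rows, width: int = 20) -> str:
    """Render (label, value) rows as a bar chart scaled to the largest value."""
    peak = max((value for _, value in rows), default=0)
    if peak <= 0:
        return ""
    lines = []
    for label, value in rows:
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<8} {bar} {value:.0f}")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = build_demo_db()
    rows = top_products(conn, "2024-04-01", "2024-06-30")
    print(text_bar_chart(rows))
```

Parameter binding (`?`) keeps the date range and limit out of the SQL string itself, which is worth preserving even when the query text is LLM-generated.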
[Setup instructions here]
To add new capabilities:
- Implement a new Tool class (e.g., NewDataSourceTool, AdvancedVisualizationTool)
- Update the Tool Selector to consider the new tool
- Enhance the Task Planner and Query Analyzer to incorporate the new capability
- Add relevant prompts and few-shot examples for the LLM
- Extend the Schema Retriever and Query Generator if adding a new data source
- Update the Visualization Generator for new chart types or data representations
- Enhance the Storyline Creator to incorporate new types of insights
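
The first two steps above could look roughly like the sketch below. The `Tool` dataclass, `ToolRegistry`, and `csv_column_sum` are hypothetical names invented for this example. They are not the project's classes and not LangChain's tool API, which has its own base classes and decorators.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool wrapper: a name, a description for the LLM, and a callable."""
    name: str
    description: str
    run: Callable[..., Any]


class ToolRegistry:
    """Hypothetical lookup table that a Tool Selector could consult."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # Reject duplicates so one tool cannot silently replace another.
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> Tool:
        return self._tools[name]

    def describe(self) -> str:
        """Format the tool list for inclusion in an LLM prompt."""
        return "\n".join(f"- {t.name}: {t.description}" for t in self._tools.values())


def csv_column_sum(csv_text: str, column: str) -> float:
    """Example capability: sum a numeric column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[column]) for row in reader)


registry = ToolRegistry()
registry.register(
    Tool("csv_column_sum", "Sum a numeric column of CSV text.", csv_column_sum)
)
```

Keeping the description next to the callable means the prompt text shown to the LLM is updated in the same place as the tool itself.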
We welcome contributions! See our Contributing Guide for details.
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.