
A demo for equity report generation using Anthropic API


Equity Report Generation using Anthropic FastAPI Service

This is a FastAPI service for equity report generation. It exposes a RESTful API in front of the Anthropic Claude AI model, using anthropic-sdk-python. For more information on the model, please refer to the Anthropic Claude documentation.

Setup Instructions

Running the Project Locally

  1. Clone the repository
     git clone https://github.com/Ray7788/AIDF-demo.git
  2. Create a .env file in the root directory and add the following environment variable
     ANTHROPIC_API_KEY=YOUR_API_KEY
  3. (Optional) Create a system message file at src/data/system-messages.txt
     echo "You are a financial analyst who specializes in evaluating financial data and providing insights for informed decision-making. It is crucial to maintain utmost accuracy. Do not exaggerate, fabricate, outfit types, or make up information." > src/data/system-messages.txt
  4. Create a virtual environment and install the dependencies
     python3 -m venv venv
     source venv/bin/activate
     pip install -r requirements.txt
  5. Make the runner script executable and start the server
     chmod +x runner.sh
     ./runner.sh

Running the Project with Docker

  1. Clone the repository
     git clone https://github.com/Ray7788/AIDF-demo.git
  2. Create a .env file in the root directory and add the following environment variable
     ANTHROPIC_API_KEY=YOUR_API_KEY
  3. (Optional) Create a system message file at src/data/system-messages.txt
     echo "You are a financial analyst who specializes in evaluating financial data and providing insights for informed decision-making. It is crucial to maintain utmost accuracy. Do not exaggerate, fabricate, outfit types, or make up information." > src/data/system-messages.txt
  4. Build and run the Docker container
     docker-compose up --build

API Documentation

The API documentation can be found at http://localhost:8000/docs. You can also configure a different port number.

[Screenshot: preview of the interactive API documentation page]

API Technology Stack Design

1. FastAPI as the Backend Framework

Key Reasons for Choosing FastAPI:

  • Performance: Built on Starlette and Pydantic, FastAPI is one of the fastest Python frameworks available, crucial for handling concurrent report generation requests
  • Asynchronous Support: Native async/await support enables efficient handling of I/O-bound operations like LLM API calls and database queries
  • Automatic Documentation: Generates interactive OpenAPI and Swagger UI documentation automatically, fulfilling the project's documentation requirements
  • Type Safety: Leverages Python type hints and Pydantic models for robust input validation and serialization
  • Modern Standards: Supports OpenAPI, JSON Schema, OAuth2, and other modern web standards

Relevant Libraries:

  • fastapi (core framework)
  • uvicorn (ASGI server)
  • httpx (async HTTP client for external API calls)
  • pydantic (data validation and settings management)

2. MongoDB as the Database

Advantages for This Project:

  • Schema Flexibility: Accommodates varying report structures and evolving financial data formats without rigid schema definitions
  • Document-Oriented: Naturally stores JSON-like documents matching our financial data structure
  • Scalability: Handles large volumes of generated reports and user requests efficiently

Implementation Details:

  • pymongo (official MongoDB driver)
  • Document design optimized for:
    • User report history tracking
    • Company metadata storage
    • Report generation job status monitoring

3. Yahoo Finance Integration

Why yfinance Was Selected:

  • Comprehensive Data: Provides stock prices, historical data, fundamentals, and other market indicators
  • Free Access: No API key required for basic usage (unlike premium financial data providers)
  • Pythonic Interface: Simple integration with our pandas-based data processing pipeline
  • Reliability: Well-maintained library with good community support

Data Processing:

  • yfinance (Yahoo Finance API wrapper)
  • pandas (data manipulation and analysis)
  • Financial data is:
    • Fetched asynchronously
    • Cleaned and normalized
    • Combined with provided JSON datasets
    • Formatted for LLM prompt injection

4. Anthropic LLM Integration

Implementation Approach:

  • anthropic (official client library)
  • Async API calls to handle long-running report generation (1-2 minutes)
  • Cost management through:
    • Model selection (prioritizing cost-effective options)
    • Prompt optimization
    • Response streaming
  • Agentic framework considerations:
    • Chained API calls for complex analysis
    • Intermediate result caching
    • Error handling and retries
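The "error handling and retries" item above can be illustrated with a small helper. This is a minimal sketch and not the project's actual code: the name `call_with_retries`, its parameters, and the choice of which exceptions count as transient are all assumptions made for illustration. It retries an awaitable with exponential backoff plus a little jitter, which is a common way to wrap a slow LLM call.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def call_with_retries(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await operation(), retrying transient failures with exponential backoff.

    Hypothetical helper: which exceptions are worth retrying depends on the
    client library in use; the defaults here are only placeholders.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Small random jitter so concurrent jobs do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))

    raise RuntimeError("unreachable")  # the loop always returns or raises
```

A caller would pass a zero-argument coroutine function, for example a lambda that wraps the actual LLM request, so that each retry issues a fresh request.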

5. Supporting Libraries

Key Supporting Components:

  • Authentication: PyJWT for optional JWT implementation
  • Async Operations: aiohttp for concurrent external API calls
  • Data Processing: pandas for financial data manipulation from Yahoo Finance
  • Environment Management: python-dotenv for API key management

[Screenshot: preview of user token authentication using PyJWT]
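To make the token check concrete, here is a dependency-free sketch of what verifying an HS256-signed JWT involves. It deliberately uses only the standard library instead of PyJWT, so it is a stand-in for illustration and not the service's implementation; in the service, PyJWT's `jwt.encode` and `jwt.decode` would normally do this work. The function names `sign_token` and `verify_token` and the `now` parameter are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(payload: dict, secret: str) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    header = json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":"))
    body = json.dumps(payload, separators=(",", ":"))
    signing_input = _b64url(header.encode()) + "." + _b64url(body.encode())
    return signing_input + "." + _b64url(_signature(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts

    # Compare signatures in constant time before trusting any token content.
    expected = _signature(header_b64 + "." + payload_b64, secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")

    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The `now` argument exists only so the expiry check can be exercised deterministically; a production setup would rely on PyJWT rather than hand-rolled verification.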

6. System Architecture Benefits

Overall Advantages of This Stack:

  1. Performance: Async architecture maximizes throughput for report generation tasks
  2. Scalability: MongoDB scales horizontally as report volume grows
  3. Maintainability: Type hints and Pydantic models create self-documenting code
  4. Extensibility: Modular design allows easy addition of:
    • New data sources
    • Additional LLM providers
    • Advanced analytics features
  5. Developer Experience: Excellent tooling and debugging capabilities

7. Future Considerations

Potential Enhancements:

  • Add Redis for caching frequent queries and intermediate results
  • Implement Celery or similar for background task processing
  • Implement rate limiting for API protection

Equity Report Generation System Design

1. Project Overview & Business Requirements

1.1 Core Objective

The system is designed to automate the generation of equity research reports using LLM (Large Language Model) technology, allowing users to generate detailed financial analysis reports with minimal effort.

1.2 Key Functional Requirements

Task Submission

  • Users can request a report for a specific company (e.g., AAPL, MSFT).
  • The system should validate the company against the provided company_metadata.json.
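The validation step could look roughly like the following. This is a sketch under a stated assumption: the layout of company_metadata.json is not described here, so the code assumes a JSON object whose top-level keys are ticker symbols. The function names are hypothetical as well.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Set


def extract_tickers(metadata: Mapping[str, Any]) -> Set[str]:
    # ASSUMPTION: top-level keys of the metadata object are ticker symbols.
    return {ticker.upper() for ticker in metadata}


def load_known_tickers(metadata_path: Path) -> Set[str]:
    """Read the metadata file once (e.g. at startup) and cache the result."""
    with metadata_path.open(encoding="utf-8") as fh:
        return extract_tickers(json.load(fh))


def validate_ticker(raw: str, known: Set[str]) -> str:
    """Normalize user input and reject companies missing from the metadata."""
    ticker = raw.strip().upper()
    if ticker not in known:
        raise ValueError(f"Unknown company: {raw!r}")
    return ticker
```

In an API handler, the `ValueError` would typically be translated into a 4xx response so the client learns that the company is not supported before any LLM call is made.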

Report Retrieval

  • Users can fetch previously generated reports.
  • Reports should be stored persistently for future access.

Data Integration

  • Combine static data (provided JSON files) with dynamic market data (Yahoo Finance).
  • Feed structured data into the LLM to generate coherent reports.

Asynchronous Processing

  • Report generation takes about 1 minute due to multiple API calls (LLM + financial data).
  • Users should receive a task_id immediately and poll for completion.
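The submit-then-poll pattern described above can be sketched with plain asyncio. Everything here is illustrative: the in-memory `JOBS` table, the status names, and the functions `submit_report_task` and `get_task_status` are assumptions, and a real deployment would more likely keep job state in the database so it survives restarts.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict, Set

# In-memory job table (ASSUMPTION: stands in for persistent job storage).
JOBS: Dict[str, dict] = {}
# Hold references to running tasks so they are not garbage-collected early.
_RUNNING: Set[asyncio.Task] = set()


async def _run_job(task_id: str, ticker: str,
                   generate: Callable[[str], Awaitable[str]]) -> None:
    JOBS[task_id]["status"] = "running"
    try:
        JOBS[task_id]["report"] = await generate(ticker)
        JOBS[task_id]["status"] = "completed"
    except Exception as exc:  # record the failure instead of losing it
        JOBS[task_id]["status"] = "failed"
        JOBS[task_id]["error"] = str(exc)


def submit_report_task(ticker: str,
                       generate: Callable[[str], Awaitable[str]]) -> str:
    """Schedule generation and return a task_id immediately.

    Must be called from inside a running event loop (e.g. an async handler).
    """
    task_id = uuid.uuid4().hex
    JOBS[task_id] = {"status": "pending", "ticker": ticker}
    task = asyncio.create_task(_run_job(task_id, ticker, generate))
    _RUNNING.add(task)
    task.add_done_callback(_RUNNING.discard)
    return task_id


def get_task_status(task_id: str) -> dict:
    """Return a copy of the job record; raises KeyError for unknown ids."""
    return dict(JOBS[task_id])
```

A client would call the submit endpoint once, then poll the status endpoint with the returned `task_id` until the status is `completed` or `failed`.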

2. System Architecture Design

2.1 High-Level Architecture

The system follows a modular, event-driven approach:

┌─────────────┐       ┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│   Client    │──────▶│   FastAPI   │──────▶│   MongoDB   │──────▶│   LLM API   │
│ (FastAPI UI)│       │   (Backend) │       │  (Database) │       │ (Anthropic) │
└─────────────┘       └─────────────┘       └─────────────┘       └─────────────┘

2.2 Component Breakdown

[Screenshot: a simple request for a company and a specific year]

[Screenshot: a simple response from the Anthropic API]

(1) API Layer (FastAPI)

  • Endpoints:

    • See /docs for auto-generated API documentation.
  • Key Features:

    • Asynchronous Processing: Long-running tasks are offloaded to a background worker.
    • Request Validation: Ensures the requested company exists in company_metadata.json.
    • Rate Limiting (Optional): Prevents abuse of the LLM API.

(2) Data Processing Layer

  • Static Data (JSON Files):

    • company_metadata.json → Basic company info (e.g., name, sector, industry).
    • company_financial_ratios.json → Financial metrics (e.g., P/E ratio, ROE).
    • level1_ECC.json → Earnings call transcripts (qualitative insights).
    • level2_TenK.json → SEC 10-K filings (detailed financials).
  • Dynamic Data (Yahoo Finance):

    • Fetches real-time:
      • Stock price history
      • Market cap, P/E ratio
      • Dividend yield, volatility, etc.
  • Data Merging:

    • Combines static + dynamic data into a structured format for LLM prompts.
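One way to picture the merging step is a function that folds the static and dynamic inputs into a single JSON document for the prompt. This is a rough sketch: the key names in the merged structure, the `<data>` wrapper, and both function names are assumptions chosen for illustration, not the project's actual prompt format.

```python
import json
from typing import Any, Mapping


def build_prompt_context(
    ticker: str,
    metadata: Mapping[str, Any],
    ratios: Mapping[str, Any],
    market_data: Mapping[str, Any],
) -> str:
    """Merge static (JSON files) and dynamic (market) data into one JSON string."""
    # Drop fields the market data source could not provide, so the model
    # is not shown "null" values it might try to interpret.
    cleaned_market = {k: v for k, v in market_data.items() if v is not None}
    context = {
        "ticker": ticker,
        "company": dict(metadata),
        "financial_ratios": dict(ratios),
        "market_data": cleaned_market,
    }
    # sort_keys keeps the prompt stable across runs; default=str handles
    # values such as dates that json cannot serialize natively.
    return json.dumps(context, indent=2, sort_keys=True, default=str)


def build_user_message(ticker: str, context_json: str) -> str:
    """Wrap the merged data in a short instruction for the LLM."""
    return (
        f"Write an equity research report for {ticker} using only the data below.\n\n"
        f"<data>\n{context_json}\n</data>"
    )
```

Serializing to a stable, sorted JSON string also makes it easier to cache intermediate results, since identical inputs produce identical prompts.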

(3) LLM Integration Layer

  • Anthropic API Usage:
    • Prompt Engineering:
      • Uses a system message to set the context for the LLM. Check src/data/system-messages.txt for the system message:

        You are a professional equity research analyst tasked with writing a comprehensive stock research report for an investment firm.
        Generate a well-structured, data-driven equity report based on the provided company data and market information. The report must be objective, professional, and formatted clearly.

        Required Structure:
        1. Company Overview
           - Business model & key operations
           - Market position & competitive advantages
        2. Financial Analysis
           - Key financial ratios (e.g., P/E, ROE, debt-to-equity)
           - Revenue & earnings trends
           - Balance sheet & cash flow highlights
        3. Industry Analysis
           - Industry growth drivers & challenges
           - Competitive landscape
           - Regulatory considerations
        4. Valuation Analysis
           - Relative valuation (P/E, EV/EBITDA vs. peers)
           - Discounted cash flow (DCF) assumptions (if applicable)
           - Historical valuation range
        5. Risk Factors
           - Company-specific risks
           - Industry & macroeconomic risks
        6. Investment Recommendation
           - Clear rating (Buy/Hold/Sell)
           - Target price range (if applicable)
           - Investment horizon

        Writing Guidelines:
        - Use formal, concise language
        - Support all claims with data
        - Highlight anomalies or inconsistencies
        - Use bold for key metrics
        - Avoid unsupported speculation

        Output must begin with:
        [Company Name] Equity Research Report
        1. Company Overview
        [Content]
        2. Financial Analysis
        [Content]
        3. Industry Analysis
        [Content]
        4. Valuation Analysis
        [Content]
        5. Risk Factors
        [Content]
        6. Investment Recommendation

        Begin the report immediately without introductory phrases.

    • Cost Optimization:
      • Uses smaller models for initial drafts.

(4) Storage Layer (MongoDB)

  • Collections:

    • reports → Stores generated reports ({report_id, company_id, content, timestamp}).
  • Advantages:

    • Schema-less → Flexible for evolving report formats.
    • High write throughput → Suitable for frequent report generation.
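As an illustration of the `reports` document shape listed above, the four fields can be modeled with a small dataclass. This is only a sketch: the class name `ReportDocument`, the use of a UUID for `report_id`, and the UTC timestamp default are assumptions, since the source names the fields but not how they are generated.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ReportDocument:
    """One entry in the reports collection: {report_id, company_id, content, timestamp}."""

    company_id: str
    content: str
    # ASSUMPTION: ids are random UUIDs; the real service may use another scheme.
    report_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Timezone-aware UTC timestamp, so stored times are unambiguous.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def to_document(self) -> dict:
        """Plain dict form, suitable for passing to a driver call such as
        pymongo's collection.insert_one(...)."""
        return asdict(self)
```

Because the store is schema-less, extra fields (for example a model name or a prompt version) could be added later without migrating existing documents.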

2.3 Data Flow

  1. User submits a task (POST /generate XXX).
  2. API validates the company and initiates a background task.
  3. Data Fetcher pulls:
    • Static data (JSON files).
    • Dynamic data (Yahoo Finance).
  4. LLM Processor constructs a prompt and calls Anthropic API.
  5. Report is saved to MongoDB and returned to the user.