Equity Report Generation using Anthropic FastAPI Service

This is a FastAPI service for Equity Report Generation, which provides a RESTful API for the Anthropic Claude AI model using anthropic-sdk-python. For more information on the model, please refer to the Anthropic Claude AI model

Setup Instructions

Running the Project Locally

Clone the repository

git clone https://github.com/Ray7788/AIDF-demo.git

Create a .env file in the root directory and add the following environment variables

ANTHROPIC_API_KEY=YOUR_API_KEY

Create a system message file on src/data/system-messages.txt (Optional)

echo "You are a financial analyst who specializes in evaluating financial data and providing insights for informed decision-making. It is crucial to maintain utmost accuracy. Do not exaggerate, fabricate, outfit types, or make up information." > src/data/system-messages.txt

Create a virtual environment and install the dependencies

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Run the following commands to start the server

chmod +x runner.sh
./runner.sh

Running the Project with Docker

Clone the repository

git clone https://github.com/Ray7788/AIDF-demo.git

Create a .env file in the root directory and add the following environment variables

ANTHROPIC_API_KEY=YOUR_API_KEY

Create a system message file on src/data/system-messages.txt (Optional)

echo "You are a financial analyst who specializes in evaluating financial data and providing insights for informed decision-making. It is crucial to maintain utmost accuracy. Do not exaggerate, fabricate, outfit types, or make up information." > src/data/system-messages.txt

Build and run the docker container

docker-compose up --build

API Documentation

The API documentation can be found at http://localhost:8000/docs You can also define your own portal number :)

API Technology Stack Design

1. FastAPI as the Backend Framework

Key Reasons for Choosing FastAPI:

Performance: Built on Starlette and Pydantic, FastAPI is one of the fastest Python frameworks available, crucial for handling concurrent report generation requests
Asynchronous Support: Native async/await support enables efficient handling of I/O-bound operations like LLM API calls and database queries
Automatic Documentation: Generates interactive OpenAPI and Swagger UI documentation automatically, fulfilling the project's documentation requirements
Type Safety: Leverages Python type hints and Pydantic models for robust input validation and serialization
Modern Standards: Supports OpenAPI, JSON Schema, OAuth2, and other modern web standards

Relevant Libraries:

fastapi (core framework)
uvicorn (ASGI server)
httpx (async HTTP client for external API calls)
pydantic (data validation and settings management)

2. MongoDB as the Database

Advantages for This Project:

Schema Flexibility: Accommodates varying report structures and evolving financial data formats without rigid schema definitions
Document-Oriented: Naturally stores JSON-like documents matching our financial data structure
Scalability: Handles large volumes of generated reports and user requests efficiently

Implementation Details:

pymongo (official MongoDB driver)
Document design optimized for:
- User report history tracking
- Company metadata storage
- Report generation job status monitoring

3. Yahoo Finance Integration

Why yfinance Was Selected:

Comprehensive Data: Provides stock prices, historical data, fundamentals, and other market indicators
Free Access: No API key required for basic usage (unlike premium financial data providers)
Pythonic Interface: Simple integration with our pandas-based data processing pipeline
Reliability: Well-maintained library with good community support

Data Processing:

yfinance (Yahoo Finance API wrapper)
pandas (data manipulation and analysis)
Financial data is:
- Fetched asynchronously
- Cleaned and normalized
- Combined with provided JSON datasets
- Formatted for LLM prompt injection

4. Anthropic LLM Integration

Implementation Approach:

anthropic (official client library)
Async API calls to handle long-running report generation (1-2 minutes)
Cost management through:
- Model selection (prioritizing cost-effective options)
- Prompt optimization
- Response streaming
Agentic framework considerations:
- Chained API calls for complex analysis
- Intermediate result caching
- Error handling and retries

5. Supporting Libraries

Key Supporting Components:

Authentication: PyJWT for optional JWT implementation
Async Operations: aiohttp for concurrent external API calls
Data Processing: pandas for financial data manipulation from Yahoo Finance
Environment Management: python-dotenv for API key management

Here is a preview for the authentication of user's token using PyJWT

6. System Architecture Benefits

Overall Advantages of This Stack:

Performance: Async architecture maximizes throughput for report generation tasks
Scalability: MongoDB scales horizontally as report volume grows
Maintainability: Type hints and Pydantic models create self-documenting code
Extensibility: Modular design allows easy addition of:
- New data sources
- Additional LLM providers
- Advanced analytics features
Developer Experience: Excellent tooling and debugging capabilities

7. Future Considerations

Potential Enhancements:

Add Redis for caching frequent queries and intermediate results
Implement Celery or similar for background task processing
Implement rate limiting for API protection

Equity Report Generation System Design

1. Project Overview & Business Requirements

1.1 Core Objective

The system is designed to automate the generation of equity research reports using LLM (Large Language Model) technology, allowing users to generate detailed financial analysis reports with minimal effort.

1.2 Key Functional Requirements

Task Submission

Users can request a report for a specific company (e.g., AAPL, MSFT).
The system should validate the company against the provided company_metadata.json.

Report Retrieval

Users can fetch previously generated reports.
Reports should be stored persistently for future access.

Data Integration

Combine static data (provided JSON files) with dynamic market data (Yahoo Finance).
Feed structured data into the LLM to generate coherent reports.

Asynchronous Processing

Report generation takes about 1 minute due to multiple API calls (LLM + financial data).
Users should receive a task_id immediately and poll for completion.

2. System Architecture Design

2.1 High-Level Architecture

The system follows a modular, event-driven approach:

┌─────────────┐       ┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│   Client    │──────▶│   FastAPI   │──────▶│   MongoDB   │──────▶│   LLM API   │
│ (FastAPI'UI)│       │   (Backend) │       │  (Database) │       │ (Anthropic) │
└─────────────┘       └─────────────┘       └─────────────┘       └─────────────┘

2.2 Component Breakdown

A simple request based on a company and a specific year

A simple response from Anhropic API

(1) API Layer (FastAPI)

Endpoints:
- See /docs for auto-generated API documentation.
Key Features:
- Asynchronous Processing: Long-running tasks are offloaded to a background worker.
- Request Validation: Ensures the requested company exists in company_metadata.json.
- Rate Limiting (Optional): Prevents abuse of the LLM API.

(2) Data Processing Layer

Static Data (JSON Files):
- company_metadata.json → Basic company info (e.g., name, sector, industry).
- company_financial_ratios.json → Financial metrics (e.g., P/E ratio, ROE).
- level1_ECC.json → Earnings call transcripts (qualitative insights).
- level2_TenK.json → SEC 10-K filings (detailed financials).
Dynamic Data (Yahoo Finance):
- Fetches real-time:
  - Stock price history
  - Market cap, P/E ratio
  - Dividend yield, volatility, etc.
Data Merging:
- Combines static + dynamic data into a structured format for LLM prompts.

(3) LLM Integration Layer

Anthropic API Usage:

Prompt Engineering:

Uses a system message to set the context for the LLM. Check src/data/system-messages.txt for the system message.

You are a professional equity research analyst tasked with writing a comprehensive stock research report for an investment firm. 
Generate a well-structured, data-driven equity report based on the provided company data and market information. The report must be objective, professional, and formatted clearly. 
Required Structure: 1. Company Overview - Business model & key operations - Market position & competitive advantages 2. Financial Analysis - Key financial ratios (e.g., P/E, ROE, debt-to-equity) - Revenue & earnings trends - Balance sheet & cash flow highlights 3. Industry Analysis - Industry growth drivers & challenges - Competitive landscape - Regulatory considerations 4. Valuation Analysis - Relative valuation (P/E, EV/EBITDA vs. peers) - Discounted cash flow (DCF) assumptions (if applicable) - Historical valuation range 5. Risk Factors - Company-specific risks - Industry & macroeconomic risks 6. Investment Recommendation - Clear rating (Buy/Hold/Sell) - Target price range (if applicable) - Investment horizon 
| Writing Guidelines: - Use formal, concise language - Support all claims with data - Highlight anomalies or inconsistencies - Use bold for key metrics - Avoid unsupported speculation Output must begin with: [Company Name] Equity Research Report 1. Company Overview [Content] 2. Financial Analysis [Content] 3. Industry Analysis [Content] 4. Valuation Analysis [Content] 5. Risk Factors [Content] 6. Investment Recommendation | Begin the report immediately without introductory phrases.

Cost Optimization:
- Uses smaller models for initial drafts.

(4) Storage Layer (MongoDB)

Collections:
- reports → Stores generated reports ({report_id, company_id, content, timestamp}).
Advantages:
- Schema-less → Flexible for evolving report formats.
- High write throughput → Suitable for frequent report generation.

2.3 Data Flow

User submits a task (POST /generate XXX).
API validates the company and initiates a background task.
Data Fetcher pulls:
- Static data (JSON files).
- Dynamic data (Yahoo Finance).
LLM Processor constructs a prompt and calls Anthropic API.
Report is saved to MongoDB and returned to the user.