LLM-Powered Microservice Template

Overview

This repository provides a template for building Large Language Model (LLM) powered microservices using FastAPI. It's designed to help you quickly set up and deploy AI-driven APIs that leverage LLMs such as GPT-3, GPT-4, Claude, and similar models.

Table of Contents

  1. Overview
  2. Features
  3. Prerequisites
  4. Quick Start
  5. Project Structure
  6. Configuration
  7. Testing
  8. Documentation
  9. Usage
  10. API Endpoints
  11. Components
  12. Deployment
  13. Contributing
  14. License
  15. Acknowledgments

Features

  • πŸš€ FastAPI framework for high-performance API development
  • πŸ—οΈ Clean Architecture principles for maintainable and scalable code
  • πŸ”Œ LLM integration with support for multiple providers (e.g., OpenAI, Anthropic)
  • πŸ“ Prompt management system for versioning and reusing prompts
  • ⚑ Asynchronous processing of LLM requests
  • πŸ’Ύ Caching layer for improved performance and reduced API costs
  • πŸ› οΈ Comprehensive error handling and logging
  • πŸ•ΈοΈ Distributed tracing with OpenTelemetry and Jaeger
  • πŸ’‰ Dependency Injection for improved testability and maintainability
  • 🐳 Dockerized setup for easy deployment
  • ☸️ Kubernetes configuration for scalable cloud deployments

Prerequisites

  • Python 3.8+
  • Docker and Docker Compose
  • Kubernetes (for production deployment)

Quick Start

  1. Use this template to create a new GitHub repository.
  2. Clone your new repository:
    git clone https://github.com/your-username/your-repo-name.git
    cd your-repo-name
  3. Run the directory structure generation script:
    python scripts/create_structure.py
  4. Set up a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  5. Install dependencies:
    pip install -r requirements.txt
  6. Set up environment variables:
    cp .env.example .env
    Edit .env with your LLM API keys and other configuration.
  7. Run the development server:
    uvicorn src.main:app --reload

Visit http://localhost:8000/docs to see the API documentation.

Project Structure

my_fastapi_microservice/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ application/
β”‚   β”‚   β”œβ”€β”€ chains/
β”‚   β”‚   β”œβ”€β”€ models/
β”‚   β”‚   β”œβ”€β”€ prompt_management/
β”‚   β”‚   └── services/
β”‚   β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ domain/
β”‚   β”œβ”€β”€ infrastructure/
β”‚   β”‚   β”œβ”€β”€ cache/
β”‚   β”‚   └── llm_providers/
β”‚   └── presentation/
β”‚       └── api/
β”‚           β”œβ”€β”€ routes/
β”‚           └── schemas/
β”œβ”€β”€ tests/
β”œβ”€β”€ docs/
β”œβ”€β”€ k8s/
└── scripts/
    └── create_structure.py

Configuration

LLM-specific configurations are managed in src/core/config.py. You can specify:

  • LLM provider settings
  • Model selection
  • Token limits
  • Caching parameters
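
For example, a settings module along these lines could back those options. This is a sketch only: it assumes the pydantic-settings package, and the field names and defaults are illustrative rather than the template's actual configuration.

from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    # LLM provider settings (illustrative names)
    llm_provider: str = "openai"
    openai_api_key: str = ""
    anthropic_api_key: str = ""

    # Model selection and token limits
    default_model: str = "gpt-4"
    max_tokens: int = 512

    # Caching parameters
    cache_enabled: bool = True
    cache_ttl_seconds: int = 3600
    redis_url: str = "redis://localhost:6379/0"

@lru_cache
def get_settings() -> Settings:
    # Cached so every request shares a single Settings instance
    return Settings()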

Testing

Run the test suite with:

pytest
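
For instance, a minimal test using FastAPI's TestClient might look like the following. This is a sketch: it assumes the app object is importable from src.main (as in the Quick Start) and that GET /api/llm/models requires no authentication.

from fastapi.testclient import TestClient

from src.main import app

client = TestClient(app)

def test_list_models_returns_ok():
    # The models endpoint should respond without making an LLM call
    response = client.get("/api/llm/models")
    assert response.status_code == 200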

Documentation

Detailed documentation for various aspects of this project can be found in the /docs directory:

  1. API Documentation

    • Comprehensive guide to the APIs exposed by the microservice
    • Includes endpoints, request/response formats, authentication, and error handling
  2. Architecture Documentation

    • Overview of the high-level architecture following Clean Architecture principles
    • Describes layers, key components, data flow, and scalability considerations
  3. Deployment Guide

    • Instructions for deploying the microservice using Docker and Kubernetes
    • Includes scaling, monitoring, and troubleshooting information
  4. LLM Integration Guide

    • Details on integrating and working with Large Language Models
    • Covers LLM provider integration, prompt management, and best practices
  5. Security Documentation

    • Outlines security measures and best practices implemented in the microservice
    • Includes authentication, authorization, data protection, and LLM-specific security considerations

Please refer to these documents for in-depth information on specific topics related to the project.

Usage

Here are some examples of how to use different components of the microservice:

LLM Orchestrator

The LLM Orchestrator manages interactions with different LLM providers. Here's how to use it:

from application.services.llm_orchestrator import LLMOrchestrator
from core.dependencies import get_model_factory, get_prompt_repository, get_redis_cache

model_factory = get_model_factory()
prompt_repo = get_prompt_repository()
cache = get_redis_cache()

orchestrator = LLMOrchestrator(model_factory, prompt_repo, cache)

# process_request is a coroutine, so await it inside an async function (or a running event loop)
response = await orchestrator.process_request(
    "generate",
    prompt="Translate the following English text to French: 'Hello, world!'",
    max_tokens=50,
)
print(response.choices[0].text)

Prompt Management

The Prompt Management system allows you to create, store, and retrieve prompt templates:

from application.prompt_management.prompt_template import PromptTemplate
from application.prompt_management.prompt_repository import PromptRepository

repo = PromptRepository()

# Create a new prompt template
translation_prompt = PromptTemplate(
    name="translation",
    template="Translate the following {source_language} text to {target_language}: {text}",
    version="1.0"
)

# Add the prompt to the repository
repo.add_prompt(translation_prompt)

# Retrieve and use the prompt
prompt = repo.get_prompt("translation")
formatted_prompt = prompt.format(
    source_language="English",
    target_language="French",
    text="Hello, world!"
)
print(formatted_prompt)

Caching

The Redis-based caching system can be used to store and retrieve LLM responses:

from infrastructure.cache.redis_cache import RedisCache

cache = RedisCache(host='localhost', port=6379, db=0)

# Caching a response for one hour (set and get are coroutines, so await them inside an async function)
await cache.set('llm_response:hello_world', 'Bonjour, le monde!', expire=3600)

# Retrieving a cached response
cached_response = await cache.get('llm_response:hello_world')
print(cached_response)

API Endpoints

The microservice exposes the following main API endpoints:

  • POST /api/llm/generate: Generate text using an LLM
  • POST /api/llm/summarize: Summarize text using an LLM
  • GET /api/llm/models: List available LLM models

For detailed API documentation, run the server and visit http://localhost:8000/docs.
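
As an example, the generation endpoint can be called with httpx. This is a sketch: the request and response field names are illustrative and may differ from the actual schemas under src/presentation/api/schemas/.

import httpx

# Assumes the development server from the Quick Start is running locally
payload = {
    "prompt": "Summarize the benefits of asynchronous APIs in one sentence.",
    "max_tokens": 64,
}
response = httpx.post("http://localhost:8000/api/llm/generate", json=payload, timeout=30.0)
response.raise_for_status()
print(response.json())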

Components

LLM Providers

The microservice supports multiple LLM providers. To add a new provider (a minimal sketch follows these steps):

  1. Create a new file in src/infrastructure/llm_providers/
  2. Implement the provider class, inheriting from BaseLLMProvider
  3. Register the new provider in src/core/dependencies.py
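
A minimal provider might look like the sketch below. The module path and the generate signature are assumptions; the real BaseLLMProvider interface in this template may differ.

from infrastructure.llm_providers.base import BaseLLMProvider  # hypothetical module path

class EchoProvider(BaseLLMProvider):
    """Toy provider that echoes the prompt back; useful for local testing."""

    async def generate(self, prompt: str, max_tokens: int = 256, **kwargs) -> str:
        # A real provider would call its SDK (e.g. the openai or anthropic client) here
        return prompt[:max_tokens]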

Chains

Chains represent sequences of operations involving prompts and language models. To create a new chain (see the sketch after these steps):

  1. Create a new file in src/application/chains/specific_chains/
  2. Implement the chain class, inheriting from BaseChain
  3. Register the new chain in src/application/chains/__init__.py
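
A chain sketch under the same caveat: the base module path, the run signature, and the self.prompt_repository / self.orchestrator attributes are assumptions about BaseChain, not confirmed details of this template.

from application.chains.base import BaseChain  # hypothetical module path

class TranslationChain(BaseChain):
    """Formats the 'translation' prompt and hands it to the LLM orchestrator."""

    async def run(self, text: str, source_language: str, target_language: str) -> str:
        # Reuse the versioned prompt template from the repository
        prompt = self.prompt_repository.get_prompt("translation").format(
            source_language=source_language,
            target_language=target_language,
            text=text,
        )
        response = await self.orchestrator.process_request("generate", prompt=prompt, max_tokens=100)
        return response.choices[0].text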

Cross-Cutting Concerns

Logging, tracing, and other cross-cutting concerns are managed in src/core/cross_cutting.py. This includes:

  • Logging configuration
  • Distributed tracing setup with OpenTelemetry and Jaeger
  • Middleware for request/response logging
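
A hedged sketch of what this setup might involve, assuming the opentelemetry-sdk, opentelemetry-exporter-jaeger, and opentelemetry-instrumentation-fastapi packages (the actual wiring in src/core/cross_cutting.py may differ):

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

def configure_tracing(app) -> None:
    # Tag spans with the service name so they are identifiable in Jaeger
    provider = TracerProvider(resource=Resource.create({"service.name": "llm-microservice"}))
    provider.add_span_processor(BatchSpanProcessor(JaegerExporter(agent_host_name="localhost", agent_port=6831)))
    trace.set_tracer_provider(provider)

    # Create a span for every incoming HTTP request automatically
    FastAPIInstrumentor.instrument_app(app)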

Deployment

Docker

Build and run the Docker container:

docker build -t llm-microservice .
docker run -p 8000:8000 llm-microservice

Kubernetes

Apply the Kubernetes manifests:

kubectl apply -f k8s/

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • FastAPI framework
  • OpenAI and other LLM providers
  • The open-source community

For more detailed information, please refer to the documentation in the docs/ directory.