This repository provides a template for building Large Language Model (LLM) powered microservices with FastAPI. It's designed to help you quickly set up and deploy AI-driven APIs that leverage LLMs such as GPT-3, GPT-4, Claude, or similar models.
- Overview
- Features
- Prerequisites
- Quick Start
- Project Structure
- Configuration
- Usage Examples
- API Endpoints
- Components
- Testing
- Deployment
- Documentation
- Contributing
- License
- Acknowledgments
- FastAPI framework for high-performance API development
- Clean Architecture principles for maintainable and scalable code
- LLM integration with support for multiple providers (e.g., OpenAI, Anthropic)
- Prompt management system for versioning and reusing prompts
- Asynchronous processing of LLM requests
- Caching layer for improved performance and reduced API costs
- Comprehensive error handling and logging
- Distributed tracing with OpenTelemetry and Jaeger
- Dependency injection for improved testability and maintainability
- Dockerized setup for easy deployment
- Kubernetes configuration for scalable cloud deployments
- Python 3.8+
- Docker and Docker Compose
- Kubernetes (for production deployment)
- Use this template to create a new GitHub repository.
- Clone your new repository:

  ```bash
  git clone https://github.com/your-username/your-repo-name.git
  cd your-repo-name
  ```

- Run the directory structure generation script:

  ```bash
  python scripts/create_structure.py
  ```

- Set up a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env
  ```

  Edit `.env` with your LLM API keys and other configuration.

- Run the development server:

  ```bash
  uvicorn src.main:app --reload
  ```

Visit http://localhost:8000/docs to see the API documentation.
```
my_fastapi_microservice/
├── src/
│   ├── application/
│   │   ├── chains/
│   │   ├── models/
│   │   ├── prompt_management/
│   │   └── services/
│   ├── core/
│   ├── domain/
│   ├── infrastructure/
│   │   ├── cache/
│   │   └── llm_providers/
│   └── presentation/
│       └── api/
│           ├── routes/
│           └── schemas/
├── tests/
├── docs/
├── k8s/
└── scripts/
    └── create_structure.py
```
LLM-specific configurations are managed in `src/core/config.py`. You can specify:
- LLM provider settings
- Model selection
- Token limits
- Caching parameters
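
For reference, these options might be collected in a settings module along the lines of the sketch below, assuming pydantic-settings is used; the field names and defaults are illustrative assumptions, not the actual contents of `src/core/config.py`:

```python
# Illustrative sketch only -- field names and defaults are assumptions,
# not the actual contents of src/core/config.py.
from pydantic_settings import BaseSettings, SettingsConfigDict

class LLMSettings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    # LLM provider settings
    llm_provider: str = "openai"          # e.g. "openai" or "anthropic"
    openai_api_key: str = ""
    anthropic_api_key: str = ""

    # Model selection and token limits
    default_model: str = "gpt-4"
    max_tokens: int = 512

    # Caching parameters
    redis_url: str = "redis://localhost:6379/0"
    cache_ttl_seconds: int = 3600

settings = LLMSettings()
```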
Run the test suite with:

```bash
pytest
```
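
A typical endpoint test can use FastAPI's `TestClient`; the request payload below is an assumption about the generate endpoint's schema, so adjust it to the actual request model:

```python
# tests/test_generate.py -- illustrative test; the request payload is an
# assumption about POST /api/llm/generate.
from fastapi.testclient import TestClient

from src.main import app

client = TestClient(app)

def test_generate_returns_a_response():
    response = client.post(
        "/api/llm/generate",
        json={"prompt": "Say hello", "max_tokens": 10},
    )
    assert response.status_code == 200
```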
Detailed documentation for various aspects of this project can be found in the `/docs` directory:
- API documentation: a comprehensive guide to the APIs exposed by the microservice, including endpoints, request/response formats, authentication, and error handling
- Architecture: an overview of the high-level architecture following Clean Architecture principles, describing layers, key components, data flow, and scalability considerations
- Deployment: instructions for deploying the microservice using Docker and Kubernetes, including scaling, monitoring, and troubleshooting information
- LLM integration: details on integrating and working with Large Language Models, covering LLM provider integration, prompt management, and best practices
- Security: an outline of the security measures and best practices implemented in the microservice, including authentication, authorization, data protection, and LLM-specific security considerations
Please refer to these documents for in-depth information on specific topics related to the project.
Here are some examples of how to use different components of the microservice:
The LLM Orchestrator manages interactions with different LLM providers. Here's how to use it:
```python
import asyncio

from application.services.llm_orchestrator import LLMOrchestrator
from core.dependencies import get_model_factory, get_prompt_repository, get_redis_cache

async def main():
    model_factory = get_model_factory()
    prompt_repo = get_prompt_repository()
    cache = get_redis_cache()
    orchestrator = LLMOrchestrator(model_factory, prompt_repo, cache)

    response = await orchestrator.process_request(
        "generate",
        prompt="Translate the following English text to French: 'Hello, world!'",
        max_tokens=50,
    )
    print(response.choices[0].text)

asyncio.run(main())
```
The Prompt Management system allows you to create, store, and retrieve prompt templates:
```python
from application.prompt_management.prompt_template import PromptTemplate
from application.prompt_management.prompt_repository import PromptRepository

repo = PromptRepository()

# Create a new prompt template
translation_prompt = PromptTemplate(
    name="translation",
    template="Translate the following {source_language} text to {target_language}: {text}",
    version="1.0",
)

# Add the prompt to the repository
repo.add_prompt(translation_prompt)

# Retrieve and use the prompt
prompt = repo.get_prompt("translation")
formatted_prompt = prompt.format(
    source_language="English",
    target_language="French",
    text="Hello, world!",
)
print(formatted_prompt)
```
The Redis-based caching system can be used to store and retrieve LLM responses:
```python
import asyncio
from infrastructure.cache.redis_cache import RedisCache

async def main():
    cache = RedisCache(host='localhost', port=6379, db=0)
    # Cache a response for one hour
    await cache.set('llm_response:hello_world', 'Bonjour, le monde!', expire=3600)
    # Retrieve the cached response
    cached_response = await cache.get('llm_response:hello_world')
    print(cached_response)

asyncio.run(main())
```
The microservice exposes the following main API endpoints:

- `POST /api/llm/generate`: Generate text using an LLM
- `POST /api/llm/summarize`: Summarize text using an LLM
- `GET /api/llm/models`: List available LLM models

For detailed API documentation, run the server and visit http://localhost:8000/docs.
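
As a quick sanity check, you can call the generate endpoint from Python; the request body fields below are assumptions, so consult the interactive docs for the actual schema:

```python
# Example client call; the "prompt"/"max_tokens" fields are assumptions --
# see http://localhost:8000/docs for the real request schema.
import requests

resp = requests.post(
    "http://localhost:8000/api/llm/generate",
    json={"prompt": "Write a haiku about microservices.", "max_tokens": 60},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```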
The microservice supports multiple LLM providers. To add a new provider (see the sketch after this list):

- Create a new file in `src/infrastructure/llm_providers/`
- Implement the provider class, inheriting from `BaseLLMProvider`
- Register the new provider in `src/core/dependencies.py`
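
As a rough sketch, a new provider could look like the following; the base-class import path and the `generate` signature shown here are assumptions, so mirror the abstract methods actually defined by `BaseLLMProvider`:

```python
# src/infrastructure/llm_providers/my_provider.py (sketch)
# The base-class import path and the async generate() signature are
# assumptions; follow the real BaseLLMProvider interface.
from infrastructure.llm_providers.base import BaseLLMProvider

class MyProvider(BaseLLMProvider):
    def __init__(self, api_key: str):
        self.api_key = api_key

    async def generate(self, prompt: str, max_tokens: int = 256) -> str:
        # Call the provider's SDK or HTTP API here and return the generated text.
        raise NotImplementedError
```

Once implemented, register the provider in `src/core/dependencies.py` so the model factory can hand it to the orchestrator.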
Chains represent sequences of operations involving prompts and language models. To create a new chain (see the sketch after this list):

- Create a new file in `src/application/chains/specific_chains/`
- Implement the chain class, inheriting from `BaseChain`
- Register the new chain in `src/application/chains/__init__.py`
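
A new chain could follow this shape; the base-class import path and the `run` method shown here are assumptions, so match the abstract interface `BaseChain` actually defines:

```python
# src/application/chains/specific_chains/translation_chain.py (sketch)
# The base-class import path and the async run() signature are assumptions.
from application.chains.base_chain import BaseChain

class TranslationChain(BaseChain):
    def __init__(self, orchestrator, prompt_repository):
        self.orchestrator = orchestrator
        self.prompt_repository = prompt_repository

    async def run(self, text: str, source_language: str, target_language: str) -> str:
        # Build the prompt from the stored template, then delegate to the orchestrator.
        prompt = self.prompt_repository.get_prompt("translation").format(
            source_language=source_language,
            target_language=target_language,
            text=text,
        )
        response = await self.orchestrator.process_request("generate", prompt=prompt)
        return response.choices[0].text
```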
Logging, tracing, and other cross-cutting concerns are managed in `src/core/cross_cutting.py`. This includes:
- Logging configuration
- Distributed tracing setup with OpenTelemetry and Jaeger
- Middleware for request/response logging
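
For illustration, request/response logging middleware of the kind this module sets up might look like the sketch below; the function and logger names are assumptions rather than the actual contents of `src/core/cross_cutting.py`:

```python
# Sketch of request/response logging middleware; names are illustrative.
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("llm_microservice")

def setup_request_logging(app: FastAPI) -> None:
    @app.middleware("http")
    async def log_requests(request: Request, call_next):
        start = time.perf_counter()
        response = await call_next(request)
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "%s %s -> %d (%.1f ms)",
            request.method, request.url.path, response.status_code, duration_ms,
        )
        return response
```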
Build and run the Docker container:

```bash
docker build -t llm-microservice .
docker run -p 8000:8000 llm-microservice
```
Apply the Kubernetes manifests:

```bash
kubectl apply -f k8s/
```
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI framework
- OpenAI and other LLM providers
- The open-source community
For more detailed information, please refer to the documentation in the `docs/` directory.