A comprehensive Python framework for interacting with AI models through the OpenRouter API with intelligent cost management and free model prioritization.
- Free Model Priority: Defaults to free models to minimize costs
- Cost Awareness: Clear warnings and protection against unexpected charges
- Smart Fallbacks: Automatic fallback from paid to free models on failures
- Error Protection: Prevents accidental paid model usage in development
- Multiple Interaction Patterns: Flexible APIs for different use cases
- 14 Free Models: Including Llama 3.3/4, Gemini 2.0/2.5, DeepSeek R1, and more
- Easy Integration: Simple setup and intuitive API design
- API Call Logging: Comprehensive request/response logging with timing and token usage
- Response Caching: In-memory and persistent caching with TTL and LRU eviction
- Streaming Support: Real-time streaming responses with Server-Sent Events
- Async/Await Support: Full asynchronous support with aiohttp for concurrent operations
- Dynamic Model Fetching: Live model data fetching from the OpenRouter API (300+ models)
- Feature Integration: All features work seamlessly together
- Clone or download the framework
- Install dependencies (Updated with new async support):
pip install python-dotenv requests PyYAML aiohttp
- Set up your API key in .env:
OPENROUTER_API_KEY="sk-or-v1-your-key-here"
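To confirm the key is picked up before running the examples, a quick check with python-dotenv (already a dependency) can help; this snippet is purely illustrative and not part of the framework:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
if not os.getenv("OPENROUTER_API_KEY"):
    raise SystemExit("OPENROUTER_API_KEY not found - check your .env file")
print("API key loaded")
```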
from framework import AIInteraction, quick_free_response
# One-liner free response
response = quick_free_response("What is Python programming?")
# Default interaction (uses free models with new caching + logging)
interaction = AIInteraction()
response = interaction.generate_response("Explain machine learning")# Streaming responses
from framework import stream_to_console
result = stream_to_console("Tell me a story", show_chunks=True)
# Async support
import asyncio
from async_framework import quick_async_free_response
async def main():
response = await quick_async_free_response("Async question")
# Caching configuration
from caching import CacheConfig
from client import OpenRouterClient
cache_config = CacheConfig(enabled=True, max_size=1000, ttl_seconds=3600)
client = OpenRouterClient(cache_config=cache_config, enable_logging=True)
asyncio.run(main())

from framework import FreeModelInteraction
# Only uses free models - perfect for development
interaction = FreeModelInteraction()
response = interaction.generate_response(
"Write a Python function to sort a list",
default_prompt_name="code_generator"
)

- Architecture Overview
- Model Categories
- Interaction Patterns
- Framework Components
- Usage Examples
- Cost Management
- Configuration
- API Reference
- Advanced Features
- Troubleshooting
openrouter-framework/
├── .env                   # API key configuration
├── .gitignore              # Git ignore file
├── requirements.txt        # Python dependencies (Updated: aiohttp)
├── config.py               # Environment and API configuration
├── models.py               # AI model definitions + dynamic model fetching
├── prompts.py              # Prompt management system
├── default_prompts.yaml    # Pre-defined system prompts
├── client.py               # OpenRouter API client + logging + caching + streaming
├── async_client.py         # NEW: Async OpenRouter client with aiohttp
├── framework.py            # Main interaction classes + streaming utilities
├── async_framework.py      # NEW: Async interaction classes and utilities
├── caching.py              # NEW: Response caching system (in-memory + persistent)
├── main.py                 # Comprehensive demo and examples
├── CLAUDE.md               # Development guide (Updated)
└── README.md               # This file (Updated)
- Free Model Priority: Framework defaults to free models to prevent unexpected costs
- Explicit Paid Usage: Paid models require intentional selection with clear warnings
- Cost Awareness: Visual and programmatic alerts for cost-incurring operations
- Smart Fallbacks: Graceful degradation from paid to free models on failures
- Developer Safety: Multiple safeguards against accidental charges during development
- Meta Llama 3.3 8B Instruct (Default free model)
- Meta Llama 4 Maverick (400B params, 17B active)
- Meta Llama 4 Scout (109B params, 17B active)
- Meta Llama 3.3 70B Instruct (High-performance free option)
- Google Gemini 2.5 Pro Experimental (Latest Google model)
- Google Gemini 2.0 Flash Experimental
- Google Gemini 2.0 Flash Thinking Experimental
- DeepSeek Chat V3 (Excellent reasoning)
- DeepSeek R1 (Advanced reasoning model)
- DeepSeek R1 Distill Llama 70B
- Mistral 7B Instruct
- Mistral Small 3.1 24B Instruct
- Qwen QwQ 32B
- NVIDIA Llama 3.1 Nemotron Ultra 253B
- OpenAI GPT-4o (Premium performance)
- OpenAI GPT-4o Mini (Cost-effective premium)
- OpenAI GPT-3.5 Turbo (Default paid model)
- Anthropic Claude 3 Opus (Advanced reasoning)
- Anthropic Claude 3 Sonnet (Balanced performance)
- Anthropic Claude 3 Haiku (Fast responses)
- Google Gemma 7B Instruct
- Meta Llama 3 8B Instruct
- Meta Llama 3 70B Instruct
from framework import AIInteraction
# Automatically uses free models
interaction = AIInteraction()
response = interaction.generate_response("Your question here")
# Check current model
info = interaction.get_model_info()
print(f"Using: {info['display_name']} (Cost: {info['cost_tier']})")from framework import FreeModelInteraction
# Guaranteed free - rejects paid models
interaction = FreeModelInteraction()
response = interaction.generate_response(
"Generate Python code",
default_prompt_name="code_generator"
)

from framework import CostAwareInteraction
# Smart paid-to-free fallback
cost_aware = CostAwareInteraction(prefer_free=True, auto_fallback=True)
# Force free model
response = cost_aware.generate_response("Question", force_free=True)
# Allow paid model (with warnings)
response = cost_aware.generate_response("Question", force_paid=True)from framework import quick_free_response, compare_free_vs_paid, list_model_costs
# One-liner free response
response = quick_free_response("What is AI?")
# Compare free vs paid quality
comparison = compare_free_vs_paid("Explain quantum computing")
print(f"Free: {comparison['free_response']}")
print(f"Paid: {comparison['paid_response']}")
# List available models
costs = list_model_costs()
print(f"Free models: {len(costs['free'])}")
print(f"Paid models: {len(costs['paid'])}")Handles environment variables and API configuration:
from config import OPENROUTER_API_KEY, OPENROUTER_BASE_URLFeatures:
- Loads API key from
.envfile - Validates configuration on startup
- Provides base URL for OpenRouter API
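A minimal sketch of what this module does, assuming a plain python-dotenv setup (the shipped config.py may differ in details):

```python
# Illustrative sketch only - not the shipped config.py
import os
from dotenv import load_dotenv

load_dotenv()

OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

if not OPENROUTER_API_KEY:
    raise ValueError("OPENROUTER_API_KEY not found in environment variables")
```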
Defines all available AI models with cost tiers:
from models import (
get_default_model, get_free_models, get_paid_models,
Llama33_8B_Free, GPT35Turbo, CostTier
)
# Get default models
free_model = get_default_model(prefer_free=True)
paid_model = get_default_model(prefer_free=False)
# Check model properties
print(f"Is free: {free_model.is_free}")
print(f"Cost tier: {free_model.cost_tier}")Key Classes:
AIModel- Abstract base class for all modelsCostTier- Enumeration for FREE/PAID classification- Model-specific classes for each supported model
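For orientation, here is a condensed sketch of how these base types fit together, based on the interface documented in the API Reference below; the enum values and the real classes' extra defaults (temperature, max tokens) are assumptions:

```python
from abc import ABC, abstractmethod
from enum import Enum

class CostTier(Enum):
    FREE = "free"
    PAID = "paid"

class AIModel(ABC):
    @property
    @abstractmethod
    def model_name(self) -> str: ...      # OpenRouter API model name

    @property
    @abstractmethod
    def display_name(self) -> str: ...    # Human-readable name

    @property
    @abstractmethod
    def cost_tier(self) -> CostTier: ...  # FREE or PAID

    @property
    def is_free(self) -> bool:
        return self.cost_tier == CostTier.FREE

    @property
    def is_paid(self) -> bool:
        return self.cost_tier == CostTier.PAID
```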
Manages default system prompts from YAML configuration:
from prompts import PromptManager
pm = PromptManager()
prompt = pm.get_prompt("code_generator")
available = pm.list_prompts()

Available Prompts:
- code_generator - For code generation tasks
- creative_writer - For creative writing
- data_analyst - For data analysis tasks
- general_assistant - General purpose assistant
- technical_writer - For technical documentation
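Assuming get_prompt returns a system-message dict (as the custom-prompt examples later in this README suggest), a fetched prompt can be dropped straight into a conversation:

```python
from prompts import PromptManager
from framework import FreeModelInteraction

pm = PromptManager()
system_prompt = pm.get_prompt("code_generator")  # assumed to be a {"role": "system", ...} dict

interaction = FreeModelInteraction()
messages = [system_prompt, {"role": "user", "content": "Write a binary search in Python"}]
response = interaction.chat(messages)
print(response)
```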
Handles all communication with the OpenRouter API:
from client import OpenRouterClient
client = OpenRouterClient()
response = client.send_request(
model_name="meta-llama/llama-3.3-8b-instruct:free",
messages=[{"role": "user", "content": "Hello!"}]
)
content = client.get_response_content(response)

Features:
- Automatic retry logic
- Error handling and validation
- Response parsing and content extraction
- Support for all OpenRouter API parameters
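Since the client supports the OpenRouter API parameters, extra generation settings can be supplied alongside the messages; a hedged sketch (the exact keyword pass-through is an assumption based on the feature list above):

```python
from client import OpenRouterClient

client = OpenRouterClient()
response = client.send_request(
    model_name="meta-llama/llama-3.3-8b-instruct:free",
    messages=[{"role": "user", "content": "Summarize PEP 8 in two sentences."}],
    temperature=0.2,  # assumed to be passed through to the OpenRouter API
    max_tokens=200,   # assumed to be passed through to the OpenRouter API
)
print(client.get_response_content(response))
```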
Core interaction classes and utilities:
from framework import (
AIInteraction, FreeModelInteraction, CostAwareInteraction,
MultiModelInteraction, quick_free_response, compare_free_vs_paid
)

Core Classes:
- AIInteraction - Main interaction class with cost awareness
- FreeModelInteraction - Free-only interaction (development safe)
- CostAwareInteraction - Smart cost management with fallbacks
- MultiModelInteraction - Multiple model management
from framework import quick_free_response
# Quick free response
answer = quick_free_response("What is the capital of France?")
print(answer) # "Paris"from framework import FreeModelInteraction
interaction = FreeModelInteraction()
code = interaction.generate_response(
"Create a function to calculate factorial",
default_prompt_name="code_generator"
)
print(code)

from framework import AIInteraction
interaction = AIInteraction()
story = interaction.generate_response(
"A robot discovers emotions for the first time",
default_prompt_name="creative_writer"
)
print(story)

from framework import FreeModelInteraction
interaction = FreeModelInteraction()
conversation = [
{"role": "system", "content": "You are a helpful programming tutor."},
{"role": "user", "content": "What is recursion?"}
]
response1 = interaction.chat(conversation)
print(f"AI: {response1}")
# Continue conversation
conversation.extend([
{"role": "assistant", "content": response1},
{"role": "user", "content": "Can you give me an example?"}
])
response2 = interaction.chat(conversation)
print(f"AI: {response2}")from framework import compare_free_vs_paid
comparison = compare_free_vs_paid(
"Explain machine learning in simple terms",
default_prompt_name="general_assistant"
)
print(f"Free Model ({comparison['free_model']}):")
print(comparison['free_response'])
print(f"\nPaid Model ({comparison['paid_model']}):")
print(comparison['paid_response'])

from framework import CostAwareInteraction
# Initialize with preferences
cost_aware = CostAwareInteraction(
prefer_free=True, # Default to free models
auto_fallback=True # Fallback to free on paid failures
)
# Regular usage (uses free model)
response = cost_aware.generate_response("Explain Python decorators")
# Force premium model when needed
premium_response = cost_aware.generate_response(
"Complex analysis task",
force_paid=True,
default_prompt_name="data_analyst"
)
# Switch models dynamically
cost_aware.switch_to_paid() # Switch to paid model (with warning)
cost_aware.switch_to_free()  # Switch back to free model

from framework import FreeModelInteraction
from models import get_free_models
# Process multiple questions with different free models
questions = [
"What is Python?",
"Explain machine learning",
"How do neural networks work?"
]
free_models = get_free_models()[:3] # Use first 3 free models
for i, question in enumerate(questions):
interaction = FreeModelInteraction(free_models[i])
response = interaction.generate_response(question)
print(f"Model {i+1}: {response[:100]}...")- Default Free Models: Framework defaults to free models automatically
- Clear Warnings: Visual alerts when using paid models
- Error Prevention: FreeModelInteraction rejects paid models
- Smart Fallbacks: Automatic fallback to free models on failures
from framework import AIInteraction
from models import GPT35Turbo
# This triggers a cost warning
interaction = AIInteraction(GPT35Turbo())
# Output: ⚠️ COST WARNING: You are using a PAID model...

# Use these patterns during development
from framework import FreeModelInteraction, quick_free_response
# Guaranteed free
interaction = FreeModelInteraction()
# Quick testing
response = quick_free_response("test prompt")# Use these patterns in production
from framework import AIInteraction, CostAwareInteraction
# Smart cost management
cost_aware = CostAwareInteraction(
prefer_free=True,
auto_fallback=True
)
# Explicit model choice when needed
from models import GPT35Turbo
premium_interaction = AIInteraction(GPT35Turbo())  # Shows warning

Create a .env file in the project root:
OPENROUTER_API_KEY="sk-or-v1-your-openrouter-api-key-here"

Edit default_prompts.yaml to customize system prompts:
prompts:
  custom_prompt:
    role: "system"
    content: "You are a specialized assistant for my domain."
  code_reviewer:
    role: "system"
    content: "You are an expert code reviewer. Analyze code for best practices."
Add new models in models.py:
class NewFreeModel(AIModel):
@property
def model_name(self) -> str:
return "provider/new-model:free"
@property
def display_name(self) -> str:
return "New Free Model"
@property
def cost_tier(self) -> CostTier:
return CostTier.FREE

Main interaction class with cost awareness.
class AIInteraction:
def __init__(self, model=None, prompt_manager=None, client=None, warn_on_paid=True)
def generate_response(self, user_prompt, default_prompt_name=None, **kwargs) -> str
def chat(self, conversation, **kwargs) -> str
def switch_model(self, new_model) -> None
def get_model_info(self) -> Dict[str, Any]
def get_available_prompts(self) -> List[str]

Free-only interaction class for development.
class FreeModelInteraction(AIInteraction):
def __init__(self, model=None, **kwargs)
# Inherits all AIInteraction methods
# Rejects paid models with ValueError

Smart cost management with fallbacks.
class CostAwareInteraction:
def __init__(self, prefer_free=True, auto_fallback=True)
def generate_response(self, user_prompt, force_free=False, force_paid=False, **kwargs) -> str
def switch_to_free(self) -> None
def switch_to_paid(self) -> None
def get_current_model_info(self) -> Dict[str, Any]

# Quick responses
def quick_free_response(user_prompt, default_prompt_name=None) -> str
# Model comparison
def compare_free_vs_paid(user_prompt, default_prompt_name=None) -> Dict[str, str]
# Model information
def list_model_costs() -> Dict[str, List[str]]
# Model utilities (from models.py)
def get_default_model(prefer_free=True) -> AIModel
def get_free_models() -> List[AIModel]
def get_paid_models() -> List[AIModel]
def is_model_free(model_name: str) -> bool

All models implement the AIModel interface:
@property
def model_name(self) -> str # OpenRouter API model name
def display_name(self) -> str # Human-readable name
def cost_tier(self) -> CostTier # FREE or PAID
def is_free(self) -> bool # True if free model
def is_paid(self) -> bool # True if paid model
def default_temperature(self) -> float # Default sampling temperature
def default_max_tokens(self) -> Optional[int] # Default max tokens

Comprehensive logging system with request/response tracking:
from client import OpenRouterClient
# Enable logging
client = OpenRouterClient(enable_logging=True)
# Make requests (automatically logged)
response = client.send_request(model_name, messages)
# Get call history with timing and token usage
history = client.get_call_history()
for call in history:
print(f"Status: {call['status']}, Duration: {call['duration']:.3f}s")
print(f"Tokens: {call.get('usage', {}).get('total_tokens', 0)}")
# Clear history
client.clear_call_history()

Intelligent caching system with TTL and LRU eviction:
from caching import CacheConfig, InMemoryCache
from client import OpenRouterClient
# Configure caching
cache_config = CacheConfig(
enabled=True,
max_size=1000, # Maximum cache entries
ttl_seconds=3600, # 1 hour TTL
persist_to_disk=True, # Save cache to disk
cache_dir=".cache" # Cache directory
)
# Create client with caching
client = OpenRouterClient(cache_config=cache_config)
# First request (sent to the API and stored in the cache)
response1 = client.send_request(model_name, messages, use_cache=True)
# Second request (uses cache - much faster)
response2 = client.send_request(model_name, messages, use_cache=True)
# Cache management
stats = client.get_cache_stats()
print(f"Cache hits: {stats['hits']}, misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")
client.clear_cache()
client.set_cache_enabled(False)  # Disable caching

Real-time streaming responses with Server-Sent Events:
from framework import stream_to_console
from client import OpenRouterClient
# Stream to console with real-time display
result = stream_to_console(
"Tell me a story about a robot",
show_chunks=True # Shows chunk boundaries
)
# Manual streaming for custom handling
client = OpenRouterClient()
for chunk in client.send_streaming_request(model_name, messages):
content = chunk.get('content', '')
full_content = chunk.get('full_content', '')
if content:
print(content, end='', flush=True)
# Custom logic for each chunk
if len(full_content) > 100:
break  # Stop after 100 characters

Full asynchronous support with concurrent operations:
import asyncio
from async_framework import (
AsyncAIInteraction, AsyncOpenRouterClient,
quick_async_free_response, async_stream_to_console
)
async def main():
# Basic async interaction
async with AsyncAIInteraction() as interaction:
response = await interaction.generate_response("Question")
# Quick async responses
response = await quick_async_free_response("Quick question")
# Async streaming
result = await async_stream_to_console("Stream question")
# Concurrent requests for faster processing
tasks = [
quick_async_free_response("Question 1"),
quick_async_free_response("Question 2"),
quick_async_free_response("Question 3")
]
# All requests run concurrently
results = await asyncio.gather(*tasks)
print(f"Processed {len(results)} requests concurrently")
# Async client for advanced usage
async with AsyncOpenRouterClient() as client:
response = await client.send_request(model_name, messages)
# Async streaming
async for chunk in client.send_streaming_request(model_name, messages):
print(chunk['content'], end='', flush=True)
# Run async code
asyncio.run(main())

Live model data fetching from the OpenRouter API:
from models import DynamicModelManager
# Create model manager
manager = DynamicModelManager()
# Fetch live model data (300+ models)
models_data = manager.fetch_models_from_api()
print(f"Found {len(models_data['data'])} models")
# Create dynamic model classes
dynamic_models = []
for model_data in models_data['data'][:10]: # First 10 models
ModelClass = manager._create_dynamic_model_class(model_data)
model_instance = ModelClass()
dynamic_models.append(model_instance)
print(f"{model_instance.display_name}: {model_instance.cost_tier.value}")
# Use dynamic models with framework
from framework import AIInteraction
if dynamic_models:
free_models = [m for m in dynamic_models if m.is_free]
if free_models:
interaction = AIInteraction(free_models[0])
response = interaction.generate_response("Test with dynamic model")All features work seamlessly together:
import asyncio
from caching import CacheConfig
from async_client import AsyncOpenRouterClient
from models import get_default_model
async def integrated_example():
# Combine all features
cache_config = CacheConfig(
enabled=True,
max_size=500,
ttl_seconds=1800,
persist_to_disk=True
)
# Async client with caching and logging
async with AsyncOpenRouterClient(enable_logging=True) as client:
model = get_default_model(prefer_free=True)
messages = [{"role": "user", "content": "Integrated features test"}]
# Request with caching (first time)
response1 = await client.send_request(model.model_name, messages)
# Request with caching (cached response - faster)
response2 = await client.send_request(model.model_name, messages)
# Async streaming with logging
full_response = ""
async for chunk in client.send_streaming_request(model.model_name, messages):
content = chunk.get('content', '')
if content:
full_response += content
print(content, end='', flush=True)
# Check call history and cache stats
history = client.get_call_history()
print(f"\nProcessed {len(history)} requests")
asyncio.run(integrated_example())

Create specialized interaction classes:
from framework import AIInteraction
from models import get_default_model
class CodeReviewInteraction(AIInteraction):
def __init__(self):
super().__init__(get_default_model(prefer_free=True))
def review_code(self, code: str) -> str:
return self.generate_response(
f"Review this code:\n\n{code}",
default_prompt_name="code_generator"
)
# Usage
reviewer = CodeReviewInteraction()
feedback = reviewer.review_code("def factorial(n): return n * factorial(n-1)")

from framework import MultiModelInteraction
from models import get_free_models, get_paid_models
# Create multi-model interaction
multi = MultiModelInteraction(
models=get_free_models()[:3], # Use 3 free models
warn_on_paid=False
)
# Get different perspectives
question = "What are the pros and cons of microservices?"
for model_name in multi.get_available_models():
response = multi.generate_response(model_name, question)
print(f"{model_name}: {response[:100]}...")from framework import CostAwareInteraction
import time
def robust_generate_response(prompt: str, max_retries: int = 3) -> str:
interaction = CostAwareInteraction(prefer_free=True, auto_fallback=True)
for attempt in range(max_retries):
try:
return interaction.generate_response(prompt)
except Exception as e:
if attempt == max_retries - 1:
raise e
time.sleep(2 ** attempt) # Exponential backoff
raise Exception("All retry attempts failed")
# Usage
response = robust_generate_response("Explain Python decorators")

from prompts import PromptManager
# Load custom prompts
pm = PromptManager("custom_prompts.yaml")
# Create dynamic prompts
def create_domain_prompt(domain: str) -> dict:
return {
"role": "system",
"content": f"You are an expert in {domain}. Provide detailed, accurate information."
}
# Use with interaction
from framework import FreeModelInteraction
interaction = FreeModelInteraction()
# Add custom prompt dynamically
prompt = create_domain_prompt("machine learning")
messages = [prompt, {"role": "user", "content": "Explain neural networks"}]
response = interaction.chat(messages)

Error: OPENROUTER_API_KEY not found in environment variables
Solution: Create .env file with your OpenRouter API key:
OPENROUTER_API_KEY="sk-or-v1-your-key-here"Error: 404 Client Error: Not Found for url: https://openrouter.ai/api/v1/chat/completions
Possible causes:
- Invalid API key
- Model name not available
- Network connectivity issues
Solution: Test with a known working model:
from framework import AIInteraction
from models import GPT35Turbo
# Test with GPT-3.5 Turbo (reliable paid model)
interaction = AIInteraction(GPT35Turbo(), warn_on_paid=False)
response = interaction.generate_response("Hello!")ValueError: FreeModelInteraction only accepts free models
Solution: Use only free models with FreeModelInteraction:
from framework import FreeModelInteraction
from models import Llama33_8B_Free
# Correct usage
interaction = FreeModelInteraction(Llama33_8B_Free())

ModuleNotFoundError: No module named 'yaml'
Solution: Install required dependencies:
pip install python-dotenv requests PyYAML

Enable debug information:
import logging
logging.basicConfig(level=logging.DEBUG)
from framework import AIInteraction
interaction = AIInteraction()
# Will show detailed debug information

Test your setup:
from framework import AIInteraction
from models import GPT35Turbo
def test_connectivity():
try:
interaction = AIInteraction(GPT35Turbo(), warn_on_paid=False)
response = interaction.generate_response("Say 'Connection successful!'")
print(f"β
Success: {response}")
return True
except Exception as e:
print(f"β Failed: {e}")
return False
# Run test
test_connectivity()

For high-throughput applications:
from client import OpenRouterClient
from models import Llama33_8B_Free
# Reuse client instance
client = OpenRouterClient()
model = Llama33_8B_Free()
# Batch processing
def process_batch(prompts: list) -> list:
responses = []
for prompt in prompts:
messages = [{"role": "user", "content": prompt}]
response = client.send_request(model.model_name, messages)
content = client.get_response_content(response)
responses.append(content)
return responses

- Research the model on OpenRouter
- Determine cost tier (free or paid)
- Add model class in models.py:
class NewModel(AIModel):
    @property
    def model_name(self) -> str:
        return "provider/model-name"
    @property
    def display_name(self) -> str:
        return "New Model Display Name"
    @property
    def cost_tier(self) -> CostTier:
        return CostTier.FREE  # or CostTier.PAID
- Add to model collections in models.py
- Test the model with the framework
- Follow existing patterns in the codebase
- Maintain cost awareness for new features
- Add comprehensive tests for new functionality
- Update documentation in README.md and CLAUDE.md
- Use type hints for all functions
- Follow existing naming conventions
- Add docstrings for all public methods
- Maintain backward compatibility
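As an illustration of the expected style (type hints plus a docstring), here is a purely hypothetical helper that is not part of the framework; the usage field names assume an OpenAI-style usage block in the call log:

```python
from typing import Any, Dict, List

def summarize_usage(history: List[Dict[str, Any]]) -> Dict[str, int]:
    """Aggregate token usage across logged API calls.

    Args:
        history: Call records as returned by OpenRouterClient.get_call_history().

    Returns:
        A mapping of token-usage fields to their summed counts.
    """
    # Field names follow the OpenAI-style usage block; adjust to the actual log structure.
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for call in history:
        usage = call.get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```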
This project is provided as-is for educational and development purposes. Please ensure compliance with OpenRouter's terms of service when using their API.
For issues and questions:
- Check this README for common solutions
- Review the troubleshooting section above
- Test with known working models (GPT-3.5 Turbo)
- Verify your API key and network connectivity
- Check OpenRouter status and model availability
The OpenRouter Framework provides a robust, cost-aware solution for AI model interactions. With intelligent free model prioritization, comprehensive error handling, and flexible interaction patterns, it's designed to minimize costs while maximizing functionality.
Key Benefits:
- Cost-effective: Defaults to free models
- Safe: Prevents accidental charges
- Flexible: Multiple interaction patterns
- Comprehensive: 23 models supported
- Easy: Simple setup and usage
Start with quick_free_response() for immediate results, then explore the advanced features as your needs grow!