A comprehensive, production-ready Python framework for leveraging local Large Language Models through Ollama. This framework transforms your local LLM into a powerful, versatile AI assistant with multiple interaction modes, advanced features, and a beautiful command-line interface.
- **Multiple AI Interaction Modes** - CLI commands, interactive chat, document analysis, batch processing
- **Smart Model Management** - Seamless switching between models with persistent configuration
- **Advanced Template System** - Reusable prompt templates with variable substitution
- **Conversation Intelligence** - Persistent chat sessions with full history management
- **Real-time Streaming** - Live response generation with beautiful terminal output
- **Extensible Architecture** - Modular design for building custom AI applications
- **Rich CLI Experience** - Tables, panels, progress bars, and syntax highlighting
- **Developer-Friendly** - Clean APIs, comprehensive examples, and detailed documentation
- **Production-Ready** - Comprehensive testing suite with a 100% success rate
- **Deployment Validated** - Full production readiness validation with 9 comprehensive checks
- **Direct Generation**: Single-shot text generation with full parameter control
- **Interactive Chat**: Full conversational interface with commands and history
- **Document Analysis**: AI-powered document Q&A with chunking for large files
- **Batch Processing**: Efficient processing of multiple prompts with progress tracking
- **Code Assistant**: Specialized code analysis, review, and generation capabilities
- **Model Switching**: Easy switching between all available Ollama models
- **Template Engine**: Powerful prompt templates with `${variable}` substitution
- **Conversation Management**: Save, load, and manage multiple conversation sessions
- **Configuration System**: Persistent settings with JSON storage and CLI management
- **Streaming Support**: Real-time response streaming for a better user experience
- **Error Handling**: Graceful error recovery with informative user messages
- **Rich Output**: Beautiful terminal formatting with syntax highlighting
- **Automated Testing**: Comprehensive test suite with 7 core functionality tests
- **Deployment Validation**: Production readiness validation with 9 deployment checks
- Python 3.8 or higher
- Ollama installed and running locally
- At least one model installed in Ollama (e.g., `ollama pull deepseek-r1`)
1. Clone or download this framework to your local machine.

2. Create a virtual environment (recommended):

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Verify Ollama is running:

   ```bash
   python3 main.py health
   ```
Quick Start:

```bash
# 1. Create virtual environment
python3 -m venv venv && source venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Verify everything works
python3 main.py health
```

First Commands:

```bash
# See what models you have available
python3 main.py models

# Generate your first response
python3 main.py generate "Explain quantum computing in simple terms"

# Start an interactive chat session
python3 main.py chat

# Analyze a document (try it with the included LLM_ideas.md)
python3 main.py analyze LLM_ideas.md --question "What are the main project categories?"
```

Example Applications:

```bash
# Run the comprehensive demo
PYTHONPATH=. python3 examples/basic_usage.py

# Try the document Q&A system
PYTHONPATH=. python3 examples/document_qa.py

# Explore the code assistant
PYTHONPATH=. python3 examples/code_assistant.py
```

Core Commands:

| Command | Description | Example |
|---|---|---|
| `health` | Check system connectivity and model availability | `python3 main.py health` |
| `models` | List all available models with details | `python3 main.py models` |
| `set-model <name>` | Set the default model for all operations | `python3 main.py set-model qwen3:latest` |
| `config` | Display current configuration settings | `python3 main.py config` |
| `generate <prompt>` | Generate text from a single prompt | `python3 main.py generate "Write a haiku"` |
| `chat` | Start interactive conversational mode | `python3 main.py chat` |
| `analyze <file>` | Analyze documents with AI-powered insights | `python3 main.py analyze report.pdf` |
| `batch <file>` | Process multiple prompts efficiently | `python3 main.py batch prompts.json` |
Template Commands:

| Command | Description | Example |
|---|---|---|
| `templates list [--category]` | List available prompt templates | `python3 main.py templates list` |
| `templates show <name>` | Show template details and variables | `python3 main.py templates show summarize` |
| `templates create <name>` | Create a new custom template | `python3 main.py templates create my_template` |
Conversation Commands:

| Command | Description | Example |
|---|---|---|
| `conversations list` | List all saved conversations | `python3 main.py conversations list` |
| `conversations show <id>` | Display conversation details and history | `python3 main.py conversations show abc123` |
| `conversations delete <id>` | Delete a specific conversation | `python3 main.py conversations delete abc123` |
All generation commands support these options:

| Option | Description | Example |
|---|---|---|
| `--model <name>` | Use a specific model | `--model qwen3:latest` |
| `--temperature <value>` | Control randomness (0.0-1.0) | `--temperature 0.3` |
| `--max-tokens <number>` | Limit response length | `--max-tokens 500` |
| `--stream/--no-stream` | Enable or disable real-time streaming | `--stream` |
| `--system <prompt>` | Set system behavior | `--system "You are a helpful coding assistant"` |
```bash
# Generate clean, efficient code with specific parameters
python3 main.py generate "Write a Python function to calculate fibonacci numbers" \
  --model qwen3:latest \
  --temperature 0.3 \
  --system "You are an expert Python developer who writes clean, efficient code."

# Creative writing with higher temperature
python3 main.py generate "Write a short story about AI discovering emotions" \
  --temperature 0.8 \
  --max-tokens 800
```

```bash
# Use built-in templates with interactive prompts
python3 main.py generate --template explain_code
# Will prompt for: language, code

python3 main.py generate --template summarize
# Will prompt for: content_type, content

# Create your own template
python3 main.py templates create story_generator
# Then use: python3 main.py generate --template story_generator
```

1. Create an input file (`research_prompts.json`):
```json
[
  {
    "prompt": "Explain the latest developments in quantum computing",
    "metadata": {"category": "quantum", "priority": "high"}
  },
  {
    "prompt": "What are the environmental impacts of renewable energy?",
    "metadata": {"category": "environment", "priority": "medium"}
  },
  {
    "prompt": "How does machine learning impact healthcare?",
    "metadata": {"category": "AI", "priority": "high"}
  }
]
```

2. Process with custom settings:

```bash
python3 main.py batch research_prompts.json \
  --output research_results.json \
  --model deepseek-r1:latest \
  --batch-size 3
```
```bash
# Basic document analysis
python3 main.py analyze annual_report.pdf

# Targeted questions
python3 main.py analyze research_paper.pdf \
  --question "What are the main findings and their implications?"

# Use a specific analysis template
python3 main.py analyze code_file.py \
  --template code_reviewer \
  --model qwen3:latest
```

Start with `python3 main.py chat` and use these commands:
| Command | Description | Example |
|---|---|---|
| `/help` | Show all available commands | `/help` |
| `/model <name>` | Switch to a different model mid-conversation | `/model qwen3:latest` |
| `/system <prompt>` | Change system behavior | `/system "You are a creative writing assistant"` |
| `/save` | Save current conversation | `/save` |
| `/load <id>` | Load a previous conversation | `/load abc123` |
| `/clear` | Clear conversation history | `/clear` |
| `/history` | Show conversation summary | `/history` |
| `/list` | List available models | `/list` |
| `/templates` | Show system prompt templates | `/templates` |
| `/exit` or `/quit` | Exit chat mode | `/quit` |
```bash
# Start a coding session
python3 main.py chat --system "You are an expert Python developer"

# Load a previous conversation
python3 main.py chat --load conversation_id

# Start with a specific model
python3 main.py chat --model qwen3:latest
```
The framework uses a JSON configuration file (`config/settings.json`):

```json
{
  "base_url": "http://localhost:11434",
  "default_model": "deepseek-r1:latest",
  "default_temperature": 0.7,
  "default_max_tokens": null,
  "default_system_prompt": null,
  "conversation_history_limit": 100,
  "auto_save_conversations": true,
  "prompt_templates_dir": "templates",
  "conversations_dir": "conversations"
}
```

You can modify these settings directly or use the CLI to change some values:

```bash
python3 main.py set-model qwen3:latest
```
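Since Pydantic is already a dependency, settings like these are easy to mirror in a typed model for programmatic access. A minimal sketch, assuming a hypothetical `Settings` wrapper (the framework's actual `src/config.py` may be organized differently):

```python
import json
from pathlib import Path
from typing import Optional
from pydantic import BaseModel

class Settings(BaseModel):
    """Hypothetical typed mirror of config/settings.json; the framework's
    real src/config.py may be structured differently."""
    base_url: str = "http://localhost:11434"
    default_model: str = "deepseek-r1:latest"
    default_temperature: float = 0.7
    default_max_tokens: Optional[int] = None
    default_system_prompt: Optional[str] = None
    conversation_history_limit: int = 100
    auto_save_conversations: bool = True
    prompt_templates_dir: str = "templates"
    conversations_dir: str = "conversations"

# Load and validate the JSON file in one step
settings = Settings(**json.loads(Path("config/settings.json").read_text()))
print(settings.default_model)
```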
The framework includes a template system with variable substitution.

System Prompts (`templates/system_prompts/`):

- `assistant` - General purpose assistant
- `code_reviewer` - Expert code reviewer
- `document_analyzer` - Document analysis expert

User Prompts (`templates/user_prompts/`):

- `summarize` - Content summarization
- `explain_code` - Code explanation
- `translate` - Language translation

Templates use `${variable}` syntax for substitution:

```bash
python3 main.py templates create my_template
```

Then enter your template:

```
Analyze this ${content_type} and focus on ${aspect}:

${content}

Please provide ${detail_level} analysis.
```

```bash
python3 main.py generate --template my_template
# Will prompt for: content_type, aspect, content, detail_level
```
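The `${variable}` placeholders follow the same convention as Python's built-in `string.Template`, so you can reason about rendering like this. A minimal sketch, assuming standard `string.Template` semantics rather than the framework's actual `PromptTemplate.render`:

```python
from string import Template

# A minimal sketch of ${variable} substitution, assuming the framework's
# templates follow the convention of Python's built-in string.Template
# (the actual PromptTemplate.render implementation may differ).
raw = "Analyze this ${content_type} and focus on ${aspect}:\n\n${content}"
prompt = Template(raw).substitute(
    content_type="research paper",
    aspect="methodology",
    content="Your document content here...",
)
print(prompt)
```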
The framework provides clean, intuitive APIs for building custom applications:

```python
from src.llm_client import LLMClient
from src.prompt_manager import PromptManager
from src.conversation import ConversationManager

# Initialize components
client = LLMClient()
prompts = PromptManager()
conversations = ConversationManager()

# Simple text generation
response = client.generate("Explain machine learning in simple terms")
print(response.response)
print(f"Generated {response.eval_count} tokens in {response.total_duration/1e9:.2f}s")
```

Template-Powered Generation:
```python
# Load and use templates
template = prompts.load_template("summarize")
prompt = template.render(
    content_type="research paper",
    content="Your document content here..."
)

response = client.generate(prompt, temperature=0.3)
```

Conversation Management:
```python
# Create persistent conversations
conversation = conversations.create_conversation("AI Research Discussion")
conversation.add_message("user", "What are the latest AI breakthroughs?")

# Get formatted messages for API
messages = conversation.get_messages()
response = client.chat(messages, model="deepseek-r1:latest")

# Add response and save
conversation.add_message("assistant", response['message']['content'])
conversations.save_conversation(conversation)
```

Streaming Responses:
```python
# Real-time streaming for better UX
print("AI Response: ", end="")
for chunk in client.generate("Write a creative story", stream=True):
    print(chunk, end="", flush=True)
print()
```

Custom Templates:
```python
# Create and use custom templates
prompts.create_template(
    name="code_analyzer",
    template="Analyze this ${language} code for ${focus}:\n\n${code}",
    description="Code analysis with customizable focus",
    variables=["language", "focus", "code"]
)

template = prompts.load_template("code_analyzer")
prompt = template.render(
    language="Python",
    focus="performance optimizations",
    code="def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)"
)
```

Document Q&A System:
```python
from src.llm_client import LLMClient
from src.utils import read_file_safe, chunk_text

class DocumentAnalyzer:
    def __init__(self):
        self.client = LLMClient()

    def analyze_document(self, file_path, question):
        content = read_file_safe(file_path)
        chunks = chunk_text(content, chunk_size=1500)

        # Find relevant chunks (simplified)
        relevant_content = "\n\n".join(chunks[:3])

        prompt = f"""Answer this question based on the document:

Question: {question}

Document: {relevant_content}"""

        return self.client.generate(prompt, temperature=0.2)

# Usage
analyzer = DocumentAnalyzer()
result = analyzer.analyze_document("report.txt", "What are the key findings?")
```
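The `chunks[:3]` selection above is deliberately simplified. A sketch of one way to pick more relevant chunks by keyword overlap with the question (an illustration only, not the framework's actual retrieval logic):

```python
def rank_chunks_by_overlap(chunks, question, top_k=3):
    """Naive relevance ranking: score each chunk by how many distinct
    question words it contains. A simple illustration; real retrieval
    would use embeddings or BM25 rather than raw word overlap."""
    query_words = {w.lower() for w in question.split() if len(w) > 3}
    scored = [
        (sum(1 for w in query_words if w in chunk.lower()), chunk)
        for chunk in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

You could drop this into `analyze_document` in place of `chunks[:3]`, e.g. `relevant_content = "\n\n".join(rank_chunks_by_overlap(chunks, question))`.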
Batch Processing System:

```python
def process_prompts_batch(prompts_list, model="deepseek-r1:latest"):
    client = LLMClient()
    results = []

    for i, prompt in enumerate(prompts_list):
        print(f"Processing {i+1}/{len(prompts_list)}")
        response = client.generate(prompt, model=model)
        results.append({
            "prompt": prompt,
            "response": response.response,
            "tokens": response.eval_count
        })

    return results

# Usage
prompts = ["Explain AI", "What is quantum computing?", "How does blockchain work?"]
results = process_prompts_batch(prompts)
```
The framework includes three example applications that demonstrate real-world usage.

A comprehensive walkthrough of all framework features:

```bash
PYTHONPATH=. python3 examples/basic_usage.py
```

What it demonstrates:
- Text generation with different models
- Model switching and configuration
- Template creation and usage
- Conversation management and persistence
- Streaming response handling
- Configuration management
Build a sophisticated RAG-like document analysis system:

```bash
# Analyze any document
PYTHONPATH=. python3 examples/document_qa.py path/to/your/document.pdf

# Or use the included example
PYTHONPATH=. python3 examples/document_qa.py
```

Advanced features:
- **Smart Document Processing** - Automatic chunking for large documents
- **Intelligent Q&A** - Relevant chunk selection for accurate answers
- **Auto-Summarization** - Generate comprehensive document summaries
- **Key Point Extraction** - Identify and list important insights
- **Interactive Mode** - Real-time Q&A with your documents
- **Context Awareness** - Maintains document context across questions
Professional-grade code analysis and generation tool:

```bash
PYTHONPATH=. python3 examples/code_assistant.py
```

Capabilities:
- **Code Analysis** - Deep analysis of code files with language detection
- **Code Explanation** - Step-by-step breakdowns of complex code
- **Code Review** - Professional code review with specific recommendations
- **Performance Optimization** - Identify and fix performance bottlenecks
- **Debug Assistant** - Help identify and fix bugs with detailed guidance
- **Test Generation** - Create comprehensive unit tests automatically
- **Code Refactoring** - Improve code structure and maintainability
- **Interactive Mode** - Full-featured code assistant with command interface
Code Assistant Commands:

```
# In interactive mode:
explain    # Explain how code works
review     # Comprehensive code review
optimize   # Optimize for performance/readability
debug      # Debug assistance with issue description
generate   # Generate code from requirements
test       # Generate unit tests
refactor   # Refactor for better structure
```
The framework is organized into several key modules:

- `src/llm_client.py` - Core API client for Ollama
- `src/config.py` - Configuration management
- `src/prompt_manager.py` - Template system
- `src/conversation.py` - Chat history management
- `src/utils.py` - Utility functions
- `cli/` - Command-line interfaces
- `examples/` - Example applications
- `test_framework.py` - Automated testing suite
- `deployment_validation.py` - Production readiness validation
The framework includes comprehensive automated testing with a 100% success rate:

```bash
# Run all tests (recommended)
./run_tests.sh

# Or run manually
source venv/bin/activate
python3 test_framework.py
```

Test Coverage:
- API Components Initialization
- Configuration System
- Ollama Connectivity
- Text Generation
- Template System
- CLI Commands
- Document Analysis
Comprehensive validation for production readiness:

```bash
# Run deployment validation
source venv/bin/activate
python3 deployment_validation.py
```

Validation Coverage:
- Core Stability Testing
- Configuration Flexibility
- Custom Template Creation
- Conversation Persistence
- Framework Extensibility
- Error Handling Robustness
- Performance Characteristics
- Deployment Requirements
- Custom Application Development

Status: FRAMEWORK IS DEPLOYMENT READY!
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Check framework health
python3 main.py health
```

```bash
# List available models
python3 main.py models

# Pull a new model
ollama pull deepseek-r1

# Set default model
python3 main.py set-model deepseek-r1:latest
```

When running examples, use:
```bash
PYTHONPATH=. python3 examples/script_name.py
```

If you get import errors, make sure your virtual environment is activated:

```bash
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

If you encounter issues, run the test suite to diagnose:

```bash
# Quick health check
python3 main.py health

# Full test suite
./run_tests.sh

# Deployment validation
python3 deployment_validation.py
```
```bash
# For factual tasks and code generation
python3 main.py set-model deepseek-r1:latest   # Better reasoning
python3 main.py generate "Explain quantum physics" --temperature 0.2

# For creative tasks and storytelling
python3 main.py set-model qwen3:latest   # More creative
python3 main.py generate "Write a creative story" --temperature 0.8
```

Temperature guidelines:

- **0.1-0.3**: Code generation, factual Q&A, document analysis
- **0.4-0.6**: General conversation, explanations, tutorials
- **0.7-0.9**: Creative writing, brainstorming, artistic content
- **0.9+**: Experimental, highly creative outputs
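To see these ranges in practice, you can run the same prompt at several temperatures using the `LLMClient` API shown earlier:

```python
from src.llm_client import LLMClient

# Compare outputs for the same prompt across the temperature ranges above.
# Run with PYTHONPATH=. so the src package resolves, as in the examples.
client = LLMClient()
prompt = "Describe a sunrise in one sentence."

for temperature in (0.2, 0.5, 0.8):
    response = client.generate(prompt, temperature=temperature)
    print(f"temperature={temperature}: {response.response}")
```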
Performance tips:

- **Streaming for UX**: Always use `--stream` for interactive applications
- **Batch Processing**: Use the `batch` command for multiple prompts instead of individual calls
- **Context Management**: Limit conversation history to avoid token overflow
- **Model Caching**: Ollama keeps models in memory, so the first call may be slower
- **Chunking Strategy**: For large documents, use smaller chunks (1000-2000 characters) for better relevance; see the sketch below
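For the chunking tip, here is a minimal character-based chunker with overlap, so sentences that span a boundary appear in both neighboring chunks. A standalone sketch; the framework's own `chunk_text` helper may behave differently:

```python
def chunk_with_overlap(text, chunk_size=1500, overlap=200):
    """Split text into fixed-size character chunks with overlap.
    A standalone sketch for illustration; the framework's chunk_text
    helper may use a different strategy."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```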
```bash
# Check system status
python3 main.py health

# Monitor large batch jobs
python3 main.py batch large_file.json --batch-size 5   # Process in smaller batches

# Clear conversation history periodically
python3 main.py conversations list
python3 main.py conversations delete old_conversation_id
```

- **Vector Embeddings** - Semantic search and similarity matching for documents
- **Plugin System** - Load custom tools and integrations dynamically
- **Web Interface** - Beautiful web UI using FastAPI with real-time streaming
- **Multi-model Ensembles** - Combine responses from multiple models for better results
- **Fine-tuning Support** - Train specialized models for specific domains
- **External Integrations** - Web search, databases, APIs, and file systems
- **Voice Interface** - Speech-to-text and text-to-speech capabilities
- **Analytics Dashboard** - Usage tracking, model performance, and cost analysis
- **RAG Enhancement** - Advanced retrieval with re-ranking and hybrid search
- **Code Execution** - Safe code execution environment for generated code
- **Workflow Automation** - Chain multiple AI operations together
- **Team Collaboration** - Shared conversations and templates
- **Model Marketplace** - Easy discovery and installation of new models
We welcome contributions! Here are some ways to help:

Reporting Bugs:

- Open issues with detailed descriptions
- Include system info (Python version, OS, Ollama version)
- Provide steps to reproduce problems

Code Contributions:

- Follow existing code style and patterns
- Add comprehensive docstrings and type hints
- Test with real Ollama instances
- Update documentation for new features

Documentation:

- Improve README examples and explanations
- Add use case tutorials and guides
- Translate documentation to other languages
This framework is provided as-is for educational and development purposes. Feel free to modify, extend, and use it for your projects.
Important Notes:
- Ensure compliance with your local LLM model licenses
- Respect rate limits and usage policies of Ollama
- Consider privacy and security when processing sensitive documents
- No warranty provided - use at your own risk
- Ollama - Amazing local LLM runtime that makes this all possible
- Rich - Beautiful terminal output and formatting
- Click - Elegant command-line interface framework
- Pydantic - Data validation and settings management
- Open Source LLM Community - For making local AI accessible to everyone
- Python Ecosystem - For providing excellent tools and libraries
- AI Research Community - For advancing the field of artificial intelligence
```bash
# Quick setup (takes 30 seconds)
git clone <repository> && cd local_llm
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Verify everything works
python3 main.py health

# Start your AI journey!
python3 main.py chat
```

Happy coding with your local LLM!
Transform your local AI into a powerful, versatile assistant today!