A comprehensive, production-ready Python framework for leveraging local Large Language Models through Ollama. This framework transforms your local LLM into a powerful, versatile AI assistant with multiple interaction modes, advanced features, and a beautiful command-line interface.
- **Multiple AI Interaction Modes** - CLI commands, interactive chat, document analysis, batch processing
- **Smart Model Management** - Seamless switching between models with persistent configuration
- **Advanced Template System** - Reusable prompt templates with variable substitution
- **Conversation Intelligence** - Persistent chat sessions with full history management
- **Real-time Streaming** - Live response generation with beautiful terminal output
- **Extensible Architecture** - Modular design for building custom AI applications
- **Rich CLI Experience** - Tables, panels, progress bars, and syntax highlighting
- **Developer-Friendly** - Clean APIs, comprehensive examples, and detailed documentation
- **Production-Ready** - Comprehensive testing suite with a 100% success rate
- **Deployment Validated** - Full production readiness validation with 9 comprehensive checks
- **Direct Generation**: Single-shot text generation with full parameter control
- **Interactive Chat**: Full conversational interface with commands and history
- **Document Analysis**: AI-powered document Q&A with chunking for large files
- **Batch Processing**: Efficient processing of multiple prompts with progress tracking
- **Code Assistant**: Specialized code analysis, review, and generation capabilities
- **Model Switching**: Easy switching between all available Ollama models
- **Template Engine**: Powerful prompt templates with `${variable}` substitution
- **Conversation Management**: Save, load, and manage multiple conversation sessions
- **Configuration System**: Persistent settings with JSON storage and CLI management
- **Streaming Support**: Real-time response streaming for a better user experience
- **Error Handling**: Graceful error recovery with informative user messages
- **Rich Output**: Beautiful terminal formatting with syntax highlighting
- **Automated Testing**: Comprehensive test suite with 7 core functionality tests
- **Deployment Validation**: Production readiness validation with 9 deployment checks
- Python 3.8 or higher
- Ollama installed and running locally
- At least one model installed in Ollama (e.g., `ollama pull deepseek-r1`)
1. Clone or download this framework to your local machine.

2. Create a virtual environment (recommended):

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Verify Ollama is running:

   ```bash
   python3 main.py health
   ```
Quick Start:

```bash
# 1. Create virtual environment
python3 -m venv venv && source venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Verify everything works
python3 main.py health
```

First Commands:

```bash
# See what models you have available
python3 main.py models

# Generate your first response
python3 main.py generate "Explain quantum computing in simple terms"

# Start an interactive chat session
python3 main.py chat

# Analyze a document (try it with the included LLM_ideas.md)
python3 main.py analyze LLM_ideas.md --question "What are the main project categories?"
```

Example Applications:

```bash
# Run the comprehensive demo
PYTHONPATH=. python3 examples/basic_usage.py

# Try the document Q&A system
PYTHONPATH=. python3 examples/document_qa.py

# Explore the code assistant
PYTHONPATH=. python3 examples/code_assistant.py
```

Core Commands:

| Command | Description | Example |
|---|---|---|
| `health` | Check system connectivity and model availability | `python3 main.py health` |
| `models` | List all available models with details | `python3 main.py models` |
| `set-model <name>` | Set the default model for all operations | `python3 main.py set-model qwen3:latest` |
| `config` | Display current configuration settings | `python3 main.py config` |
| `generate <prompt>` | Generate text from a single prompt | `python3 main.py generate "Write a haiku"` |
| `chat` | Start interactive conversational mode | `python3 main.py chat` |
| `analyze <file>` | Analyze documents with AI-powered insights | `python3 main.py analyze report.pdf` |
| `batch <file>` | Process multiple prompts efficiently | `python3 main.py batch prompts.json` |
Template Commands:

| Command | Description | Example |
|---|---|---|
| `templates list [--category]` | List available prompt templates | `python3 main.py templates list` |
| `templates show <name>` | Show template details and variables | `python3 main.py templates show summarize` |
| `templates create <name>` | Create a new custom template | `python3 main.py templates create my_template` |
Conversation Commands:

| Command | Description | Example |
|---|---|---|
| `conversations list` | List all saved conversations | `python3 main.py conversations list` |
| `conversations show <id>` | Display conversation details and history | `python3 main.py conversations show abc123` |
| `conversations delete <id>` | Delete a specific conversation | `python3 main.py conversations delete abc123` |
All generation commands support these options:

| Option | Description | Example |
|---|---|---|
| `--model <name>` | Use a specific model | `--model qwen3:latest` |
| `--temperature <value>` | Control randomness (0.0-1.0) | `--temperature 0.3` |
| `--max-tokens <number>` | Limit response length | `--max-tokens 500` |
| `--stream/--no-stream` | Enable or disable real-time streaming | `--stream` |
| `--system <prompt>` | Set system behavior | `--system "You are a helpful coding assistant"` |
```bash
# Generate clean, efficient code with specific parameters
python3 main.py generate "Write a Python function to calculate fibonacci numbers" \
  --model qwen3:latest \
  --temperature 0.3 \
  --system "You are an expert Python developer who writes clean, efficient code."

# Creative writing with higher temperature
python3 main.py generate "Write a short story about AI discovering emotions" \
  --temperature 0.8 \
  --max-tokens 800
```

```bash
# Use built-in templates with interactive prompts
python3 main.py generate --template explain_code
# Will prompt for: language, code

python3 main.py generate --template summarize
# Will prompt for: content_type, content

# Create your own template
python3 main.py templates create story_generator
# Then use: python3 main.py generate --template story_generator
```

1. Create an input file (`research_prompts.json`):
```json
[
  {
    "prompt": "Explain the latest developments in quantum computing",
    "metadata": {"category": "quantum", "priority": "high"}
  },
  {
    "prompt": "What are the environmental impacts of renewable energy?",
    "metadata": {"category": "environment", "priority": "medium"}
  },
  {
    "prompt": "How does machine learning impact healthcare?",
    "metadata": {"category": "AI", "priority": "high"}
  }
]
```

2. Process with custom settings:

```bash
python3 main.py batch research_prompts.json \
  --output research_results.json \
  --model deepseek-r1:latest \
  --batch-size 3
```
```bash
# Basic document analysis
python3 main.py analyze annual_report.pdf

# Targeted questions
python3 main.py analyze research_paper.pdf \
  --question "What are the main findings and their implications?"

# Use a specific analysis template
python3 main.py analyze code_file.py \
  --template code_reviewer \
  --model qwen3:latest
```

Start with `python3 main.py chat` and use these commands:
| Command | Description | Example |
|---|---|---|
| `/help` | Show all available commands | `/help` |
| `/model <name>` | Switch to a different model mid-conversation | `/model qwen3:latest` |
| `/system <prompt>` | Change system behavior | `/system "You are a creative writing assistant"` |
| `/save` | Save current conversation | `/save` |
| `/load <id>` | Load a previous conversation | `/load abc123` |
| `/clear` | Clear conversation history | `/clear` |
| `/history` | Show conversation summary | `/history` |
| `/list` | List available models | `/list` |
| `/templates` | Show system prompt templates | `/templates` |
| `/exit` or `/quit` | Exit chat mode | `/quit` |
```bash
# Start a coding session
python3 main.py chat --system "You are an expert Python developer"

# Load a previous conversation
python3 main.py chat --load conversation_id

# Start with a specific model
python3 main.py chat --model qwen3:latest
```
The framework uses a JSON configuration file (`config/settings.json`):

```json
{
  "base_url": "http://localhost:11434",
  "default_model": "deepseek-r1:latest",
  "default_temperature": 0.7,
  "default_max_tokens": null,
  "default_system_prompt": null,
  "conversation_history_limit": 100,
  "auto_save_conversations": true,
  "prompt_templates_dir": "templates",
  "conversations_dir": "conversations"
}
```

You can modify these settings directly or use the CLI to change some values:

```bash
python3 main.py set-model qwen3:latest
```
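Since Pydantic is already a dependency, settings like these are easy to mirror in a typed model for programmatic access. A minimal sketch, assuming a hypothetical `Settings` wrapper (the framework's actual `src/config.py` may be organized differently):

```python
import json
from pathlib import Path
from typing import Optional
from pydantic import BaseModel

class Settings(BaseModel):
    """Hypothetical typed mirror of config/settings.json; the framework's
    real src/config.py may be structured differently."""
    base_url: str = "http://localhost:11434"
    default_model: str = "deepseek-r1:latest"
    default_temperature: float = 0.7
    default_max_tokens: Optional[int] = None
    default_system_prompt: Optional[str] = None
    conversation_history_limit: int = 100
    auto_save_conversations: bool = True
    prompt_templates_dir: str = "templates"
    conversations_dir: str = "conversations"

# Load and validate the JSON file in one step
settings = Settings(**json.loads(Path("config/settings.json").read_text()))
print(settings.default_model)
```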
The framework includes a template system with variable substitution.

System Prompts (`templates/system_prompts/`):

- `assistant` - General purpose assistant
- `code_reviewer` - Expert code reviewer
- `document_analyzer` - Document analysis expert

User Prompts (`templates/user_prompts/`):

- `summarize` - Content summarization
- `explain_code` - Code explanation
- `translate` - Language translation

Templates use `${variable}` syntax for substitution:

```bash
python3 main.py templates create my_template
```

Then enter your template:

```
Analyze this ${content_type} and focus on ${aspect}:

${content}

Please provide ${detail_level} analysis.
```

```bash
python3 main.py generate --template my_template
# Will prompt for: content_type, aspect, content, detail_level
```
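The `${variable}` placeholders follow the same convention as Python's built-in `string.Template`, so you can reason about rendering like this. A minimal sketch, assuming standard `string.Template` semantics rather than the framework's actual `PromptTemplate.render`:

```python
from string import Template

# A minimal sketch of ${variable} substitution, assuming the framework's
# templates follow the convention of Python's built-in string.Template
# (the actual PromptTemplate.render implementation may differ).
raw = "Analyze this ${content_type} and focus on ${aspect}:\n\n${content}"
prompt = Template(raw).substitute(
    content_type="research paper",
    aspect="methodology",
    content="Your document content here...",
)
print(prompt)
```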
The framework provides clean, intuitive APIs for building custom applications:

```python
from src.llm_client import LLMClient
from src.prompt_manager import PromptManager
from src.conversation import ConversationManager

# Initialize components
client = LLMClient()
prompts = PromptManager()
conversations = ConversationManager()

# Simple text generation
response = client.generate("Explain machine learning in simple terms")
print(response.response)
print(f"Generated {response.eval_count} tokens in {response.total_duration/1e9:.2f}s")
```

Template-Powered Generation:
```python
# Load and use templates
template = prompts.load_template("summarize")
prompt = template.render(
    content_type="research paper",
    content="Your document content here..."
)

response = client.generate(prompt, temperature=0.3)
```

Conversation Management:
```python
# Create persistent conversations
conversation = conversations.create_conversation("AI Research Discussion")
conversation.add_message("user", "What are the latest AI breakthroughs?")

# Get formatted messages for API
messages = conversation.get_messages()
response = client.chat(messages, model="deepseek-r1:latest")

# Add response and save
conversation.add_message("assistant", response['message']['content'])
conversations.save_conversation(conversation)
```

Streaming Responses:
```python
# Real-time streaming for better UX
print("AI Response: ", end="")
for chunk in client.generate("Write a creative story", stream=True):
    print(chunk, end="", flush=True)
print()
```

Custom Templates:
```python
# Create and use custom templates
prompts.create_template(
    name="code_analyzer",
    template="Analyze this ${language} code for ${focus}:\n\n${code}",
    description="Code analysis with customizable focus",
    variables=["language", "focus", "code"]
)

template = prompts.load_template("code_analyzer")
prompt = template.render(
    language="Python",
    focus="performance optimizations",
    code="def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)"
)
```

Document Q&A System:
```python
from src.llm_client import LLMClient
from src.utils import read_file_safe, chunk_text

class DocumentAnalyzer:
    def __init__(self):
        self.client = LLMClient()

    def analyze_document(self, file_path, question):
        content = read_file_safe(file_path)
        chunks = chunk_text(content, chunk_size=1500)

        # Find relevant chunks (simplified)
        relevant_content = "\n\n".join(chunks[:3])

        prompt = f"""Answer this question based on the document:

Question: {question}

Document: {relevant_content}"""

        return self.client.generate(prompt, temperature=0.2)

# Usage
analyzer = DocumentAnalyzer()
result = analyzer.analyze_document("report.txt", "What are the key findings?")
```
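The `chunks[:3]` selection above is deliberately simplified. A sketch of one way to pick more relevant chunks by keyword overlap with the question (an illustration only, not the framework's actual retrieval logic):

```python
def rank_chunks_by_overlap(chunks, question, top_k=3):
    """Naive relevance ranking: score each chunk by how many distinct
    question words it contains. A simple illustration; real retrieval
    would use embeddings or BM25 rather than raw word overlap."""
    query_words = {w.lower() for w in question.split() if len(w) > 3}
    scored = [
        (sum(1 for w in query_words if w in chunk.lower()), chunk)
        for chunk in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

You could drop this into `analyze_document` in place of `chunks[:3]`, e.g. `relevant_content = "\n\n".join(rank_chunks_by_overlap(chunks, question))`.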
Batch Processing System:

```python
def process_prompts_batch(prompts_list, model="deepseek-r1:latest"):
    client = LLMClient()
    results = []

    for i, prompt in enumerate(prompts_list):
        print(f"Processing {i+1}/{len(prompts_list)}")
        response = client.generate(prompt, model=model)
        results.append({
            "prompt": prompt,
            "response": response.response,
            "tokens": response.eval_count
        })

    return results

# Usage
prompts = ["Explain AI", "What is quantum computing?", "How does blockchain work?"]
results = process_prompts_batch(prompts)
```
The framework includes three example applications that demonstrate real-world usage.

A comprehensive walkthrough of all framework features:

```bash
PYTHONPATH=. python3 examples/basic_usage.py
```

What it demonstrates:
- Text generation with different models
- Model switching and configuration
- Template creation and usage
- Conversation management and persistence
- Streaming response handling
- Configuration management
Build a sophisticated RAG-like document analysis system:

```bash
# Analyze any document
PYTHONPATH=. python3 examples/document_qa.py path/to/your/document.pdf

# Or use the included example
PYTHONPATH=. python3 examples/document_qa.py
```

Advanced features:
- **Smart Document Processing** - Automatic chunking for large documents
- **Intelligent Q&A** - Relevant chunk selection for accurate answers
- **Auto-Summarization** - Generate comprehensive document summaries
- **Key Point Extraction** - Identify and list important insights
- **Interactive Mode** - Real-time Q&A with your documents
- **Context Awareness** - Maintains document context across questions
Professional-grade code analysis and generation tool:

```bash
PYTHONPATH=. python3 examples/code_assistant.py
```

Capabilities:
- **Code Analysis** - Deep analysis of code files with language detection
- **Code Explanation** - Step-by-step breakdowns of complex code
- **Code Review** - Professional code review with specific recommendations
- **Performance Optimization** - Identify and fix performance bottlenecks
- **Debug Assistant** - Help identify and fix bugs with detailed guidance
- **Test Generation** - Create comprehensive unit tests automatically
- **Code Refactoring** - Improve code structure and maintainability
- **Interactive Mode** - Full-featured code assistant with command interface
Code Assistant Commands:

```
# In interactive mode:
explain    # Explain how code works
review     # Comprehensive code review
optimize   # Optimize for performance/readability
debug      # Debug assistance with issue description
generate   # Generate code from requirements
test       # Generate unit tests
refactor   # Refactor for better structure
```
The framework is organized into several key modules:

- `src/llm_client.py` - Core API client for Ollama
- `src/config.py` - Configuration management
- `src/prompt_manager.py` - Template system
- `src/conversation.py` - Chat history management
- `src/utils.py` - Utility functions
- `cli/` - Command-line interfaces
- `examples/` - Example applications
- `test_framework.py` - Automated testing suite
- `deployment_validation.py` - Production readiness validation
The framework includes comprehensive automated testing with a 100% success rate:

```bash
# Run all tests (recommended)
./run_tests.sh

# Or run manually
source venv/bin/activate
python3 test_framework.py
```

Test Coverage:
- API Components Initialization
- Configuration System
- Ollama Connectivity
- Text Generation
- Template System
- CLI Commands
- Document Analysis
Comprehensive validation for production readiness:

```bash
# Run deployment validation
source venv/bin/activate
python3 deployment_validation.py
```

Validation Coverage:
- Core Stability Testing
- Configuration Flexibility
- Custom Template Creation
- Conversation Persistence
- Framework Extensibility
- Error Handling Robustness
- Performance Characteristics
- Deployment Requirements
- Custom Application Development

Status: FRAMEWORK IS DEPLOYMENT READY!
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Check framework health
python3 main.py health
```

```bash
# List available models
python3 main.py models

# Pull a new model
ollama pull deepseek-r1

# Set default model
python3 main.py set-model deepseek-r1:latest
```

When running examples, use:
```bash
PYTHONPATH=. python3 examples/script_name.py
```

If you get import errors, make sure your virtual environment is activated:

```bash
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

If you encounter issues, run the test suite to diagnose:

```bash
# Quick health check
python3 main.py health

# Full test suite
./run_tests.sh

# Deployment validation
python3 deployment_validation.py
```
```bash
# For factual tasks and code generation
python3 main.py set-model deepseek-r1:latest   # Better reasoning
python3 main.py generate "Explain quantum physics" --temperature 0.2

# For creative tasks and storytelling
python3 main.py set-model qwen3:latest   # More creative
python3 main.py generate "Write a creative story" --temperature 0.8
```

Temperature guidelines:

- **0.1-0.3**: Code generation, factual Q&A, document analysis
- **0.4-0.6**: General conversation, explanations, tutorials
- **0.7-0.9**: Creative writing, brainstorming, artistic content
- **0.9+**: Experimental, highly creative outputs
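To see these ranges in practice, you can run the same prompt at several temperatures using the `LLMClient` API shown earlier:

```python
from src.llm_client import LLMClient

# Compare outputs for the same prompt across the temperature ranges above.
# Run with PYTHONPATH=. so the src package resolves, as in the examples.
client = LLMClient()
prompt = "Describe a sunrise in one sentence."

for temperature in (0.2, 0.5, 0.8):
    response = client.generate(prompt, temperature=temperature)
    print(f"temperature={temperature}: {response.response}")
```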
Performance tips:

- **Streaming for UX**: Always use `--stream` for interactive applications
- **Batch Processing**: Use the `batch` command for multiple prompts instead of individual calls
- **Context Management**: Limit conversation history to avoid token overflow
- **Model Caching**: Ollama keeps models in memory, so the first call may be slower
- **Chunking Strategy**: For large documents, use smaller chunks (1000-2000 characters) for better relevance; see the sketch below
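For the chunking tip, here is a minimal character-based chunker with overlap, so sentences that span a boundary appear in both neighboring chunks. A standalone sketch; the framework's own `chunk_text` helper may behave differently:

```python
def chunk_with_overlap(text, chunk_size=1500, overlap=200):
    """Split text into fixed-size character chunks with overlap.
    A standalone sketch for illustration; the framework's chunk_text
    helper may use a different strategy."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```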
```bash
# Check system status
python3 main.py health

# Monitor large batch jobs
python3 main.py batch large_file.json --batch-size 5   # Process in smaller batches

# Clear conversation history periodically
python3 main.py conversations list
python3 main.py conversations delete old_conversation_id
```

- **Vector Embeddings** - Semantic search and similarity matching for documents
- **Plugin System** - Load custom tools and integrations dynamically
- **Web Interface** - Beautiful web UI using FastAPI with real-time streaming
- **Multi-model Ensembles** - Combine responses from multiple models for better results
- **Fine-tuning Support** - Train specialized models for specific domains
- **External Integrations** - Web search, databases, APIs, and file systems
- **Voice Interface** - Speech-to-text and text-to-speech capabilities
- **Analytics Dashboard** - Usage tracking, model performance, and cost analysis
- **RAG Enhancement** - Advanced retrieval with re-ranking and hybrid search
- **Code Execution** - Safe code execution environment for generated code
- **Workflow Automation** - Chain multiple AI operations together
- **Team Collaboration** - Shared conversations and templates
- **Model Marketplace** - Easy discovery and installation of new models
We welcome contributions! Here are some ways to help:

Reporting Bugs:

- Open issues with detailed descriptions
- Include system info (Python version, OS, Ollama version)
- Provide steps to reproduce problems

Code Contributions:

- Follow existing code style and patterns
- Add comprehensive docstrings and type hints
- Test with real Ollama instances
- Update documentation for new features

Documentation:

- Improve README examples and explanations
- Add use case tutorials and guides
- Translate documentation to other languages
This framework is provided as-is for educational and development purposes. Feel free to modify, extend, and use it for your projects.
Important Notes:
- Ensure compliance with your local LLM model licenses
- Respect rate limits and usage policies of Ollama
- Consider privacy and security when processing sensitive documents
- No warranty provided - use at your own risk
- Ollama - Amazing local LLM runtime that makes this all possible
- Rich - Beautiful terminal output and formatting
- Click - Elegant command-line interface framework
- Pydantic - Data validation and settings management
- Open Source LLM Community - For making local AI accessible to everyone
- Python Ecosystem - For providing excellent tools and libraries
- AI Research Community - For advancing the field of artificial intelligence
```bash
# Quick setup (takes 30 seconds)
git clone <repository> && cd local_llm
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Verify everything works
python3 main.py health

# Start your AI journey!
python3 main.py chat
```

Happy coding with your local LLM!
Transform your local AI into a powerful, versatile assistant today!