Local LLM Framework

A comprehensive, production-ready Python framework for leveraging local Large Language Models through Ollama. This framework transforms your local LLM into a powerful, versatile AI assistant with multiple interaction modes, advanced features, and a beautiful command-line interface.

Key Highlights

Multiple AI Interaction Modes - CLI commands, interactive chat, document analysis, batch processing
Smart Model Management - Seamless switching between models with persistent configuration
Advanced Template System - Reusable prompt templates with variable substitution
Conversation Intelligence - Persistent chat sessions with full history management
Real-time Streaming - Live response generation with beautiful terminal output
Extensible Architecture - Modular design perfect for building custom AI applications
Rich CLI Experience - Tables, panels, progress bars, and syntax highlighting
Developer-Friendly - Clean APIs, comprehensive examples, and detailed documentation
Production-Ready - Comprehensive testing suite with 100% success rate
Deployment Validated - Full production readiness validation with 9 comprehensive checks

Core Features

Interaction Modes

  • Direct Generation: Single-shot text generation with full parameter control
  • Interactive Chat: Full conversational interface with commands and history
  • Document Analysis: AI-powered document Q&A with chunking for large files
  • Batch Processing: Efficient processing of multiple prompts with progress tracking
  • Code Assistant: Specialized code analysis, review, and generation capabilities

Advanced Capabilities

  • Model Switching: Easy switching between all available Ollama models
  • Template Engine: Powerful prompt templates with ${variable} substitution
  • Conversation Management: Save, load, and manage multiple conversation sessions
  • Configuration System: Persistent settings with JSON storage and CLI management
  • Streaming Support: Real-time response streaming for better user experience
  • Error Handling: Graceful error recovery with informative user messages
  • Rich Output: Beautiful terminal formatting with syntax highlighting
  • Automated Testing: Comprehensive test suite with 7 core functionality tests
  • Deployment Validation: Production readiness validation with 9 deployment checks

Prerequisites

  • Python 3.8 or higher
  • Ollama installed and running locally
  • At least one model installed in Ollama (e.g., ollama pull deepseek-r1)

Installation

  1. Clone or download this framework to your local machine.

  2. Create a virtual environment (recommended):

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt

  4. Verify Ollama is running:

    python3 main.py health

Quick Start

Instant Setup (30 seconds)

# 1. Create virtual environment
python3 -m venv venv && source venv/bin/activate

# 2. Install dependencies  
pip install -r requirements.txt

# 3. Verify everything works
python3 main.py health

First Commands

# See what models you have available
python3 main.py models

# Generate your first response
python3 main.py generate "Explain quantum computing in simple terms"

# Start an interactive chat session
python3 main.py chat

# Analyze a document (try it with the included LLM_ideas.md)
python3 main.py analyze LLM_ideas.md --question "What are the main project categories?"

Try the Example Applications

# Run the comprehensive demo
PYTHONPATH=. python3 examples/basic_usage.py

# Try the document Q&A system
PYTHONPATH=. python3 examples/document_qa.py

# Explore the code assistant
PYTHONPATH=. python3 examples/code_assistant.py

Complete Command Reference

Core Commands

  • health - Check system connectivity and model availability. Example: python3 main.py health
  • models - List all available models with details. Example: python3 main.py models
  • set-model <name> - Set the default model for all operations. Example: python3 main.py set-model qwen3:latest
  • config - Display current configuration settings. Example: python3 main.py config
  • generate <prompt> - Generate text from a single prompt. Example: python3 main.py generate "Write a haiku"
  • chat - Start interactive conversational mode. Example: python3 main.py chat
  • analyze <file> - Analyze documents with AI-powered insights. Example: python3 main.py analyze report.pdf
  • batch <file> - Process multiple prompts efficiently. Example: python3 main.py batch prompts.json

Template Management

  • templates list [--category] - List available prompt templates. Example: python3 main.py templates list
  • templates show <name> - Show template details and variables. Example: python3 main.py templates show summarize
  • templates create <name> - Create a new custom template. Example: python3 main.py templates create my_template

Conversation Management

  • conversations list - List all saved conversations. Example: python3 main.py conversations list
  • conversations show <id> - Display conversation details and history. Example: python3 main.py conversations show abc123
  • conversations delete <id> - Delete a specific conversation. Example: python3 main.py conversations delete abc123

Advanced Options

All generation commands support these options:

  • --model <name> - Use a specific model. Example: --model qwen3:latest
  • --temperature <value> - Control randomness (0.0-1.0). Example: --temperature 0.3
  • --max-tokens <number> - Limit response length. Example: --max-tokens 500
  • --stream/--no-stream - Enable or disable real-time streaming. Example: --stream
  • --system <prompt> - Set system behavior. Example: --system "You are a helpful coding assistant"

Real-World Examples

Advanced Text Generation

# Generate clean, efficient code with specific parameters
python3 main.py generate "Write a Python function to calculate fibonacci numbers" \
  --model qwen3:latest \
  --temperature 0.3 \
  --system "You are an expert Python developer who writes clean, efficient code."

# Creative writing with higher temperature
python3 main.py generate "Write a short story about AI discovering emotions" \
  --temperature 0.8 \
  --max-tokens 800

Template-Powered Workflows

# Use built-in templates with interactive prompts
python3 main.py generate --template explain_code
# Will prompt for: language, code

python3 main.py generate --template summarize  
# Will prompt for: content_type, content

# Create your own template
python3 main.py templates create story_generator
# Then use: python3 main.py generate --template story_generator

Batch Processing Workflows

1. Create input file (research_prompts.json):

[
  {
    "prompt": "Explain the latest developments in quantum computing",
    "metadata": {"category": "quantum", "priority": "high"}
  },
  {
    "prompt": "What are the environmental impacts of renewable energy?",
    "metadata": {"category": "environment", "priority": "medium"}
  },
  {
    "prompt": "How does machine learning impact healthcare?",
    "metadata": {"category": "AI", "priority": "high"}
  }
]

2. Process with custom settings:

python3 main.py batch research_prompts.json \
  --output research_results.json \
  --model deepseek-r1:latest \
  --batch-size 3

Document Analysis Examples

# Basic document analysis
python3 main.py analyze annual_report.pdf

# Targeted questions
python3 main.py analyze research_paper.pdf \
  --question "What are the main findings and their implications?"

# Use specific analysis template
python3 main.py analyze code_file.py \
  --template code_reviewer \
  --model qwen3:latest

Interactive Chat Mode

Start with python3 main.py chat and use these commands:

  • /help - Show all available commands
  • /model <name> - Switch to a different model mid-conversation. Example: /model qwen3:latest
  • /system <prompt> - Change system behavior. Example: /system "You are a creative writing assistant"
  • /save - Save the current conversation
  • /load <id> - Load a previous conversation. Example: /load abc123
  • /clear - Clear conversation history
  • /history - Show a conversation summary
  • /list - List available models
  • /templates - Show system prompt templates
  • /exit or /quit - Exit chat mode

Chat Examples

# Start a coding session
python3 main.py chat --system "You are an expert Python developer"

# Load a previous conversation  
python3 main.py chat --load conversation_id

# Start with specific model
python3 main.py chat --model qwen3:latest

Configuration

The framework uses a JSON configuration file (config/settings.json):

{
  "base_url": "http://localhost:11434",
  "default_model": "deepseek-r1:latest",
  "default_temperature": 0.7,
  "default_max_tokens": null,
  "default_system_prompt": null,
  "conversation_history_limit": 100,
  "auto_save_conversations": true,
  "prompt_templates_dir": "templates",
  "conversations_dir": "conversations"
}

You can modify these settings directly or use the CLI to change some values:

python3 main.py set-model qwen3:latest
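
For programmatic access, the settings file is plain JSON and can be read directly. A minimal sketch follows (the framework's own src/config.py presumably wraps this with defaults and validation, so treat it as the supported interface):

import json
from pathlib import Path

# Read the framework's settings file directly (sketch only;
# src/config.py is the real configuration interface).
settings = json.loads(Path("config/settings.json").read_text())
print(settings["default_model"])  # e.g. "deepseek-r1:latest"
print(settings["base_url"])       # e.g. "http://localhost:11434"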

Prompt Templates

The framework includes a powerful template system with variable substitution:

Default Templates

System Prompts (templates/system_prompts/):

  • assistant - General purpose assistant
  • code_reviewer - Expert code reviewer
  • document_analyzer - Document analysis expert

User Prompts (templates/user_prompts/):

  • summarize - Content summarization
  • explain_code - Code explanation
  • translate - Language translation

Creating Custom Templates

Templates use ${variable} syntax for substitution:

python3 main.py templates create my_template

Then enter your template:

Analyze this ${content_type} and focus on ${aspect}:

${content}

Please provide ${detail_level} analysis.

Using Templates

python3 main.py generate --template my_template
# Will prompt for: content_type, aspect, content, detail_level
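
The ${variable} syntax matches Python's built-in string.Template, so the substitution step reduces to a standard-library call. A minimal sketch of how rendering could work (illustrative only, not the framework's actual internals):

from string import Template

# string.Template uses the same ${name} placeholder syntax
# as the framework's template files.
text = "Analyze this ${content_type} and focus on ${aspect}:\n\n${content}"
prompt = Template(text).substitute(
    content_type="blog post",
    aspect="tone",
    content="Local LLMs are becoming practical for everyday tasks...",
)
print(prompt)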

Developer API & Programming Interface

The framework provides clean, intuitive APIs for building custom applications:

Quick Start API

from src.llm_client import LLMClient
from src.prompt_manager import PromptManager
from src.conversation import ConversationManager

# Initialize components
client = LLMClient()
prompts = PromptManager()
conversations = ConversationManager()

# Simple text generation
response = client.generate("Explain machine learning in simple terms")
print(response.response)
print(f"Generated {response.eval_count} tokens in {response.total_duration/1e9:.2f}s")

Advanced Usage Patterns

Template-Powered Generation:

# Load and use templates
template = prompts.load_template("summarize")
prompt = template.render(
    content_type="research paper", 
    content="Your document content here..."
)
response = client.generate(prompt, temperature=0.3)

Conversation Management:

# Create persistent conversations
conversation = conversations.create_conversation("AI Research Discussion")
conversation.add_message("user", "What are the latest AI breakthroughs?")

# Get formatted messages for API
messages = conversation.get_messages()
response = client.chat(messages, model="deepseek-r1:latest")

# Add response and save
conversation.add_message("assistant", response['message']['content'])
conversations.save_conversation(conversation)

Streaming Responses:

# Real-time streaming for better UX
print("AI Response: ", end="")
for chunk in client.generate("Write a creative story", stream=True):
    print(chunk, end="", flush=True)
print()

Custom Templates:

# Create and use custom templates
prompts.create_template(
    name="code_analyzer",
    template="Analyze this ${language} code for ${focus}:\n\n${code}",
    description="Code analysis with customizable focus",
    variables=["language", "focus", "code"]
)

template = prompts.load_template("code_analyzer")
prompt = template.render(
    language="Python",
    focus="performance optimizations", 
    code="def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)"
)

๐Ÿ—๏ธ Building Applications

Document Q&A System:

from src.utils import read_file_safe, chunk_text

class DocumentAnalyzer:
    def __init__(self):
        self.client = LLMClient()
    
    def analyze_document(self, file_path, question):
        content = read_file_safe(file_path)
        chunks = chunk_text(content, chunk_size=1500)
        
        # Find relevant chunks (simplified)
        relevant_content = "\n\n".join(chunks[:3])
        
        prompt = f"""Answer this question based on the document:
        Question: {question}
        Document: {relevant_content}"""
        
        return self.client.generate(prompt, temperature=0.2)

# Usage
analyzer = DocumentAnalyzer()
result = analyzer.analyze_document("report.txt", "What are the key findings?")

Batch Processing System:

def process_prompts_batch(prompts_list, model="deepseek-r1:latest"):
    client = LLMClient()
    results = []
    
    for i, prompt in enumerate(prompts_list):
        print(f"Processing {i+1}/{len(prompts_list)}")
        response = client.generate(prompt, model=model)
        results.append({
            "prompt": prompt,
            "response": response.response,
            "tokens": response.eval_count
        })
    
    return results

# Usage
prompts = ["Explain AI", "What is quantum computing?", "How does blockchain work?"]
results = process_prompts_batch(prompts)

Comprehensive Example Applications

The framework includes three powerful example applications that demonstrate real-world usage:

Basic Usage Demo (examples/basic_usage.py)

A comprehensive walkthrough of all framework features:

PYTHONPATH=. python3 examples/basic_usage.py

What it demonstrates:

  • Text generation with different models
  • Model switching and configuration
  • Template creation and usage
  • Conversation management and persistence
  • Streaming response handling
  • Configuration management

Document Q&A System (examples/document_qa.py)

Build a sophisticated RAG-like document analysis system:

# Analyze any document
PYTHONPATH=. python3 examples/document_qa.py path/to/your/document.pdf

# Or use the included example
PYTHONPATH=. python3 examples/document_qa.py

Advanced features:

  • Smart Document Processing - Automatic chunking for large documents
  • Intelligent Q&A - Relevant chunk selection for accurate answers
  • Auto-Summarization - Generate comprehensive document summaries
  • Key Point Extraction - Identify and list important insights
  • Interactive Mode - Real-time Q&A with your documents
  • Context Awareness - Maintains document context across questions

AI Code Assistant (examples/code_assistant.py)

Professional-grade code analysis and generation tool:

PYTHONPATH=. python3 examples/code_assistant.py

Powerful capabilities:

  • Code Analysis - Deep analysis of code files with language detection
  • Code Explanation - Step-by-step breakdowns of complex code
  • Code Review - Professional code review with specific recommendations
  • Performance Optimization - Identify and fix performance bottlenecks
  • Debug Assistant - Help identify and fix bugs with detailed guidance
  • Test Generation - Create comprehensive unit tests automatically
  • Code Refactoring - Improve code structure and maintainability
  • Interactive Mode - Full-featured code assistant with command interface

Code Assistant Commands:

# In interactive mode:
explain    # Explain how code works
review     # Comprehensive code review
optimize   # Optimize for performance/readability
debug      # Debug assistance with issue description
generate   # Generate code from requirements
test       # Generate unit tests
refactor   # Refactor for better structure

๐Ÿ—๏ธ Architecture

The framework is organized into several key modules:

  • src/llm_client.py - Core API client for Ollama
  • src/config.py - Configuration management
  • src/prompt_manager.py - Template system
  • src/conversation.py - Chat history management
  • src/utils.py - Utility functions
  • cli/ - Command-line interfaces
  • examples/ - Example applications
  • test_framework.py - Automated testing suite
  • deployment_validation.py - Production readiness validation

Testing & Validation

Automated Testing Suite

The framework includes comprehensive automated testing with 100% success rate:

# Run all tests (recommended)
./run_tests.sh

# Or run manually
source venv/bin/activate
python3 test_framework.py

Test Coverage:

  • API Components Initialization
  • Configuration System
  • Ollama Connectivity
  • Text Generation
  • Template System
  • CLI Commands
  • Document Analysis

Production Deployment Validation

Comprehensive validation for production readiness:

# Run deployment validation
source venv/bin/activate
python3 deployment_validation.py

Validation Coverage:

  • Core Stability Testing
  • Configuration Flexibility
  • Custom Template Creation
  • Conversation Persistence
  • Framework Extensibility
  • Error Handling Robustness
  • Performance Characteristics
  • Deployment Requirements
  • Custom Application Development

Status: FRAMEWORK IS DEPLOYMENT READY!

Troubleshooting

Connection Issues

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Check framework health
python3 main.py health
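
The same check can be scripted. A short connectivity probe against Ollama's REST API (the /api/tags endpoint lists installed models), assuming the requests package is available:

import requests

# Probe Ollama's HTTP API; /api/tags returns the installed models.
try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is up; installed models:", models)
except requests.RequestException as exc:
    print("Ollama is unreachable:", exc)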

Model Issues

# List available models
python3 main.py models

# Pull a new model
ollama pull deepseek-r1

# Set default model
python3 main.py set-model deepseek-r1:latest

Import Issues

When running examples, use:

PYTHONPATH=. python3 examples/script_name.py
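
If you prefer not to set PYTHONPATH, the equivalent fix is to put the project root on sys.path at the top of the script. A sketch (the shipped examples rely on PYTHONPATH instead):

import sys
from pathlib import Path

# Make the project root importable so `from src.llm_client import ...` resolves
# when the script lives in examples/.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.llm_client import LLMClient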

Virtual Environment

If you get import errors, make sure your virtual environment is activated:

source venv/bin/activate  # On Windows: venv\Scripts\activate

Testing Issues

If you encounter issues, run the test suite to diagnose:

# Quick health check
python3 main.py health

# Full test suite
./run_tests.sh

# Deployment validation
python3 deployment_validation.py

Performance & Optimization Tips

Model Selection Strategy

# For factual tasks and code generation
python3 main.py set-model deepseek-r1:latest  # Better reasoning
python3 main.py generate "Explain quantum physics" --temperature 0.2

# For creative tasks and storytelling  
python3 main.py set-model qwen3:latest  # More creative
python3 main.py generate "Write a creative story" --temperature 0.8

Temperature Guidelines

  • 0.1-0.3: Code generation, factual Q&A, document analysis
  • 0.4-0.6: General conversation, explanations, tutorials
  • 0.7-0.9: Creative writing, brainstorming, artistic content
  • 0.9+: Experimental/highly creative outputs
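
In code, these guidelines translate to picking a temperature per task type before calling generate. A small illustrative helper (the task names and values here are suggestions, not framework constants):

# Map task types to temperatures following the guidelines above.
TEMPERATURE_BY_TASK = {
    "code": 0.2,       # code generation, factual Q&A, document analysis
    "explain": 0.5,    # general conversation, explanations, tutorials
    "creative": 0.8,   # creative writing, brainstorming
}

def temperature_for(task: str) -> float:
    # Fall back to the framework's default_temperature of 0.7.
    return TEMPERATURE_BY_TASK.get(task, 0.7)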

Performance Optimization

  1. Streaming for UX: Always use --stream for interactive applications
  2. Batch Processing: Use the batch command for multiple prompts instead of individual calls
  3. Context Management: Limit conversation history to avoid token overflow
  4. Model Caching: Ollama keeps models in memory, so the first call may be slower
  5. Chunking Strategy: For large documents, use smaller chunks (1000-2000 characters) for better relevance; a sketch of such a chunker follows this list
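
The chunk_text helper used by the Document Q&A example could look roughly like this sketch with overlapping character windows (the real src/utils.py may split on paragraph or sentence boundaries instead):

def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (illustrative sketch)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Overlap preserves context that straddles a chunk boundary.
        start += chunk_size - overlap
    return chunks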

Memory & Resource Management

# Check system status
python3 main.py health

# Monitor large batch jobs
python3 main.py batch large_file.json --batch-size 5  # Process in smaller batches

# Clear conversation history periodically
python3 main.py conversations list
python3 main.py conversations delete old_conversation_id

Roadmap & Future Enhancements

Planned Features

  • Vector Embeddings - Semantic search and similarity matching for documents
  • Plugin System - Load custom tools and integrations dynamically
  • Web Interface - Web UI using FastAPI with real-time streaming
  • Multi-model Ensembles - Combine responses from multiple models for better results
  • Fine-tuning Support - Train specialized models for specific domains
  • External Integrations - Web search, databases, APIs, and file systems
  • Voice Interface - Speech-to-text and text-to-speech capabilities
  • Analytics Dashboard - Usage tracking, model performance, and cost analysis

Extension Ideas

  • RAG Enhancement - Advanced retrieval with re-ranking and hybrid search
  • Code Execution - Safe code execution environment for generated code
  • Workflow Automation - Chain multiple AI operations together
  • Team Collaboration - Shared conversations and templates
  • Model Marketplace - Easy discovery and installation of new models

Contributing

We welcome contributions! Here are some ways to help:

๐Ÿ› Bug Reports & Feature Requests

  • Open issues with detailed descriptions
  • Include system info (Python version, OS, Ollama version)
  • Provide steps to reproduce problems

Code Contributions

  • Follow existing code style and patterns
  • Add comprehensive docstrings and type hints
  • Test with real Ollama instances
  • Update documentation for new features

Documentation

  • Improve README examples and explanations
  • Add use case tutorials and guides
  • Translate documentation to other languages

License & Legal

This framework is provided as-is for educational and development purposes. Feel free to modify, extend, and use it for your projects.

Important Notes:

  • Ensure compliance with your local LLM model licenses
  • Respect rate limits and usage policies of Ollama
  • Consider privacy and security when processing sensitive documents
  • No warranty provided - use at your own risk

Acknowledgments & Credits

Core Technologies

  • Ollama - Amazing local LLM runtime that makes this all possible
  • Rich - Beautiful terminal output and formatting
  • Click - Elegant command-line interface framework
  • Pydantic - Data validation and settings management

Inspiration & Community

  • Open Source LLM Community - For making local AI accessible to everyone
  • Python Ecosystem - For providing excellent tools and libraries
  • AI Research Community - For advancing the field of artificial intelligence

Ready to Get Started?

# Quick setup (takes 30 seconds)
git clone <repository> && cd local_llm
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Verify everything works
python3 main.py health

# Start your AI journey!
python3 main.py chat

Happy coding with your local LLM!

Transform your local AI into a powerful, versatile assistant today!