An AI-powered Android automation agent that can understand screen content and perform actions autonomously using OpenAI's GPT models and Android's Accessibility Services.
- AI-Powered Screen Analysis: Uses OpenAI GPT models to understand Android UI elements and screen context
- Autonomous Actions: Performs taps, text input, swipes, and navigation based on natural language instructions
- Voice Input Support: Converts speech to text for hands-free agent commands
- Real-time Screen Parsing: Analyzes UI hierarchy and identifies interactable elements
- Floating UI Controls: Overlay interface for quick agent access without leaving current app
- Loop Detection: Intelligent detection and recovery from repetitive action patterns
- Context-Aware Decision Making: Maintains task context and progress tracking
- Metacognitive Reflection: Self-assessment of progress and decision to continue or stop
- Error Recovery: Robust handling of UI state changes and unexpected scenarios
- Performance Monitoring: Built-in analytics and performance tracking
- Comprehensive Sentry Integration: Real-time error tracking and performance monitoring
- API Error Tracking: Detailed OpenAI API call monitoring with rate limit detection
- Agent Performance Analytics: Step-by-step execution tracking and success metrics
- Debug Interface: Built-in debugging tools for development and troubleshooting
- Android Studio (Arctic Fox or later)
- Android Device/Emulator (API level 24+)
- OpenAI API Key with GPT-4 access
- Sentry Account (optional, for error tracking)
```bash
git clone https://github.com/reagent-systems/Simple-Agent-Android.git
cd Simple-Agent-Android
./gradlew assembleDebug
adb install app/build/outputs/apk/debug/app-debug.apk
```
- Open the app
- Tap "Enable Accessibility Service"
- Navigate to Settings → Accessibility → Simple Agent Android
- Toggle the service ON
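If you want to verify the service state programmatically (for example before starting the agent), a check along these lines works with standard Android APIs. `AgentAccessibilityService` below is a hypothetical class name; substitute the app's actual service class.

```kotlin
import android.content.ComponentName
import android.content.Context
import android.provider.Settings

// Returns true if the given accessibility service is enabled in system settings.
// Call as: isServiceEnabled(context, AgentAccessibilityService::class.java)
fun isServiceEnabled(context: Context, serviceClass: Class<*>): Boolean {
    val expected = ComponentName(context, serviceClass)
    val enabled = Settings.Secure.getString(
        context.contentResolver,
        Settings.Secure.ENABLED_ACCESSIBILITY_SERVICES
    ) ?: return false
    // The setting holds a colon-separated list of enabled component names.
    return enabled.split(':')
        .mapNotNull(ComponentName::unflattenFromString)
        .any { it == expected }
}
```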
- Enter a natural language instruction (e.g., "Open Gmail and check my inbox")
- Optionally use voice input by tapping the microphone
- Tap "Start Agent" to begin autonomous execution
- Monitor progress in real-time through the output display
- Enable "Floating UI" for quick access while using other apps
- The floating button appears as an overlay on all screens
- Tap to quickly start/stop the agent without switching apps
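The floating button is a system overlay, so the SYSTEM_ALERT_WINDOW permission must be granted by the user on a dedicated settings screen before it can appear. A minimal sketch of that request (standard Android APIs; the function name is illustrative, and `Settings.canDrawOverlays` is always available at the project's minSdk of 24):

```kotlin
import android.app.Activity
import android.content.Intent
import android.net.Uri
import android.provider.Settings

// Sends the user to the overlay-permission settings screen if the
// permission has not been granted yet.
fun ensureOverlayPermission(activity: Activity, requestCode: Int) {
    if (!Settings.canDrawOverlays(activity)) {
        val intent = Intent(
            Settings.ACTION_MANAGE_OVERLAY_PERMISSION,
            Uri.parse("package:${activity.packageName}")
        )
        activity.startActivityForResult(intent, requestCode)
    }
}
```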
- Main execution engine that coordinates all agent operations
- Manages conversation history and OpenAI API interactions
- Implements step-by-step task execution with error handling
- Integrates loop detection and metacognitive decision making
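In outline, the execution engine ties these responsibilities together roughly as follows. This is a simplified sketch, not the project's actual code; every type and helper here is an illustrative placeholder.

```kotlin
// Simplified sketch of the agent step loop. All types and helpers are
// illustrative placeholders, not the project's real API.
data class Message(val role: String, val content: String)
data class AgentAction(val name: String, val args: Map<String, String>)
data class Reflection(val shouldStop: Boolean)

const val MAX_STEPS = 10

suspend fun runAgent(
    instruction: String,
    parseScreen: suspend () -> String,                                 // UI context from the accessibility tree
    requestNextAction: suspend (List<Message>, String) -> AgentAction, // OpenAI call with tool definitions
    executeAction: suspend (AgentAction) -> String,                    // tap / type / swipe, returns a result summary
    isLooping: (AgentAction) -> Boolean,                               // loop detection
    reflect: (List<Message>) -> Reflection,                            // metacognitive stopping decision
) {
    val history = mutableListOf(Message("user", instruction))
    repeat(MAX_STEPS) {
        val screen = parseScreen()
        val action = requestNextAction(history, screen)
        if (isLooping(action)) {
            // Record the detection and skip to the next step to re-plan.
            history += Message("system", "Loop detected; re-planning")
            return@repeat
        }
        history += Message("tool", executeAction(action))
        if (reflect(history).shouldStop) return
    }
}
```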
- Handles all OpenAI API communications
- Implements tool calling for Android actions
- Manages request/response formatting and error handling
- Supports function calling for structured agent actions
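For illustration, a chat-completion request body with a single tool definition in the OpenAI function-calling format might be assembled as below, using Android's built-in org.json. The `tap_element` tool is a hypothetical example, not the app's actual tool set.

```kotlin
import org.json.JSONArray
import org.json.JSONObject

// Builds a chat-completion request body with one hypothetical tool,
// following the OpenAI tools/function-calling schema.
fun buildRequestBody(messages: JSONArray): JSONObject = JSONObject().apply {
    put("model", "gpt-4-turbo-preview")
    put("messages", messages)
    put("tools", JSONArray().put(JSONObject().apply {
        put("type", "function")
        put("function", JSONObject().apply {
            put("name", "tap_element")
            put("description", "Tap a UI element identified on the current screen")
            put("parameters", JSONObject().apply {
                put("type", "object")
                put("properties", JSONObject().put("element_id", JSONObject()
                    .put("type", "string")
                    .put("description", "Identifier of the element to tap")))
                put("required", JSONArray().put("element_id"))
            })
        })
    }))
}
```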
- Parses Android UI hierarchy using Accessibility Services
- Identifies and prioritizes interactable elements
- Extracts text content, buttons, and input fields
- Provides structured screen context to the AI
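The core traversal idea can be sketched with the standard accessibility APIs. This condensed version only collects visible clickable or editable nodes; the real parser also prioritizes and structures them before handing context to the model.

```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Recursively walks the accessibility tree and collects a text label for
// every visible node the user could interact with.
fun collectInteractable(
    node: AccessibilityNodeInfo?,
    out: MutableList<String> = mutableListOf()
): List<String> {
    if (node == null) return out
    if (node.isVisibleToUser && (node.isClickable || node.isEditable)) {
        // Prefer visible text, then the content description, then the class name.
        val label = node.text ?: node.contentDescription ?: node.className
        out += "${node.className}: $label"
    }
    for (i in 0 until node.childCount) {
        collectInteractable(node.getChild(i), out)
    }
    return out
}
```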
- Executes physical interactions with Android UI
- Supports tap, text input, swipe, and navigation gestures
- Implements element waiting and screen state verification
- Handles accessibility service integration
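On API 24+, taps can be dispatched from inside an accessibility service with `dispatchGesture`. A minimal sketch (stroke duration and result callbacks simplified for brevity):

```kotlin
import android.accessibilityservice.AccessibilityService
import android.accessibilityservice.GestureDescription
import android.graphics.Path

// Dispatches a 50 ms tap at the given screen coordinates.
// Returns true if the gesture was accepted for dispatch.
fun AccessibilityService.tap(x: Float, y: Float): Boolean {
    val path = Path().apply { moveTo(x, y) }
    val gesture = GestureDescription.Builder()
        .addStroke(GestureDescription.StrokeDescription(path, 0L, 50L))
        .build()
    return dispatchGesture(gesture, null, null)
}
```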
- Task Planning: Breaks down complex instructions into steps
- Progress Reflection: Analyzes each action's effectiveness
- Stopping Decisions: Determines when objectives are completed
- Error Analysis: Identifies and responds to failure patterns
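One plausible shape for a parsed reflection result and the resulting continue/stop rule is sketched below. The field names and thresholds are assumptions for illustration (cf. `MAX_NO_PROGRESS_STEPS` in the configuration), not the project's actual schema.

```kotlin
// Illustrative shape of a reflection parsed from a model response.
data class ReflectionResult(
    val objectiveComplete: Boolean,  // the instruction has been fulfilled
    val madeProgress: Boolean,       // the last action moved the task forward
    val failurePattern: String?      // detected failure mode, if any
)

// Continue only while the objective is open and progress has not stalled
// for more than maxNoProgress consecutive steps.
fun shouldContinue(r: ReflectionResult, noProgressSteps: Int, maxNoProgress: Int = 3): Boolean =
    !r.objectiveComplete && noProgressSteps <= maxNoProgress
```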
- Exact Repetition: Detects identical repeated actions
- Semantic Loops: Identifies functionally similar action patterns
- Progress Stagnation: Monitors for lack of meaningful progress
- Recovery Strategies: Implements intelligent loop breaking
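A minimal sketch of the exact-repetition case, using a sliding window of action signatures. The real detector also covers semantic loops and stagnation; the threshold here mirrors `MAX_SIMILAR_ACTIONS` from the configuration, and `windowSize` is an illustrative choice.

```kotlin
// Detects exact repetition of recent action signatures.
class LoopDetector(
    private val maxSimilarActions: Int = 2,
    private val windowSize: Int = 6
) {
    private val recent = ArrayDeque<String>()

    // Records an action and returns true if it now appears in the recent
    // window more often than the allowed threshold.
    fun record(actionSignature: String): Boolean {
        recent.addLast(actionSignature)
        if (recent.size > windowSize) recent.removeFirst()
        return recent.count { it == actionSignature } > maxSimilarActions
    }
}
```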
- SentryManager: Core error tracking and performance monitoring
- ApiErrorTracker: Specialized OpenAI API error analysis
- AgentErrorTracker: Agent-specific performance and error tracking
- SentryExtensions: Helper functions for easy integration
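As an example of the integration style, an agent step could be wrapped in a Sentry transaction like this. The transaction name and operation string are illustrative choices, but the `Sentry` calls are the standard sentry-java API.

```kotlin
import io.sentry.Sentry
import io.sentry.SpanStatus

// Runs a block inside a Sentry transaction, recording success, failures,
// and timing for the step.
fun <T> trackStep(stepName: String, block: () -> T): T {
    val transaction = Sentry.startTransaction(stepName, "agent.step")
    return try {
        val result = block()
        transaction.status = SpanStatus.OK
        result
    } catch (e: Exception) {
        transaction.status = SpanStatus.INTERNAL_ERROR
        Sentry.captureException(e)
        throw e
    } finally {
        transaction.finish()
    }
}
```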
```kotlin
// Maximum steps before automatic termination
private const val MAX_STEPS = 10

// Loop detection sensitivity
private const val MAX_SIMILAR_ACTIONS = 2
private const val MAX_NO_PROGRESS_STEPS = 3

// Model configuration
private const val MODEL = "gpt-4-turbo-preview"
private const val API_URL = "https://api.openai.com/v1/chat/completions"

// Slow operation detection (milliseconds)
private const val SLOW_OPERATION_THRESHOLD = 5000L

// API response time monitoring
private const val SLOW_API_THRESHOLD = 10000L
```
The app requires the following permissions:
- SYSTEM_ALERT_WINDOW: For the floating UI overlay
- RECORD_AUDIO: For voice input functionality
- INTERNET: For OpenAI API communication
- ACCESS_NETWORK_STATE: For network status monitoring
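Since RECORD_AUDIO is a dangerous permission, it must also be granted at runtime before voice input works, in addition to the manifest entry. A minimal sketch using AndroidX:

```kotlin
import android.Manifest
import android.app.Activity
import android.content.pm.PackageManager
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

// Requests the microphone permission if it has not been granted yet.
fun requestMicPermission(activity: Activity, requestCode: Int) {
    val granted = ContextCompat.checkSelfPermission(
        activity, Manifest.permission.RECORD_AUDIO
    ) == PackageManager.PERMISSION_GRANTED
    if (!granted) {
        ActivityCompat.requestPermissions(
            activity, arrayOf(Manifest.permission.RECORD_AUDIO), requestCode
        )
    }
}
```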
- Accessibility Service: Required for screen reading and UI interaction
- Usage: Reads screen content and performs automated actions
- Privacy: All screen data is processed locally and only high-level context is sent to OpenAI
- Debug Screen: Access via navigation drawer
- Real-time Logs: View agent execution steps
- Screen Analysis: Inspect parsed UI elements
- API Monitoring: Track OpenAI API calls and responses
- Accessibility service not working: enable the service in Android Settings → Accessibility
- OpenAI API errors: check API key validity and account credits, and monitor rate limiting and quota usage in Sentry
- Agent stuck in a loop: loop detection triggers recovery automatically; alternatively, stop and restart with refined instructions
- Error Tracking: Real-time error monitoring and alerting
- Performance: API response times and agent execution metrics
- Usage Patterns: Most common tasks and success rates
- Failure Analysis: Detailed error context and stack traces
- Task Success Rate: Percentage of successfully completed instructions
- Average Execution Time: Time from start to task completion
- API Call Efficiency: Requests per successful task completion
- Loop Detection Rate: Frequency of loop detection and recovery
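A sketch of how these metrics could be computed from per-task records; the `TaskRecord` fields and the summary format are assumptions made for illustration.

```kotlin
// Hypothetical per-task record from which the key metrics are derived.
data class TaskRecord(
    val succeeded: Boolean,
    val durationMs: Long,
    val apiCalls: Int,
    val loopsDetected: Int
)

fun summarize(records: List<TaskRecord>): String {
    require(records.isNotEmpty()) { "no task records yet" }
    val successes = records.count { it.succeeded }
    val successRate = 100.0 * successes / records.size
    val avgDurationMs = records.map { it.durationMs }.average()
    // Guard against division by zero when no task has succeeded yet.
    val callsPerSuccess = records.sumOf { it.apiCalls }.toDouble() / successes.coerceAtLeast(1)
    val loopRate = records.count { it.loopsDetected > 0 }.toDouble() / records.size
    return "success=%.1f%%, avgTime=%.0f ms, apiCalls/success=%.1f, loopRate=%.2f"
        .format(successRate, avgDurationMs, callsPerSuccess, loopRate)
}
```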
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow Kotlin coding conventions
- Use meaningful variable and function names
- Add comprehensive comments for complex logic
- Include error handling for all external calls
- Test on multiple Android versions and devices
- Verify accessibility service functionality
- Test with various OpenAI API scenarios
- Validate error handling and recovery mechanisms
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing the GPT models that power the AI decision making
- Sentry for comprehensive error tracking and performance monitoring
- Android Accessibility Services for enabling UI automation capabilities
- Kotlin Coroutines for handling asynchronous operations efficiently
- Wiki: Detailed guides and tutorials
- API Reference: Complete function and class documentation
- Examples: Sample use cases and implementation patterns
- Issues: Report bugs and request features on GitHub
- Discussions: Join community discussions and share experiences
- Discord: Real-time chat with other developers and users