This project leverages the power of Google's Gemini AI model to enable natural language-driven automation of a Windows 10 environment. It interprets user prompts, analyzes screen captures, and translates requests into executable system commands, mouse actions, and keyboard inputs.
- Natural Language Interface: Interact with your Windows desktop using simple, conversational commands.
- Visual Context: The application analyzes screenshots to understand the current state of your desktop, ensuring accurate command execution.
- Automated Actions: Performs mouse clicks, keyboard inputs, application launches, and more.
- Error Handling & Retries: Implements robust error handling and retry mechanisms for better reliability.
- Optimized for Python 3.13.1: Written with performance and efficiency in mind, leveraging the latest features of Python 3.13.1 where applicable.
-
Python 3.13.1: This project is optimized for Python 3.13.1. Ensure you have it installed.
- Installation: Download from the official Python website or use your system's package manager.
-
Pip (Python Package Installer): Should come bundled with Python 3.13.1
-
Git: For cloning the repository (if applicable).
-
Google Gemini API Key: Obtain an API key from the Google AI Studio. You will need to store it in your
.env
file or in thecredentials.env
file asGEMINI_API_KEY=your_api_key
.
-
Clone the Repository (Optional):
git clone https://github.com/your-username/your-repository.git cd your-repository
-
Create a Virtual Environment (Recommended):
python -m venv venv source venv/bin/activate # On Linux/macOS venv\Scripts\activate # On Windows
-
Install Required Packages
pip install -r requirements.txt
If the
requirements.txt
file does not exist then:pip install google-generativeai python-dotenv mss pyautogui tkinter
-
Set Up Environment Variables:
-
Create a
.env
file or acredentials.env
file in the root of your project directory and add your Gemini API key:GEMINI_API_KEY=YOUR_GEMINI_API_KEY
- If a
.env
file exists, then theGEMINI_API_KEY
must be in that file. - If the
.env
file does not exist and acredentials.env
file exists, then theGEMINI_API_KEY
must be in that file.
- If a
-
- Run the Script:
(Replace
python your_script_name.py
your_script_name.py
with the actual name of your main script). - Follow the Prompts: A dialog box will appear asking how you want to interact with your computer. Enter your commands.
capture_screenshot()
: Captures a screenshot of the primary monitor usingmss
library and encodes it in base64.analyze_image_and_generate_command()
: Sends a user's prompt along with the screen capture to the Google Gemini model and receives commands.execute_command_response()
: Parses the response from the Google Gemini model and executes the commands.- The main execution loop in
if __name__ == "__main__":
: The main loop that captures screenshots, receives user prompts, sends them to the Gemini model, and executes the commands.
This project is designed to work efficiently within the Python 3.13.1 ecosystem. Here are some key areas and best practices:
- Library Optimization: This project utilizes optimized libraries like
mss
for fast screen captures andpyautogui
for efficient UI interactions. - Minimal Overhead: The codebase minimizes unnecessary operations, reducing overhead.
- Fast JSON Handling: We are leveraging
json
which is optimized for faster parsing for efficient data transfer with the Google Gemini API. - Asynchronous Operations: Async operations should be considered for any function that could potentially block and lead to slower performance.
- String Handling: Instead of concatenating strings, consider using f-strings or the
.format()
function for more efficient string construction. - Profiling: Profile sections of the code to determine performance bottlenecks and implement changes as needed to improve the performance.
Contributions are welcome! If you would like to improve the project, please follow these steps:
- Fork the repository.
- Create a new branch for your changes.
- Implement your changes.
- Open a pull request with a clear description of your updates.
[Specify your project's license here. For example, MIT License.]
Note: Replace your-username/your-repository.git
with the actual URL of your git repository and your_script_name.py
with the name of your main script.