/computer_use_ootb

Out-of-the-box (OOTB) GUI Agent for Windows and macOS

Primary language: Python. License: MIT.

If you like our project, please give us a star ⭐ on GitHub to get the latest updates.


Overview

Computer Use OOTB is an out-of-the-box (OOTB) solution for desktop GUI agents, supporting both API-based models (Claude 3.5 Computer Use) and locally running models (ShowUI).

No Docker is required, and it supports both Windows and macOS. This project provides a user-friendly interface based on Gradio. 🎨

For more information, you can visit our study on Claude 3.5 Computer Use [project page]. 🌐

Update

  • Major Update! [2024/11/27] Local Run 🔥 is now live! Say hello to ShowUI, an open-source 2B vision-language-action (VLA) model for GUI agents. Now compatible with "gpt-4o + ShowUI" (~200x cheaper)* and "Qwen2-VL + ShowUI" (~30x cheaper)* for only a few cents per task 💰! *Compared to Claude Computer Use.
  • [2024/11/20] We've added some examples to help you get hands-on experience with Claude 3.5 Computer Use.
  • [2024/11/19] Forget about the single-display limit set by Anthropic - you can now use multiple displays 🎉!
  • [2024/11/18] We've released a deep analysis of Claude 3.5 Computer Use: https://arxiv.org/abs/2411.10323.
  • [2024/11/11] Forget about the low-resolution display limit set by Anthropic - you can now use any resolution you like and still keep the screenshot token cost low 🎉!
  • [2024/11/11] Both Windows and macOS are now supported 🎉!
  • [2024/10/25] You can now remotely control your computer 💻 from your mobile device 📱, with no mobile app installation required! Give it a try and have fun 🎉.

Demo Video

computer_use_with_showui-en-s.mp4

🚀 Getting Started

0. Prerequisites

  • Install Miniconda on your system through this link. (Python version: >= 3.11.)
  • Hardware Requirements:
    • Windows: A CUDA-capable GPU with more than 6 GB of GPU memory.
    • Mac: An M1 or later processor with at least 16 GB of memory.
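
Before continuing, it can help to confirm that the active Conda environment meets the Python requirement above. The following is a minimal sketch using only the standard library; the helper names `meets_python_requirement` and `describe_host` are hypothetical and not part of this repository, and the script does not check GPU memory or RAM.

```python
import platform
import sys


def meets_python_requirement(version_info=sys.version_info, minimum=(3, 11)):
    """Return True if the interpreter is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def describe_host():
    """Collect the basic facts the prerequisites refer to."""
    return {
        "python": platform.python_version(),
        "python_ok": meets_python_requirement(),
        "system": platform.system(),    # "Windows", "Darwin" (macOS), or "Linux"
        "machine": platform.machine(),  # e.g. "arm64" on Apple silicon
    }


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

If `python_ok` is reported as `False`, create a new Conda environment with Python 3.11 or later before installing the dependencies.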

1. Clone the Repository 📂

Open the Conda terminal. (After Miniconda is installed, it appears in the Start menu.) Run the following commands in the Conda terminal.

git clone https://github.com/showlab/computer_use_ootb.git
cd computer_use_ootb

2.1 Install Dependencies 🔧

pip install -r dev-requirements.txt

2.2 (Optional) Get Prepared for ShowUI Local-Run

  1. Download all files of the ShowUI-2B model via the following command. Ensure the ShowUI-2B folder is under the computer_use_ootb folder.

    python install_showui.py
  2. Make sure to install the correct GPU version of PyTorch (CUDA, MPS, etc.) on your machine. See install guide and verification.

  3. Get API keys for GPT-4o or Qwen-VL. For users in mainland China, the Qwen API offers a free trial covering the first 1 million tokens.
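
The checks in the steps above can be sketched in Python as follows. This is an illustrative sketch, not code from this repository: the helper names `pick_torch_device` and `showui_folder_present` are hypothetical, and the folder name `ShowUI-2B` is taken from step 1. It relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and returns `None` when PyTorch is not installed.

```python
from pathlib import Path


def pick_torch_device():
    """Return "cuda", "mps", or "cpu"; None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def showui_folder_present(repo_root="."):
    """Check that a ShowUI-2B folder exists under the repository root."""
    return (Path(repo_root) / "ShowUI-2B").is_dir()


if __name__ == "__main__":
    print("PyTorch device:", pick_torch_device())
    print("ShowUI-2B folder found:", showui_folder_present())
```

Run it from the computer_use_ootb folder. A result of `cpu` or `None` suggests that the installed PyTorch build does not match your GPU.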

3. Start the Interface ▶️

Start the OOTB interface:

python app.py

If you successfully start the interface, you will see two URLs in the terminal:

* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://xxxxxxxxxxxxxxxx.gradio.live (Do not share this link with others, or they will be able to control your computer.)
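
If the terminal output has scrolled away, one way to confirm that the interface is up is to test whether something is listening on the local port. The following is a minimal sketch using only the standard library; the helper name `is_listening` is hypothetical, and the default port 7860 is taken from the local URL above.

```python
import socket


def is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    state = "reachable" if is_listening() else "not reachable"
    print(f"http://127.0.0.1:7860 is {state}")
```

This only shows that a process accepts connections on that port, not that it is the OOTB interface.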

For convenience, we recommend setting your API keys as environment variables before starting the interface, using one or more of the commands below. Then you don't need to enter the keys manually on each run. In Windows PowerShell (use the set command instead in cmd):

$env:ANTHROPIC_API_KEY="sk-xxxxx"  # Replace with your own key
$env:QWEN_API_KEY="sk-xxxxx"
$env:OPENAI_API_KEY="sk-xxxxx"

On macOS/Linux, replace $env:ANTHROPIC_API_KEY with export ANTHROPIC_API_KEY in the commands above, and do the same for the other keys.
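
To confirm that the variables are visible to Python before launching, you can run a short check like the one below in the same terminal. It is a sketch under stated assumptions: the helper name `find_api_keys` is hypothetical, and the variable names are the three shown above. How app.py itself reads the keys is not shown here.

```python
import os

# Variable names taken from the commands above.
KEY_NAMES = ("ANTHROPIC_API_KEY", "QWEN_API_KEY", "OPENAI_API_KEY")


def find_api_keys(environ=None):
    """Map each key name to True if it is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return {name: bool(env.get(name, "").strip()) for name in KEY_NAMES}


if __name__ == "__main__":
    for name, present in find_api_keys().items():
        print(f"{name}: {'set' if present else 'missing'}")
```

The script prints only whether each key is set, never the key itself, so its output is safe to paste into an issue report.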

4. Control Your Computer from Any Device with Internet Access

  • Computer to be controlled: The one with the software installed.
  • Device sending commands: The one that opens the website.

Open the website at http://localhost:7860/ (if you're controlling the computer itself) or https://xxxxxxxxxxxxxxxxx.gradio.live in your mobile browser for remote control.

Enter the Anthropic API key (you can obtain it through this website), then give commands to let the AI perform your tasks.


🖥️ Supported Systems

  • Windows (Claude ✅, ShowUI ✅)
  • macOS (Claude ✅, ShowUI ✅)

⚠️ Risks

  • Potential Dangerous Operations by the Model: The models' performance is still limited and may generate unintended or potentially harmful outputs. Recommend continuously monitoring the AI's actions.
  • Cost Control: Each task may cost a few dollars for Claude 3.5 Computer Use.πŸ’Έ

📅 Roadmap

  • Explore available features
    • The Claude API seems to be unstable when solving tasks. We are investigating possible causes: resolution, the types of actions required, OS platform, or planning mechanisms. We welcome any thoughts or comments on this.
  • Interface Design
    • Support for Gradio ✨
    • Simpler Installation
    • More Features... 🚀
  • Platform
    • Windows
    • Mobile (Send command)
    • macOS
    • Mobile (Be controlled)
  • Support for More MLLMs
    • Claude 3.5 Sonnet 🎵
    • GPT-4o
    • Qwen2-VL
    • ...
  • Improved Prompting Strategy
    • Optimize prompts for cost-efficiency. 💡

Join Discussion

We welcome discussion and feedback to keep improving the user experience of Computer Use - OOTB. Reach us through this Discord channel or the WeChat QR code below!

