computer-use

There are 107 repositories under the computer-use topic.

  • bytedance/UI-TARS-desktop

    The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

    Language:TypeScript19.4k1664801.8k
  • trycua/cua

    Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).

    Language:Python11.2k50152643
  • web-infra-dev/midscene

    Your AI Operator for Web, Android, Automation & Testing.

    Language:TypeScript10.6k65439726
  • bytebot-ai/bytebot

    Bytebot is a self-hosted AI desktop agent that automates computer tasks through natural language commands, operating within a containerized Linux desktop environment.

    Language:TypeScript9.5k64831.2k
  • simular-ai/Agent-S

    Agent S: an open agentic framework that uses computers like a human

    Language:Python8.1k7082872
  • Upsonic/Upsonic

    Agent Framework For Fintech

    Language:Python7.7k60164714
  • A9T9/RPA

    Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use/LLM. Selenium IDE import/export.

    Language:JavaScript1.8k67101366
  • e2b-dev/open-computer-use

    AI computer use powered by open source LLMs and E2B Desktop Sandbox

    Language:Python1.6k1916221
  • showlab/ShowUI

    [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

    Language:Python1.5k1885109
  • trycua/acu

    A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.

  • OpenAdaptAI/OpenAdapt

    Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

    Language:Python1.4k15508203
  • zai-org/CogAgent

    An open-source, end-to-end, VLM-based GUI Agent

    Language:Python1.1k204787
  • deedy/mac_computer_use

    A fork of Anthropic Computer Use that you can run on Mac computers to give Claude and other AI models autonomous access to your computer.

    Language:Python8251311136
  • microsoft/WindowsAgentArena

    Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

    Language:Python78384083
  • instavm/clickclickclick

    A framework to enable autonomous Android and computer use using any LLM (local or remote)

    Language:Python5127862
  • suitedaces/computer-agent

    Desktop app powered by Claude’s computer use capability to control your computer

    Language:Python50610960
  • baryhuang/mcp-remote-macos-use

    The only general AI agent that does NOT require an extra API key, giving you full control of your local and remote macOS machines from the Claude Desktop App

    Language:Python4133649
  • OS-Agent-Survey/OS-Agent-Survey

    This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use" (ACL 2025 Oral).

  • BrowserOperator/browser-operator-core

    Browser Operator - The AI browser with a built-in Multi-Agent platform! Open source alternative to ChatGPT Atlas, Perplexity Comet, Dia and Microsoft CoPilot Edge Browser

    Language:TypeScript31861149
  • cyberdesk-hq/cyberdesk

    Open source virtual desktops for AI agents

    Language:JavaScript2902436
  • LLmHub-dev/open-computer-use

    The Open Framework for autonomous virtual computer agents at scale, fully open-source, safe, auditable, and production-ready.

    Language:TypeScript201
  • cuga-project/cuga-agent

    CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware features.

    Language:Python18218
  • aditya-nadkarni/spongecake

    Spongecake is the easiest way to launch computer use agents.

    Language:JavaScript1592220
  • BIGPPWONG/EdgeBox

    A fully-featured, GUI-powered local LLM Agent sandbox with complete MCP protocol support. Features both CLI and full desktop environment, enabling AI agents to operate browsers, terminal, and other desktop applications just like humans. Based on E2B oss code.

    Language:TypeScript15717
  • bilalonur/awesome-llm-os

    A curated list of awesome resources, tools, research papers, and projects related to the concept of Large Language Model Operating Systems (LLM-OS).

  • 777genius/os-ai-computer-use

    AI controls your OS. OS AI Computer Use, OS and API agnostic. For now on Anthropic (Claude) API. Desktop app ready.

    Language:Python133
  • chatsci/Aeiva

    A general AI agent framework that can be adapted to various tasks and environments.

    Language:Python102338
  • jeffrey-zang/opus

    On-device computer use agent that runs fully in the background 🪄

    Language:TypeScript932016
  • open-compass/MMBench-GUI

    Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android and Web.

    Language:Python84173
  • openmule/gacua

    The World's First Out-of-the-Box Computer Use Agent Powered by Gemini-CLI @openmule

    Language:TypeScript84128
  • TurixAI/TuriX-CUA

    This is the official website for TuriX Computer-use-Agent

    Language:Python746
  • AB498/computer-control-mcp

    MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

    Language:Python681311
  • TongUI-agent/TongUI-agent

    Release of code, datasets and models for our work TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

    Language:HTML563103
  • presidio-oss/factif-ai

    AI-powered computer control for automated testing. Factifai uses vision models (Claude, GPT-4o, Gemini) to interact with applications naturally - clicking, typing, and verifying results just like a human would.

    Language:TypeScript491325
  • lvqq/intelli-browser

    ✨ Use natural language to control your browser, powered by LLM and playwright

    Language:TypeScript48103
  • reidbarber/webmarker

    Mark web pages for use with vision-language models

    Language:TypeScript46253