Computer Use Library

A Go library for browser-based computer use automation, designed for LLM agents (Claude Computer Use, Google Gemini, etc.). Built on go-rod for robust browser control.

Features

Unified API: Single set of commands that work for both Claude and Gemini with minimal adaptation
Flexible Coordinate System: Choose between normalized (for Gemini, 0-999 grid) or pixel-based coordinates
Idiomatic Go: Proper error handling and clean interface design
Comprehensive Actions: Supports clicking, typing, scrolling, dragging, keyboard shortcuts, and more
Screenshot Capability: Capture browser state for visual feedback to LLMs
Session Management: Easy browser lifecycle management with context support

Installation

go get github.com/PeronGH/computer-use-lib

Quick Start

package main

import (
    "context"

    computeruse "github.com/PeronGH/computer-use-lib"
)

func main() {
    // Create a new browser session
    session, err := computeruse.NewSession(context.Background(), computeruse.SessionConfig{
        ScreenWidth:          1440,
        ScreenHeight:         900,
        NormalizeCoordinates: true, // Use 0-999 grid
        InitialURL:           "https://www.google.com",
    })
    if err != nil {
        panic(err)
    }
    defer session.Close()

    // Use the session
    session.Navigate("https://example.com")
    session.ClickAt(500, 500)
    session.TypeText("Hello, World!")
    screenshot, _ := session.Screenshot()
    _ = screenshot
}

API Reference

Session Configuration

type SessionConfig struct {
    ScreenWidth          int    // Browser viewport width
    ScreenHeight         int    // Browser viewport height
    NormalizeCoordinates bool   // If true, use 0-999 grid; if false, use pixels
    InitialURL           string // Starting URL (default: "https://www.google.com")
    SearchEngineURL      string // URL for Search() action (default: "https://www.google.com")
    Headless             bool   // Run browser in headless mode
}

Available Commands

All methods return error for proper error handling.

Method	Signature	Claude Mapping	Gemini Mapping
`Screenshot`	`Screenshot() ([]byte, error)`	`screenshot`	N/A (call separately)
`ClickAt`	`ClickAt(x, y int) error`	`left_click`	`click_at`
`RightClickAt`	`RightClickAt(x, y int) error`	`right_click`	N/A
`MiddleClickAt`	`MiddleClickAt(x, y int) error`	`middle_click`	N/A
`DoubleClickAt`	`DoubleClickAt(x, y int) error`	`double_click`	N/A
`TripleClickAt`	`TripleClickAt(x, y int) error`	`triple_click`	N/A
`MouseDown`	`MouseDown(x, y int) error`	`left_mouse_down`	N/A
`MouseUp`	`MouseUp(x, y int) error`	`left_mouse_up`	N/A
`MouseMove`	`MouseMove(x, y int) error`	`mouse_move`	N/A
`HoverAt`	`HoverAt(x, y int) error`	`mouse_move`	`hover_at`
`ClickDrag`	`ClickDrag(fromX, fromY, toX, toY int) error`	`left_click_drag`	`drag_and_drop`
`TypeText`	`TypeText(text string) error`	`type`	N/A
`TypeTextAt`	`TypeTextAt(x, y int, text string, clearBefore, pressEnter bool) error`	`left_click` + `type` + `key`	`type_text_at`
`Key`	`Key(keys ...string) error`	`key`	`key_combination`
`Scroll`	`Scroll(direction string, amount int) error`	`scroll`	`scroll_document`
`ScrollAt`	`ScrollAt(x, y int, direction string, magnitude int) error`	`mouse_move` + `scroll`	`scroll_at`
`Navigate`	`Navigate(url string) error`	N/A	`navigate`
`GoBack`	`GoBack() error`	`key` ("Alt+Left")	`go_back`
`GoForward`	`GoForward() error`	`key` ("Alt+Right")	`go_forward`
`Search`	`Search() error`	N/A	`search`
`GetURL`	`GetURL() (string, error)`	N/A	N/A
`Close`	`Close() error`	N/A	N/A

Architecture

The library provides a unified API layer that translates high-level actions into go-rod browser commands:

LLM Agent (Claude/Gemini)
         ↓
Computer Use Library API
         ↓
go-rod (Browser Control)
         ↓
Chrome/Chromium Browser