
Browser automation system that uses AI-driven planning to navigate web pages and perform goals.

Primary language: Python. License: MIT.

Cerebellum

A lightweight browser-using agent that accomplishes user-defined goals on webpages using keyboard and mouse actions.

See It In Action

Goal: Find a USB C to C cable that is 10 feet long and add it to cart

Demo video: amazon.mp4

Setup

Please see setup directions for your language:

How It Works

  1. Web browsing is simplified to navigating a directed graph.
  2. Each webpage is a node with visible elements and data.
  3. User actions, such as clicking or typing, are edges that move between nodes.
  4. Cerebellum starts at a webpage and aims to reach a target node that embodies the completed goal.
  5. It uses an LLM to find new nodes by analyzing page content and interactive elements.
  6. The LLM decides the next action based on the current state and past actions.
  7. Cerebellum executes the LLM's planned action and feeds the new state back into the LLM for the next step.
  8. The process ends when the LLM decides the goal has been reached or is unachievable.

Currently, Claude 3.5 Sonnet is the only supported LLM.
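
The plan-and-execute loop described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not Cerebellum's actual code: the names `Action`, `PageState`, `run_agent`, `toy_plan`, and `toy_execute` are all hypothetical, and the toy planner and executor stand in for the LLM and the real browser so the example runs without either.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Action:
    """An edge in the graph: one keyboard or mouse action (hypothetical shape)."""
    kind: str         # "click", "type", "success", or "failure"
    target: str = ""  # description of the element to act on
    text: str = ""    # text to type, if any


@dataclass
class PageState:
    """A node in the graph: what is visible on the current page."""
    url: str
    content: str


# The planner plays the role of the LLM; the executor plays the role of the browser.
Planner = Callable[[str, PageState, List[Action]], Action]
Executor = Callable[[PageState, Action], PageState]


def run_agent(goal: str, start: PageState, plan: Planner, execute: Executor,
              max_steps: int = 20) -> Tuple[bool, List[Action]]:
    """Walk the graph until the planner reports success or failure."""
    state = start
    history: List[Action] = []
    for _ in range(max_steps):
        # The planner sees the goal, the current state, and all past actions.
        action = plan(goal, state, history)
        history.append(action)
        if action.kind == "success":
            return True, history
        if action.kind == "failure":
            return False, history
        # Executing the action moves along an edge to a new node.
        state = execute(state, action)
    # Step budget exhausted without the planner declaring an outcome.
    return False, history


def toy_plan(goal: str, state: PageState, history: List[Action]) -> Action:
    """Stand-in for the LLM: a fixed three-step policy."""
    if "cart: 1" in state.content:
        return Action("success")
    if state.url.endswith("/search"):
        return Action("click", target="Add to cart")
    return Action("type", target="search box", text=goal)


def toy_execute(state: PageState, action: Action) -> PageState:
    """Stand-in for the browser: maps each action to a canned next page."""
    if action.kind == "type":
        return PageState(url="https://example.com/search", content="results")
    if action.kind == "click":
        return PageState(url="https://example.com/cart", content="cart: 1")
    return state


if __name__ == "__main__":
    start = PageState(url="https://example.com/", content="home")
    ok, history = run_agent("usb c to c cable 10 ft", start, toy_plan, toy_execute)
    print(ok, [a.kind for a in history])
```

The `max_steps` cap is an addition for the sketch so that a planner that never reports an outcome cannot loop forever; the source does not say whether Cerebellum enforces such a limit.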

Features

  • Compatible with any Selenium-supported browser.
  • Fills forms using user-provided JSON data.
  • Accepts runtime instructions to dynamically adjust browsing strategies and actions.
  • TODO: Create training datasets from browsing sessions
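
As an illustration of the JSON form-filling feature, the sketch below shows one way user-provided data might be combined with a goal before it is handed to the planner. The function `build_goal_prompt` and the prompt wording are assumptions made for this example; Cerebellum's real interface may look different.

```python
import json


def build_goal_prompt(goal: str, form_data: dict) -> str:
    """Combine a goal with user data (hypothetical helper, not Cerebellum's API)."""
    # Serialize deterministically so the same data always yields the same prompt.
    data_block = json.dumps(form_data, indent=2, sort_keys=True)
    return f"Goal: {goal}\n\nUse this data when filling forms:\n{data_block}"


if __name__ == "__main__":
    user_data = {"name": "Ada Lovelace", "email": "ada@example.com"}
    print(build_goal_prompt("Sign up for the newsletter", user_data))
```

Keeping the data in a separate block from the goal lets the same goal text be reused with different users' details.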

Roadmap

  • Integrate Claude 3.5 Sonnet as an ActionPlanner
  • Demonstrate successful BrowserAgent across a variety of tasks
  • Create Python SDK
  • Handle tabbed browsing
  • Handle data extraction from websites
  • Improve vertical scrolling behavior
  • Improve horizontal scrolling behavior
  • Improve system prompt performance
  • Improve mouse position marking on screenshots
  • Add ability for converting browser sessions into training datasets
  • Support for additional LLMs as an ActionPlanner
  • Train a local model
  • Integrate local model as an ActionPlanner

Known Issues

  • Claude 3.5 safety refusals
    • Refuses to solve CAPTCHAs
    • Refuses to navigate when political content is on the page

Contributing

Contributions to Cerebellum are welcome. For details on how to get involved, please refer to our CONTRIBUTING.md.

We appreciate all contributions, whether they're bug reports, feature requests, or code changes.

License

This project is licensed under the MIT License.
