Chrome-GPT is an AutoGPT experiment that uses LangChain and Selenium to let an AutoGPT agent take control of an entire Chrome session. Because it can interactively scroll, click, and input text on web pages, the agent can navigate and manipulate web content.
Input Prompt: Find me a bar that can host a 20 person event near Chelsea, Manhattan evening of Apr 30th. Fill out contact us form if they have one with info: Name Richard, email he@hrichard.com.
DEMO.mov
Demo made by Richard He
- 🌎 Google search
- 🧠 Long-term and short-term memory management
- 🔨 Chrome actions: describe a webpage, scroll to element, click on buttons/links, input forms, switch tabs
- 🤖 Supports multiple agent types: Zero-shot, BabyAGI and Auto-GPT
- 🔥 (IN PROGRESS) Chrome plugin support
- Web crawling features are limited: buttons and input fields sometimes fail to appear in the prompt.
- Response time is slow, with each action taking between 1 and 10 seconds to run.
- At times, LangChain agents are unable to parse GPT outputs (refer to the LangChain discussion: langchain-ai/langchain#4065).
- Chrome
- Python >3.8
- Install Poetry
- Set up your OpenAI API key and add it as the OPENAI_API_KEY environment variable
- Install Python requirements via Poetry
poetry install
- Open a poetry shell
poetry shell
- Run chromegpt via
python -m chromegpt
- GPT-3.5 Usage (Default):
python -m chromegpt -v -t "{your request}"
- GPT-4 Usage (Recommended, needs GPT-4 access):
python -m chromegpt -v -a auto-gpt -m gpt-4 -t "{your request}"
- For help:
python -m chromegpt --help
    Usage: python -m chromegpt [OPTIONS]

      Run ChromeGPT: An AutoGPT agent that interacts with Chrome

    Options:
      -t, --task TEXT                 The task to execute  [required]
      -a, --agent [auto-gpt|baby-agi|zero-shot]
                                      The agent type to use
      -m, --model TEXT                The model to use
      --headless                      Run in headless mode
      -v, --verbose                   Run in verbose mode
      --human-in-loop                 Run in human-in-loop mode, only available
                                      when using auto-gpt agent
      --help                          Show this message and exit.
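
For scripted runs, the options listed in the help output can be assembled from Python. The following is a minimal sketch, not part of Chrome-GPT itself: `build_command` and `run_chromegpt` are hypothetical helper names, and the only flags used are the ones documented above. Because the default agent is not stated here, the sketch conservatively requires `agent="auto-gpt"` to be passed explicitly before it allows `--human-in-loop`.

```python
import os
import subprocess
import sys
from typing import List, Optional

# Agent names copied from the --agent choices in the help output.
AGENT_TYPES = ("auto-gpt", "baby-agi", "zero-shot")


def build_command(
    task: str,
    agent: Optional[str] = None,
    model: Optional[str] = None,
    headless: bool = False,
    verbose: bool = False,
    human_in_loop: bool = False,
) -> List[str]:
    """Build the argv list for `python -m chromegpt` (hypothetical helper)."""
    if not task:
        raise ValueError("--task is required")
    if agent is not None and agent not in AGENT_TYPES:
        raise ValueError(f"unknown agent type: {agent!r}")
    # The help text says human-in-loop mode is only available with auto-gpt.
    # Assumption: we require the agent to be named explicitly in that case.
    if human_in_loop and agent != "auto-gpt":
        raise ValueError("--human-in-loop requires agent='auto-gpt'")

    # sys.executable keeps the call inside the current (Poetry) environment.
    cmd = [sys.executable, "-m", "chromegpt", "-t", task]
    if agent is not None:
        cmd += ["-a", agent]
    if model is not None:
        cmd += ["-m", model]
    if headless:
        cmd.append("--headless")
    if verbose:
        cmd.append("-v")
    if human_in_loop:
        cmd.append("--human-in-loop")
    return cmd


def run_chromegpt(task: str, **options) -> int:
    """Check for the API key, then run chromegpt and return its exit code."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    return subprocess.run(build_command(task, **options)).returncode
```

Calling `run_chromegpt("{your request}", agent="auto-gpt", model="gpt-4", verbose=True)` would then correspond to the recommended GPT-4 invocation shown earlier. Passing the arguments as a list rather than a shell string avoids quoting problems when the task text contains spaces or quotes.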