This project demonstrates how to automate web interactions using Selenium and OpenAI's GPT-4. The script initializes a browser, navigates to a specified website, captures screenshots, and uses GPT-4 for making decisions on what actions to take next based on the screenshot and the context of accessible elements on the page.
- Navigate and interact with web pages using Selenium.
- Capture page screenshots and base64 encoding.
- Use OpenAI GPT-4 to analyze screenshots and determine the next action.
- Handle interactions with clickable links, input fields, and dropdowns.
- Maintain state and log detailed notes on actions performed.
- Python 3.7+
- Selenium WebDriver
- OpenAI API Key
-
Clone the Repository:
git clone https://github.com/your-username/web-automation-gpt4-selenium.git cd web-automation-gpt4-selenium
-
Install the Required Libraries:
pip install selenium requests
-
Download the WebDriver:
- Download the appropriate WebDriver for your browser (e.g., ChromeDriver for Chrome).
- Ensure the WebDriver executable is in your system's PATH.
-
Set Your OpenAI API Key: Replace
"YOUR_OPENAI_API_KEY"
with your actual API key from OpenAI. -
Update the Starting URL: Modify the
driver.get("https://en.wikipedia.org/wiki/Main_Page")
line in the script to the desired starting URL. -
Run the Script:
python script.py
The script will:
- Initialize the Selenium WebDriver.
- Navigate to the starting URL.
- Capture a screenshot of the page.
- Send the screenshot to GPT-4 along with the context of interactable elements on the page.
- Perform the actions recommended by GPT-4.
- Log notes for each action and any errors encountered.
encode_image
: Encodes an image to base64 format.ask_gpt_with_image
: Sends a screenshot to OpenAI GPT-4 and retrieves the response.get_interactable_elements
: Gathers all input fields and dropdowns on the page, returning a dictionary of elements.- Interaction Loop: Captures screenshots, sends them to GPT-4 for analysis, performs the recommended actions, and logs notes.
- The script limits the number of interactions to 10 for demonstration purposes. Modify the loop limit as needed.
- Ensure that the WebDriver version matches your browser version.