GUI automation

This project includes a set of scripts for automating GUI interactions and logging keystrokes. The main components are:

automate_gui.ipynb: A Jupyter notebook that uses OpenAI to generate keystrokes which can be replayed into an application to accomplish a task. It also includes functionality for encoding images to base64 format.
keylogger.py: A Python script that logs keyboard and mouse events. It uses the pynput library to capture these events and writes them to a file.
measure.html: An HTML file that provides a GUI for measuring distances and areas in an image. It uses JavaScript and HTML5 Canvas for the GUI and calculations.

Installation

Clone the repository to your local machine.
Install the required Python packages using pip:

pip install -r requirements.txt

Usage

`automate_gui.ipynb`

Open the notebook in Jupyter.
Run the cells in order. The final cell will output the generated keystrokes.
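
The image-encoding step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `encode_bytes_to_base64`, `encode_image_to_base64`, and `to_data_url` are hypothetical, and the data-URL wrapper is only an assumption about how an encoded image might be embedded in a request.

```python
import base64
from pathlib import Path


def encode_bytes_to_base64(data):
    """Return raw bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def encode_image_to_base64(image_path):
    """Read an image file from disk and return its contents as base64."""
    return encode_bytes_to_base64(Path(image_path).read_bytes())


def to_data_url(b64_string, mime_type="image/png"):
    """Wrap a base64 string in a data URL (assumed embedding format)."""
    return f"data:{mime_type};base64,{b64_string}"
```

For example, `to_data_url(encode_image_to_base64("screenshot.png"))` would produce a string that can be placed in a JSON payload, where `screenshot.png` is a placeholder file name.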

`keylogger.py`

Run the script in a Python environment.
Perform keyboard and mouse actions that you want to log.
Press the 'Esc' key to stop logging.
The logged events will be written to a file named 'events-.txt'.
Replay from the file with `python keylogger.py --replay [file name]`.
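
The logging steps above can be sketched with pynput's listener classes. This is an illustration under assumptions rather than the script's real implementation: the tab-separated line format, the names `format_event` and `record_events`, and the choice to log only key presses and mouse clicks are all hypothetical.

```python
import threading
import time


def format_event(kind, detail, timestamp):
    """Render one event as a tab-separated line (hypothetical log format)."""
    return f"{timestamp:.3f}\t{kind}\t{detail}"


def record_events(log_path):
    """Log key presses and mouse clicks to log_path until Esc is pressed."""
    # Imported lazily so format_event can be used without pynput or a display.
    from pynput import keyboard, mouse

    lock = threading.Lock()  # the two listeners run in separate threads

    with open(log_path, "w", encoding="utf-8") as log:

        def write(kind, detail):
            with lock:
                log.write(format_event(kind, detail, time.time()) + "\n")

        def on_press(key):
            write("key", str(key))
            if key == keyboard.Key.esc:
                return False  # returning False stops the keyboard listener

        def on_click(x, y, button, pressed):
            state = "down" if pressed else "up"
            write("click", f"{x},{y},{button},{state}")

        mouse_listener = mouse.Listener(on_click=on_click)
        mouse_listener.start()
        with keyboard.Listener(on_press=on_press) as keyboard_listener:
            keyboard_listener.join()  # blocks until Esc is pressed
        mouse_listener.stop()
```

Calling `record_events("events.txt")` (a placeholder file name) would block until Esc is pressed and then leave one line per event in the file.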

Note that replaying arbitrary events generated by an online AI service on your real computer carries low-to-moderate risk, depending on the task you're automating. You assume all responsibility for how this software is used. In particular, I could imagine it accidentally clicking delete on a file or closing a window so that you lose unsaved work, or worse. Use at your own risk!

`measure.html`

Open the HTML file in a web browser.
Use the 'Choose File' button to load an image.
Enter a real-world distance for scale, then click 'Start Scale' and mark two points on the image.
Click 'Set Scale', then use 'Calculate Distances' or 'Calculate Area' as needed.
The tool is intended for measuring roofing, so it includes a 'predominant pitch' field.
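
The scale, distance, and area steps above amount to a little geometry. The sketch below shows one way to do the arithmetic, in Python rather than the page's JavaScript. The function names are hypothetical, the area uses the standard shoelace formula as an assumed method, and the 'predominant pitch' field is not used here.

```python
import math


def scale_factor(p1, p2, real_distance):
    """Real-world units per pixel, from two marked points a known distance apart."""
    pixel_distance = math.dist(p1, p2)
    if pixel_distance == 0:
        raise ValueError("scale points must be distinct")
    return real_distance / pixel_distance


def path_length(points, scale):
    """Total real-world length of the polyline through the given pixel points."""
    return scale * sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_area(points, scale):
    """Real-world area of the polygon with the given pixel vertices (shoelace formula)."""
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )
    # Area scales with the square of the linear scale factor.
    return abs(twice_area) / 2 * scale ** 2
```

For instance, if two points 100 pixels apart are known to be 10 feet apart, the scale is 0.1 feet per pixel, and a 200 by 100 pixel rectangle measures 200 square feet.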

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the terms of the MIT license.

maxtheman/gpt4v_gui_automation