ProctorAI 👁️

The AI to keep you focused 😈

🔍 Overview

ProctorAI is a multimodal AI that watches your screen and calls you out if it sees you procrastinating. Proctor works by taking a screenshot of your screen every few seconds (at a user-specified interval) and feeding it into a multimodal model such as Claude-3.5-Sonnet, GPT-4o, or LLaVA-1.5. If ProctorAI determines that you are not focused, it takes control of your screen and yells at you with a personalized message. After making you pledge to stop procrastinating, ProctorAI gives you 15 seconds to close the source of procrastination; otherwise, it will keep bugging you.
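Under the hood, this is a simple capture-classify-interrupt loop. The sketch below is illustrative only, not ProctorAI's actual code: it assumes Pillow for screen capture and the Anthropic messages API for classification, and the function names (capture_screenshot_b64, is_procrastinating, procrastination_event) are hypothetical.

import base64
import io
import time

import anthropic            # pip install anthropic
from PIL import ImageGrab   # pip install Pillow

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
DELAY_TIME = 5                  # seconds between screenshots

def capture_screenshot_b64() -> str:
    """Grab the screen and encode it as base64 PNG for the multimodal API."""
    buffer = io.BytesIO()
    ImageGrab.grab().save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode()

def is_procrastinating(image_b64: str) -> bool:
    """Ask the multimodal model for a YES/NO verdict on the screenshot."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model ID
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64",
                 "media_type": "image/png", "data": image_b64}},
                {"type": "text", "text": "Is the user procrastinating? Answer YES or NO."},
            ],
        }],
    )
    return response.content[0].text.strip().upper().startswith("YES")

def procrastination_event() -> None:
    """Stand-in for the real popup + TTS + 15-second countdown."""
    print("Caught you procrastinating! Close it. Now.")

while True:
    if is_procrastinating(capture_screenshot_b64()):
        procrastination_event()
    time.sleep(DELAY_TIME)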

[Project demo]

An intelligent system that knows what does and doesn't count as procrastination. Unlike traditional site blockers, ProctorAI understands nuanced workflows, and this makes a big difference. Before every Proctor session, the user types out a session specification, explicitly telling Proctor what they plan to work on, which behaviors are allowed during the session, and which are not. Proctor can therefore handle nuanced rules such as "I'm allowed to go on YouTube, but only to watch Karpathy's lecture on Makemore" (see the example below). No other productivity software can handle this level of flexibility.
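A session specification is just free-form text. A hypothetical one might read:

I'm writing the evaluation section of my paper in Overleaf and VS Code.
Allowed: Overleaf, VS Code, arXiv, Google Scholar, and YouTube only for Karpathy's Makemore lecture.
Not allowed: Twitter/X, Reddit, news sites, any other YouTube videos.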

[Image of an exam proctor]

ProctorAI aims to be this woman, but available all the time, snarkier, and with full context of your work.

It's alive! A big design goal with Proctor is that it should feel alive. In my experience, I tend not to break the rules because I can intuitively feel the AI watching me--just like how test-takers are much less likely to cheat when they can feel the proctor of an exam watching them.

🚀 Setup and Installation

The current implementation requires macOS (a Windows-compatible version is available in the windows branch); we hope to make Proctor platform-independent soon. To start the GUI, just type ./run.sh. You might get some popups asking to allow the terminal access to certain utilities, which you should enable.

git clone https://github.com/jam3scampbell/ProctorAI
cd ProctorAI
python -m venv focusenv
source focusenv/bin/activate
pip install -r requirements.txt
./run.sh

Depending on which models you want to use under the hood, you should define the following API keys as environment variables:

  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY
  • GEMINI_API_KEY
  • ELEVEN_LABS_API_KEY
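For example, you can export whichever keys you need in your shell before launching (the values below are placeholders):

export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GEMINI_API_KEY="your-gemini-key"
export ELEVEN_LABS_API_KEY="your-elevenlabs-key"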

To keep running API costs low, I recommend using two_tier mode with a local model such as LLaVA as the router model. For this, you will need to install Ollama and pull the llava model. Make sure Ollama is running in the background before starting ProctorAI.
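Assuming Ollama is installed, that setup looks like:

ollama pull llava
ollama serve    # only needed if the Ollama server isn't already running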

⚙️ Options/Settings

The following can all be toggled in the settings page or used as arguments to main.py:

  • model_name: API name of the main model
  • tts: enable Eleven Labs text-to-speech
  • voice: select the voice of the Eleven Labs speaker
  • cli_mode: run without the GUI
  • delay_time: the amount of time between each screenshot
  • initial_delay: the amount of time to wait before Proctor starts watching your screen (useful for giving you time to open what you want to work on)
  • countdown_time: the amount of time Proctor gives you to close the source of procrastination
  • user_name: your name, to make the experience more personalized
  • print_CoT: print the model's chain-of-thought to the console
  • two_tier: if activated, first sends each screenshot to the router_model and only sends it up to the main model if the router_model thinks the user is procrastinating. Useful for bringing down API costs. The router model is given a stricter prompt so that it leans towards flagging behavior it thinks is suspicious (see the sketch in the next section).
  • router_model: API name of the model to use as the router
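For example, a CLI session in two_tier mode might be launched like this (flag spellings follow the option names above, but check python main.py --help for the exact interface; the model names and user name are placeholders):

python main.py --cli_mode --model_name claude-3-5-sonnet-20240620 --two_tier --router_model llava --delay_time 5 --user_name Alice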

🎯 Understanding This Repository

Right now, basically all functionality is contained in the following files:

  • main.py: contains the main control loop that takes screenshots, calls the model, and initiates procrastination events
  • user_interface.py: runs the GUI, written in PyQt5
  • api_models.py: houses a unified interface for calling different model families
  • procrastination_event.py: contains methods for displaying the popup when the user is caught procrastinating as well as the timer telling the user to leave what they were doing
  • utils.py: helper functions for taking screenshots, text-to-speech, etc.
  • config_prompts.yaml: all prompts used in the LLM scaffolded system
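To make the two_tier flow concrete, here is an illustrative sketch (not the repository's actual code) of how a local LLaVA router, reached through Ollama's documented /api/generate endpoint, could gate calls to the main model; router_flags_procrastination and check_screenshot are hypothetical names:

import requests  # pip install requests

def router_flags_procrastination(image_b64: str) -> bool:
    """Cheap first pass: ask a local LLaVA model (via Ollama) for a verdict."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llava",
            # The router prompt is deliberately strict, so it over-flags.
            "prompt": "Does this screenshot look like procrastination? Answer YES or NO.",
            "images": [image_b64],  # base64-encoded PNG, no data-URL prefix
            "stream": False,
        },
        timeout=60,
    )
    response.raise_for_status()
    return "YES" in response.json()["response"].upper()

def check_screenshot(image_b64: str) -> bool:
    """Only spend main-model API credits when the local router is suspicious."""
    if not router_flags_procrastination(image_b64):
        return False
    return is_procrastinating(image_b64)  # the main-model call sketched earlier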

As the program runs, it'll create a settings.json file and a screenshots folder in the root directory. If TTS is enabled, it'll also write yell_voice.mp3 to the src folder.
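The settings file simply persists the options listed above; a hypothetical settings.json (all values are placeholders, not defaults) might look like:

{
  "model_name": "claude-3-5-sonnet-20240620",
  "tts": true,
  "voice": "Rachel",
  "delay_time": 5,
  "initial_delay": 30,
  "countdown_time": 15,
  "user_name": "Alice",
  "print_CoT": false,
  "two_tier": true,
  "router_model": "llava"
}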

🌐 Roadmap and Future Improvements

This project is still very much under active development. Some features I'm hoping to add next:

  • finetuning a LLaVA model specifically for the task/distribution
  • scheduling sessions, e.g. having Proctor start running when you open your computer
  • making it extremely annoying to quit the program (at least until the user finishes their pre-defined session)
  • logging, time-tracking, & summary statistics
  • improving the chat feature and giving the model greater awareness of state/context
  • adding a drafts folder for session specifications so you don't have to re-type one when doing the same task as the other day
  • muting all other sounds on the computer while the TTS plays (so it isn't drowned out by music)