
Alex Assistant

Alex Assistant Logo

A simple tutorial showing how to capture the screen, recognize speech, add context, and query the Anthropic API.

Disclaimer: This assistant is not meant for real proposal development; it is simply a fun demo that combines several inputs (voice, image, text) and shows how to weave them into an application built with Anthropic's Claude.

Features

  • Captures the current screen as an input to the LLM.
  • Uses speech recognition to convert spoken commands to text.
  • Shows a simple strategy to inject context from a text file into the prompt (see the sketch after this list).
  • Queries the Anthropic API with both text and image inputs.
  • Provides responses in Markdown format.
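
The context-injection feature boils down to reading a local text file and prepending its contents to the prompt. The sketch below illustrates that idea in plain Python; the file name context.txt and the helper build_prompt are illustrative names for this example, not necessarily what the repository uses.

    # Minimal sketch of injecting file-based context into a prompt.
    # "context.txt" and build_prompt() are illustrative names only.
    from pathlib import Path

    def build_prompt(user_text: str, context_file: str = "context.txt") -> str:
        """Prepend the contents of a local text file to the spoken command."""
        context = ""
        path = Path(context_file)
        if path.exists():
            context = path.read_text(encoding="utf-8")
        return f"Context:\n{context}\n\nUser request:\n{user_text}"

    print(build_prompt("Summarize what is on my screen."))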

Installation

  1. Clone the repository:

    git clone https://github.com/blaiszik/alex-assistant
    cd alex-assistant

  2. Create a virtual environment (optional but recommended):

    python3 -m venv venv
    source venv/bin/activate

  3. Install the required packages:

    pip install -r requirements.txt

  4. Install the package:

    pip install .
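
After completing the steps above, a quick sanity check is to import the Anthropic SDK from Python. This is just a convenience check, not a step the repository requires.

    # Sanity check: the Anthropic SDK should import without errors.
    import anthropic
    print("anthropic SDK version:", anthropic.__version__)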

Setting the Environment Variable

To use the Anthropic API, you need to set the ANTHROPIC_API_KEY environment variable. Here’s how to do it:

On macOS

  1. Open your terminal.

  2. Add the following line to your shell profile file (e.g., ~/.bash_profile, ~/.zshrc, or ~/.profile):

    export ANTHROPIC_API_KEY='your-api-key-here'

  3. Apply the changes by reloading your shell profile, e.g., source ~/.bash_profile or source ~/.zshrc.
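
Once the variable is exported and your shell reloaded, the Anthropic Python SDK picks up the key automatically when a client is constructed. The snippet below is a small sketch for confirming the key is visible to your Python process; it is not part of the repository.

    import os
    import anthropic

    # The SDK reads ANTHROPIC_API_KEY from the environment by default.
    if not os.environ.get("ANTHROPIC_API_KEY"):
        raise SystemExit("ANTHROPIC_API_KEY is not set in this shell.")

    client = anthropic.Anthropic()  # no explicit api_key argument needed
    print("Client created; the API key was found in the environment.")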

Usage

  1. Ensure the ANTHROPIC_API_KEY environment variable is set.

  2. Run the main script:

    python alex.py

Speak your command when prompted. The tool will capture the screen and query the Anthropic API with the captured image and spoken command.
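
The sketch below shows roughly how such a capture, transcribe, and query loop can be wired together. The library choices (mss for screen capture, SpeechRecognition for transcription) and the file name screen.png are assumptions for illustration; the actual alex.py may be implemented differently.

    # Rough sketch of a capture -> transcribe -> query loop.
    # Library choices (mss, SpeechRecognition) are assumptions, not the repo's actual code.
    import base64

    import anthropic
    import mss
    import speech_recognition as sr

    # 1. Capture the first monitor to a PNG file.
    with mss.mss() as sct:
        screenshot_path = sct.shot(output="screen.png")

    # 2. Record a spoken command and transcribe it (requires a microphone and internet access).
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Speak your command...")
        audio = recognizer.listen(source)
    command = recognizer.recognize_google(audio)

    # 3. Send the screenshot and the transcribed command to the Anthropic API.
    with open(screenshot_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
                {"type": "text", "text": command},
            ],
        }],
    )
    print(response.content[0].text)  # the response text, formatted as Markdown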

The default model is Claude 3 Haiku (the cheapest available model). For higher-quality responses, you can try one of the other models listed below.

Other available models include "claude-3-opus-20240229", "claude-3-sonnet-20240229", and "claude-3-haiku-20240307". Newer models may be available; see https://docs.anthropic.com/en/docs/models-overview.
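
The README does not say where the model name is configured inside alex.py, but with the Anthropic Python SDK the model is simply the model argument of messages.create, so switching is a one-line change. A minimal sketch, assuming the standard SDK call:

    import anthropic

    client = anthropic.Anthropic()
    # Swap the model identifier to trade cost for response quality.
    response = client.messages.create(
        model="claude-3-opus-20240229",  # instead of the default "claude-3-haiku-20240307"
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello, Claude"}],
    )
    print(response.content[0].text)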