Visual Question Answering (VQA) using Claude

Overview

VQA_Claude is an extension of the Claude AI Toolkit (see: https://github.com/RMNCLDYO/claude-ai-toolkit), offering advanced Visual Question Answering capabilities with multi-page PDF analysis using Claude 3.5 Sonnet at its backend, via a simple Streamlit app interface. The Claude AI toolkit itself is designed to be user-friendly and highly adaptable, making it suitable for both beginners and advanced users. With the integration of the Streamlit app, users can easily interact with the Anthropic's models for text generation and vision analysis.

Key Features

Streamlit App: User-friendly interface for interacting with the toolkit.
Conversational AI: Create interactive, real-time chat experiences (chatbots) or AI assistants.
Image Captioning: Generate detailed descriptions and insights or create captions from images.
Text Generation: Produce coherent and contextually relevant text and answers from simple prompts.
PDF Analysis: Analyze multi-page PDFs using Visual Question Answering techniques.
Highly Customizable: Tailor settings like streaming output, system prompts, sampling temperature and more to suit your specific requirements.
Lightweight Integration: Efficiently designed with minimal dependencies, requiring only the requests package for core functionality.

Prerequisites

Python 3.x
An API key from Anthropic

Dependencies

The following Python packages are required:

requests: For making HTTP requests to the Claude API.
python-dotenv: For managing API keys and other environment variables.
streamlit: For the web interface.
PyMuPDF (fitz): For PDF processing.

Installation

To use VQA_Claude, clone the repository to your local machine and install the required Python packages.

Here's the formatted content to match a README file for a Git repository:

VQA_Claude

Clone the Repository

git clone https://github.com/menonpg/VQA_Claude.git

Navigate to the Repository Folder

cd VQA_Claude

Install the Required Dependencies

pip install -r requirements.txt

Configuration

Obtain an API key from Anthropic.

You have three options for managing your API key:

Click here to view the API key configuration options

Setting it as an Environment Variable on Your Device (Recommended for Everyday Use)

Navigate to your terminal.
Add your API key like so:
```
export CLAUDE_API_KEY=your_api_key
```

This method allows the API key to be loaded automatically when using the wrapper or CLI.

Using an .env File (Recommended for Development)

Install python-dotenv if you haven't already:
```
pip install python-dotenv
```
Create a .env file in the project's root directory.
Add your API key to the .env file like so:
```
CLAUDE_API_KEY=your_api_key
```

This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly.

Direct Input

If you prefer not to use a .env file, you can directly pass your API key as an argument to the CLI or the wrapper functions.

CLI

--api_key "your_api_key"

Wrapper

api_key="your_api_key"

This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.

Usage

Streamlit App

The Streamlit app provides a user-friendly interface for both text generation and vision analysis.

Launching the App

To start the Streamlit app, run:

streamlit run app.py

Examples

Below are some examples from the samples/ folder of the repository. The outputs are based on the prompt asked in the streamlitUI.png on the following PDF: samples/TestPDF.pdf.

menonpg/VQA_Claude