VQA_Claude is an extension of the Claude AI Toolkit (see: https://github.com/RMNCLDYO/claude-ai-toolkit), offering advanced Visual Question Answering capabilities with multi-page PDF analysis using Claude 3.5 Sonnet at its backend, via a simple Streamlit app interface. The Claude AI toolkit itself is designed to be user-friendly and highly adaptable, making it suitable for both beginners and advanced users. With the integration of the Streamlit app, users can easily interact with the Anthropic's models for text generation and vision analysis.
- Streamlit App: User-friendly interface for interacting with the toolkit.
- Conversational AI: Create interactive, real-time chat experiences (chatbots) or AI assistants.
- Image Captioning: Generate detailed descriptions and insights or create captions from images.
- Text Generation: Produce coherent and contextually relevant text and answers from simple prompts.
- PDF Analysis: Analyze multi-page PDFs using Visual Question Answering techniques.
- Highly Customizable: Tailor settings like streaming output, system prompts, sampling temperature and more to suit your specific requirements.
- Lightweight Integration: Efficiently designed with minimal dependencies, requiring only the
requests
package for core functionality.
Python 3.x
- An API key from Anthropic
The following Python packages are required:
requests
: For making HTTP requests to the Claude API.python-dotenv
: For managing API keys and other environment variables.streamlit
: For the web interface.PyMuPDF (fitz)
: For PDF processing.
To use VQA_Claude, clone the repository to your local machine and install the required Python packages.
Here's the formatted content to match a README file for a Git repository:
git clone https://github.com/menonpg/VQA_Claude.git
cd VQA_Claude
pip install -r requirements.txt
Obtain an API key from Anthropic.
You have three options for managing your API key:
Click here to view the API key configuration options
-
Navigate to your terminal.
-
Add your API key like so:
export CLAUDE_API_KEY=your_api_key
This method allows the API key to be loaded automatically when using the wrapper or CLI.
-
Install
python-dotenv
if you haven't already:pip install python-dotenv
-
Create a
.env
file in the project's root directory. -
Add your API key to the
.env
file like so:CLAUDE_API_KEY=your_api_key
This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv
installed and set up correctly.
If you prefer not to use a .env
file, you can directly pass your API key as an argument to the CLI or the wrapper functions.
CLI
--api_key "your_api_key"
Wrapper
api_key="your_api_key"
This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.
The Streamlit app provides a user-friendly interface for both text generation and vision analysis.
To start the Streamlit app, run:
streamlit run app.py
Below are some examples from the samples/
folder of the repository. The outputs are based on the prompt asked in the streamlitUI.png
on the following PDF: samples/TestPDF.pdf
.