Welcome to the Visual Question Answering (VQA) App! This fun and interactive app allows you to upload an image or capture one using your webcam, then ask questions about it. Powered by AI, the app will provide answers based on the image's content!
- Image Upload: Easily upload an image from your device.
- Camera Capture: Snap a photo with your webcam directly in the app.
- Interactive Q&A: Ask any question about the image and get an AI-generated response!
- Python: Version 3.7 or later is required.
- Libraries: Install the necessary packages by running the following command:

```
pip install streamlit transformers torch pillow opencv-python-headless numpy
```
- Clone the Repository:

```
git clone https://github.com/yourusername/vqa-app.git
cd vqa-app
```

- Run the Streamlit App:

```
streamlit run app.py
```

- Open the App: Once the app is running, open the provided URL (usually http://localhost:8501) in your browser.
- Select either "Upload Image" or "Use Camera" from the dropdown to choose your image source.
- Upload: If uploading, choose an image file in `.jpg`, `.jpeg`, or `.png` format.
- Camera: If using the camera, take a picture with your webcam.
- Type a question about the image in the text input box.
- Click on the "Get Answer" button to receive a response generated by the model.
The app uses the `BlipProcessor` and `BlipForQuestionAnswering` classes from the Hugging Face `transformers` library. The underlying BLIP model is pre-trained for visual question answering, allowing the app to generate responses based on the content of an image.
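For reference, wiring these two classes together typically looks like the sketch below. This assumes the public `Salesforce/blip-vqa-base` checkpoint; the app's actual checkpoint and function names may differ.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Assumed checkpoint; weights are downloaded on first use and then cached.
MODEL_ID = "Salesforce/blip-vqa-base"

def blip_answer(image: Image.Image, question: str) -> str:
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForQuestionAnswering.from_pretrained(MODEL_ID)
    # The processor packs the image and the question into model-ready tensors.
    inputs = processor(image, question, return_tensors="pt")
    # generate() produces answer token ids; the processor decodes them to text.
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Loading the processor and model once at startup (rather than per call) avoids repeated initialization cost in a long-running Streamlit session.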
- `ask_question_about_image(image, question)`: Processes the image and question, and returns the answer.
- `load_image(image_file)`: Reads and converts uploaded images to RGB format.
- `convert_camera_image(img_array)`: Converts camera input images from Streamlit to a format compatible with the BLIP model.
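The two image helpers might look roughly like this. It is a sketch under the assumption that uploads arrive as file-like objects and camera frames as BGR NumPy arrays (OpenCV's convention); the signatures in the real app.py may differ.

```python
import io
import numpy as np
from PIL import Image

def load_image(image_file):
    # Open an uploaded file-like object and normalize it to RGB,
    # since the BLIP processor expects three-channel RGB input.
    return Image.open(image_file).convert("RGB")

def convert_camera_image(img_array):
    # OpenCV-style arrays are BGR; reverse the channel axis to get RGB
    # (copy() makes the array contiguous for Image.fromarray).
    return Image.fromarray(img_array[:, :, ::-1].copy())

# Quick demo on synthetic data:
buf = io.BytesIO()
Image.new("RGB", (4, 4), (255, 0, 0)).save(buf, format="PNG")
buf.seek(0)
uploaded = load_image(buf)            # 4x4 RGB PIL image

frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, :, 0] = 255                  # blue frame in BGR terms
camera = convert_camera_image(frame)  # comes out blue in RGB
```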
The app layout provides users with:
- Options to upload images or capture them with a camera.
- A text input to enter questions about the image.
- AI-generated answers displayed in response to the questions.
- Upload an image of a cute cat.
- Ask, "What is this animal?"
- The app will respond with "cat" (or a similar accurate answer based on the image content).
This app relies on the following libraries:
- Streamlit: For creating the interactive web app interface.
- transformers: To load the BLIP model for visual question answering.
- torch: To run the model on either CPU or GPU.
- Pillow: For image processing.
- OpenCV: For handling camera input images.
- NumPy: For working with image arrays.
If you encounter any issues, try the following:
- Compatibility: Ensure all dependencies are installed and compatible with Python 3.7 or later.
- Streamlit Issues: Restart the Streamlit server with `streamlit run app.py` if any errors appear on the frontend.
This project is licensed under the MIT License. See the `LICENSE` file for details.
- Hugging Face Transformers: For providing the BLIP model and tools for visual question answering.
- Streamlit: For creating an easy-to-use framework for building web apps.