CameraBlip

Using BLIP to identify camera and uploaded images

Primary Language: Python

📸 Visual Question Answering App 🎉

Welcome to the Visual Question Answering (VQA) App! This fun and interactive app lets you upload an image or capture one with your webcam, then ask questions about it. Powered by AI, the app provides answers based on the image's content! 🤖✨

🚀 Features

  • 📂 Image Upload: Easily upload an image from your device.
  • 📷 Camera Capture: Snap a photo with your webcam directly in the app.
  • 🧠 Interactive Q&A: Ask any question about the image, and get an AI-generated response!

🔧 Setup

Prerequisites 🛠️

  • Python: Version 3.7 or later is required.

  • Libraries: Install the necessary packages by running the following command:

    pip install streamlit transformers torch pillow opencv-python-headless numpy

Installation 🖥️

  1. Clone the Repository 📂:

     git clone https://github.com/yourusername/vqa-app.git
     cd vqa-app

  2. Run the Streamlit App 🎈:

     streamlit run app.py

  3. Open the App 🌐: Once the app is running, open the provided URL (usually http://localhost:8501) in your browser.

🎉 How to Use the App

1. Choose an Image Source:

  • ๐Ÿ–ผ๏ธ Select either "Upload Image" or "Use Camera" from the dropdown to choose your image source.

2. Upload or Capture an Image:

  • Upload: If uploading, choose an image file in .jpg, .jpeg, or .png format.
  • Camera: If using the camera, take a picture with your webcam.

3. Ask a Question โœ๏ธ:

  • Type a question about the image in the text input box.

4. Get the Answer 🧩:

  • Click on the "Get Answer" button to receive a response generated by the model.

📜 Code Overview

Initializing the BLIP Processor and Model 🧠

The app uses the BlipProcessor and BlipForQuestionAnswering classes from the Hugging Face transformers library. The underlying BLIP model is pre-trained for visual question answering, allowing the app to generate answers based on the content of an image.
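The loading step can be sketched as follows. This is a minimal sketch, not the app's exact code: the Salesforce/blip-vqa-base checkpoint, the load_blip helper name, and the device selection are assumptions.

```python
import torch
from transformers import BlipProcessor, BlipForQuestionAnswering

MODEL_ID = "Salesforce/blip-vqa-base"  # assumed checkpoint; substitute the one app.py actually uses
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

def load_blip(model_id=MODEL_ID):
    """Download (on first call) and return the BLIP processor/model pair."""
    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForQuestionAnswering.from_pretrained(model_id).to(DEVICE)
    model.eval()  # inference only; disables dropout
    return processor, model
```

In a Streamlit app, wrapping such a loader in @st.cache_resource keeps the weights from being reloaded on every rerun of the script.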

Function Definitions 📄

  • ask_question_about_image(image, question): Processes the image and question, and returns the answer.
  • load_image(image_file): Reads and converts uploaded images to RGB format.
  • convert_camera_image(img_array): Converts camera input images from Streamlit to a format compatible with the BLIP model.
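One plausible shape for these helpers, as a sketch: the exact signatures in app.py may differ, and here the processor, model, and device are passed in explicitly rather than taken from module globals.

```python
import numpy as np
from PIL import Image

def load_image(image_file):
    """Open an uploaded file-like object and normalise it to RGB."""
    return Image.open(image_file).convert("RGB")

def convert_camera_image(img_array):
    """Convert a camera frame given as an H x W x 3 NumPy array to a PIL RGB image."""
    return Image.fromarray(np.asarray(img_array, dtype=np.uint8)).convert("RGB")

def ask_question_about_image(image, question, processor, model, device="cpu"):
    """Run BLIP VQA on an image/question pair and decode the generated answer."""
    import torch  # imported here so the image helpers above work without torch installed

    inputs = processor(image, question, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Converting everything to RGB up front matters because uploaded PNGs may carry an alpha channel, which the BLIP processor does not expect.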

Streamlit Layout 🖼️

The app layout provides users with:

  • Options to upload images or capture them with a camera.
  • A text input to enter questions about the image.
  • AI-generated answers displayed in response to the questions.

🧪 Example

Try This:

  • Upload an image of a cute cat. 🐱
  • Ask, "What is this animal?" 🤔

Expected Result:

  • The app will respond with "cat" (or a similarly accurate answer based on the image content).

🧩 Dependencies

This app relies on the following libraries:

  • Streamlit: For creating the interactive web app interface.
  • transformers: To load the BLIP model for visual question answering.
  • torch: To run the model on either CPU or GPU.
  • Pillow: For image processing.
  • OpenCV: For handling camera input images.
  • NumPy: For working with image arrays.

โš ๏ธ Troubleshooting

If you encounter any issues, try the following:

  • Compatibility: Ensure all dependencies are installed and compatible with Python 3.7 or later.
  • Streamlit Issues: Restart the Streamlit server using streamlit run app.py if any errors appear on the frontend.

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


🎉 Acknowledgments

  • 🤗 Hugging Face Transformers: For providing the BLIP model and tools for visual question answering.
  • 🌈 Streamlit: For creating an easy-to-use framework for building web apps.