This repository contains Python code for a visual question answering project. The project uses transformer-based vision-language models to analyze images and answer questions about them.
- Installation
- Usage
- Models
- Gradio App
To run the code in this repository, you need to install the following Python libraries:
- Pillow==10.1.0
- torch==2.1.2
- torchvision==0.16.2
- transformers==4.40.0
- sentencepiece==0.1.99
- gradio

You can install these libraries using pip.
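As a sketch, the pinned versions above can be installed with a single pip command:

```shell
pip install Pillow==10.1.0 torch==2.1.2 torchvision==0.16.2 \
    transformers==4.40.0 sentencepiece==0.1.99 gradio
```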
The code in this repository is divided into three main sections:
- Image description using the BLIP model (our earlier model)
- Image description using the MiniCPM-Llama3-V-2_5 model (the current model)
- Gradio app for visual question answering
This section of the code uses the BLIP model to describe an image in detail. The image is fetched from a URL, and the description is generated using the BLIP model.
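A minimal sketch of this step, assuming the Salesforce/blip-vqa-base checkpoint listed under Models; the image URL and the default question are placeholders, not values from the repository:

```python
from io import BytesIO
from urllib.request import urlopen

from PIL import Image
from transformers import BlipForQuestionAnswering, BlipProcessor

MODEL_ID = "Salesforce/blip-vqa-base"


def describe_image(url: str, question: str = "What is in the picture?") -> str:
    """Fetch an image from a URL and answer a question about it with BLIP."""
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForQuestionAnswering.from_pretrained(MODEL_ID)

    # Download the image and convert it to RGB for the processor.
    image = Image.open(BytesIO(urlopen(url).read())).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Placeholder URL -- replace with the image you want described.
    print(describe_image("https://example.com/image.jpg"))
```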
This section of the code uses the MiniCPM-Llama3-V-2_5 model to describe an image in detail. The image is loaded from a local file, and the description is generated using the MiniCPM-Llama3-V-2_5 model.
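A sketch of the equivalent step with MiniCPM-Llama3-V-2_5, following the `chat` interface its remote code exposes; the local file path is a placeholder, and a CUDA device is assumed:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "openbmb/MiniCPM-Llama3-V-2_5"


def describe_local_image(path: str, question: str = "Describe this image in detail.") -> str:
    """Load an image from disk and ask MiniCPM-Llama3-V-2_5 about it."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = (
        AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True, torch_dtype=torch.float16)
        .to("cuda")
        .eval()
    )

    image = Image.open(path).convert("RGB")
    msgs = [{"role": "user", "content": question}]
    # model.chat is provided by the model's trust_remote_code implementation.
    return model.chat(image=image, msgs=msgs, tokenizer=tokenizer)


if __name__ == "__main__":
    # "photo.jpg" is a placeholder path to a local image file.
    print(describe_local_image("photo.jpg"))
```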
This section of the code creates a Gradio app for visual question answering. The app allows users to input an image and a question related to the image, and it generates a response using the MiniCPM-Llama3-V-2_5 model.
The project uses the following pretrained models:

- Salesforce/blip-vqa-base
- openbmb/MiniCPM-Llama3-V-2_5
The Gradio app is created using the gradio library and is launched with the iface.launch(debug=True) command; debug=True surfaces errors in the console while the app runs.
- This app was made by Pejman Ebrahimi
- Email: pejman.ebrahimi77@gmail.com and info@giltech-support.co.uk
- Website:
https://www.giltech-megoldasok.com/