This repository contains code to perform vision tasks using the local LLM Phi-3 Vision model and the Hugging Face library. The code demonstrates how to generate a response based on an input image and a user-defined prompt.
- Image analysis using the Phi-3 Vision model
- Text generation based on image content
- Utilizes Hugging Face's
transformers
library
To run this project, you will need Python and the necessary dependencies. Follow the steps below to set up your environment.
git clone https://github.com/manunair1990/Phi3-Vision-huggingface
cd Phi3-Vision-huggingface
Install the required Python packages using pip.
pip install -r requirements.txt
python phi3_vision_huggingface.py
Notes To use a URL instead of a local image file, uncomment the relevant lines and replace the URL with your desired image URL.
Hugging Face for providing the model and tokenizer APIs.
The creators of the Phi-3 Vision model.