Phi3-Vision-huggingface

Vision Tasks with Local LLM Phi-3 Vision Model and Hugging Face

This repository contains code to perform vision tasks using the local LLM Phi-3 Vision model and the Hugging Face library. The code demonstrates how to generate a response based on an input image and a user-defined prompt.

Features

Image analysis using the Phi-3 Vision model
Text generation based on image content
Utilizes Hugging Face's transformers library

Installation

To run this project, you will need Python and the necessary dependencies. Follow the steps below to set up your environment.

Clone the Repository

git clone https://github.com/manunair1990/Phi3-Vision-huggingface

cd Phi3-Vision-huggingface

Install Dependencies

Install the required Python packages using pip.

pip install -r requirements.txt

Execute the code

python phi3_vision_huggingface.py

Notes To use a URL instead of a local image file, uncomment the relevant lines and replace the URL with your desired image URL.

Acknowledgements

Hugging Face for providing the model and tokenizer APIs.

The creators of the Phi-3 Vision model.