/Phi3-Vision-huggingface

This repository contains Python code for performing vision tasks using the Microsoft Phi-3 Vision model and the Hugging Face library. It demonstrates generating textual responses based on image content, showcasing the integration of advanced vision-language models for tasks such as image analysis and descriptive text generation.

Primary LanguagePython

Phi3-Vision-huggingface

Vision Tasks with Local LLM Phi-3 Vision Model and Hugging Face

This repository contains code to perform vision tasks using the local LLM Phi-3 Vision model and the Hugging Face library. The code demonstrates how to generate a response based on an input image and a user-defined prompt.

Features

  • Image analysis using the Phi-3 Vision model
  • Text generation based on image content
  • Utilizes Hugging Face's transformers library

Installation

To run this project, you will need Python and the necessary dependencies. Follow the steps below to set up your environment.

Clone the Repository

git clone https://github.com/manunair1990/Phi3-Vision-huggingface

cd Phi3-Vision-huggingface

Install Dependencies

Install the required Python packages using pip.

pip install -r requirements.txt

Execute the code

python phi3_vision_huggingface.py

Notes To use a URL instead of a local image file, uncomment the relevant lines and replace the URL with your desired image URL.

Acknowledgements

Hugging Face for providing the model and tokenizer APIs.

The creators of the Phi-3 Vision model.