Image-Captioning-Web-App-with-Gemini-Pro-Vision

This project generates captions for images with Google's Gemini Pro Vision model. It uses generative AI to automatically produce rich, contextually relevant captions for uploaded images.

Primary language: Python. License: MIT.


Introduction

Image captioning has become an essential tool for making digital content accessible and interactive. With the advent of multimodal models like Google's Gemini Pro Vision, generated captions have become more accurate and contextually relevant. In this blog, we will explore how to build a simple web application with Streamlit and Google's Gemini Pro Vision that generates captions for uploaded images.
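The README does not show the application code, so the following is only a minimal sketch of the captioning call, assuming the google-generativeai Python package (its configure, GenerativeModel and generate_content functions). The names make_model, caption_image and DEFAULT_PROMPT, and the prompt wording, are hypothetical and not taken from the repository. The model object is passed into caption_image rather than created inside it, which keeps the function easy to test with a stand-in object.

```python
# Hypothetical prompt: the repository's actual prompt text is not shown in this README.
DEFAULT_PROMPT = "Write a concise, descriptive caption for this image."


def make_model(api_key, model_name="gemini-pro-vision"):
    """Configure the google-generativeai client and return a Gemini model handle."""
    # Imported lazily so the rest of this module works without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)


def caption_image(model, image, prompt=DEFAULT_PROMPT):
    """Send the text prompt and the image together and return the caption text."""
    # generate_content accepts a list of parts; here a string followed by an image.
    response = model.generate_content([prompt, image])
    return response.text.strip()
```

In a Streamlit page you would typically open the uploaded file with PIL.Image.open, build the model once with make_model, and display the string returned by caption_image. The model name is a parameter because Google has since deprecated gemini-pro-vision, so a newer multimodal Gemini model name may need to be substituted.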

STEPS to run the project:

STEP 01 - Clone the repository

Project repo: https://github.com/riad5089/Image-Captioning-Web-App-with-Gemini-Pro-Vision.git

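STEP 02 - Create and activate a virtual environment after opening the repository

python -m venv env
env\Scripts\activate

(The activation command above is for Windows. On macOS or Linux, run: source env/bin/activate)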

STEP 03 - Install the requirements

pip install -r requirements.txt
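To confirm the install step worked, a quick check like the one below can help. It is a sketch under an assumption: the import names listed (streamlit, google.generativeai, and PIL, which is the import name of Pillow) are a guess at what an app like this needs, not a copy of the repository's requirements.txt. The names ASSUMED_MODULES and missing_modules are hypothetical.

```python
import importlib.util

# Assumed import names; compare against the repository's requirements.txt.
ASSUMED_MODULES = ["streamlit", "google.generativeai", "PIL"]


def missing_modules(names=ASSUMED_MODULES):
    """Return the import names that cannot be found in the active environment."""
    missing = []
    for name in names:
        try:
            # find_spec locates a module without running it
            # (for dotted names, the parent package is imported first).
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name (e.g. "google") is absent.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    print("All assumed dependencies found." if not missing else f"Missing: {missing}")
```

Run it with the virtual environment from STEP 02 activated, so it inspects the same interpreter that pip installed into.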


Deployment

I built this web application with the Streamlit framework. It is hosted on Streamlit Community Cloud (share.streamlit.io); you can check out the app here.
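The README does not say how the deployed app receives its Gemini API key. A common pattern, assumed here, is to read it from st.secrets on Streamlit Community Cloud and fall back to an environment variable when running locally. The helper name resolve_api_key and the key name GOOGLE_API_KEY are hypothetical.

```python
import os

# Hypothetical setting name; the README does not say which name the app reads.
KEY_NAME = "GOOGLE_API_KEY"


def resolve_api_key(secrets=None, env=None, name=KEY_NAME):
    """Return the API key from a secrets mapping if present, else from the environment."""
    env = os.environ if env is None else env
    if secrets is not None:
        try:
            value = secrets.get(name)
        except Exception:
            # st.secrets can raise when no secrets file is configured (e.g. a plain local run).
            value = None
        if value:
            return value
    value = env.get(name)
    if value:
        return value
    raise RuntimeError(
        f"No API key found: set {name} in Streamlit secrets or as an environment variable."
    )
```

Inside the Streamlit script this would be called as resolve_api_key(st.secrets), so the same code reads the key from the app's secrets settings on Streamlit Community Cloud and from an environment variable on a local machine.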