Image captioning has become an essential tool for making digital content accessible and interactive. With the advent of advanced multimodal models like Google's Gemini Pro Vision, generated image captions have become more accurate and contextually relevant. In this blog post, we will explore how to build a simple web application with Streamlit and Google's Gemini Pro Vision that generates captions for uploaded images.
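At its core, captioning an uploaded image comes down to one multimodal request: send the image together with a text prompt and read back the generated text. The sketch below shows that step with the google-generativeai Python SDK. It is an illustration, not the code from the project repo: the prompt wording, the helper names generate_caption and clean_caption, and the GOOGLE_API_KEY environment variable are assumptions, and the gemini-pro-vision model name may need updating if Google has since retired that model.

```python
"""Minimal sketch: caption an image with Gemini Pro Vision.

Assumes the google-generativeai and Pillow packages are installed and that a
GOOGLE_API_KEY environment variable is set. Prompt text and helper names are
hypothetical, not taken from the project repo.
"""
import io
import os

# Hypothetical prompt; tune it to get shorter or more detailed captions.
DEFAULT_PROMPT = "Write a short, descriptive caption for this image."


def clean_caption(raw: str) -> str:
    """Collapse runs of whitespace and strip surrounding quotes from a reply."""
    text = " ".join(raw.split())
    return text.strip('"').strip("'").strip()


def generate_caption(image_bytes: bytes,
                     prompt: str = DEFAULT_PROMPT,
                     model_name: str = "gemini-pro-vision") -> str:
    """Send an image plus a text prompt to Gemini and return a cleaned caption."""
    # Imported lazily so clean_caption works even without the SDK installed.
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    image = Image.open(io.BytesIO(image_bytes))
    # The request contents are a list that mixes a text prompt and a PIL image.
    response = model.generate_content([prompt, image])
    return clean_caption(response.text)
```

In a Streamlit app, the bytes of an uploaded file (for example, uploaded_file.getvalue() on the object returned by st.file_uploader) can be passed straight to generate_caption.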
Project repo: https://github.com/riad5089/Image-Captioning-Web-App-with-Gemini-Pro-Vision.git
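To run the app locally, clone the repository, then create and activate a virtual environment and install the dependencies. The activation command below is for Windows; on macOS or Linux, use source env/bin/activate instead.
python -m venv env
env\Scripts\activate
pip install -r requirements.txt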
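Once the dependencies are installed, a quick way to confirm the environment is ready is to check that the modules the app relies on can be found. The module names below (streamlit, google.generativeai, PIL) are assumptions about what requirements.txt lists, since its contents are not shown here; the script name check_deps.py is likewise hypothetical. Run it with python check_deps.py while the virtual environment is active.

```python
"""Sketch: report which assumed app dependencies are missing from this env."""
import importlib.util

# Assumed module names; adjust them to match the repo's requirements.txt.
REQUIRED_MODULES = ("streamlit", "google.generativeai", "PIL")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the module names that cannot be found in the active environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports parent packages (e.g. "google"), which may be absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules()
    print("All dependencies found." if not gaps else f"Missing: {', '.join(gaps)}")
```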
I built the web application with the Streamlit framework and hosted it on Streamlit Community Cloud (share.streamlit.io); you can check out the app here.