This is a Streamlit app that uses the Moondream2 Vision Model to generate text based on an uploaded image and a user-provided prompt.
- Upload an image in PNG or JPEG format.
- Enter a prompt to guide the text generation.
- Generate text based on the uploaded image and prompt.
- Install the required Python packages:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run vision.py
- Open the app in your web browser at http://localhost:8501.
- Upload an image using the file uploader.
- Enter a prompt in the text input field.
- Click the "Generate" button to generate text based on the image and prompt.
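
The steps above could be wired together roughly as follows. This is a minimal sketch, not the actual contents of `vision.py`: the helper names (`is_supported_image`, `ready_to_generate`, `run_app`) are hypothetical, and `generate_text` is a placeholder standing in for the real model call, which is not shown in this README.

```python
import sys

# Extensions accepted by the uploader (PNG or JPEG, as listed in the features).
ALLOWED_EXTENSIONS = (".png", ".jpg", ".jpeg")


def is_supported_image(filename: str) -> bool:
    """Return True if the file name ends in a PNG or JPEG extension."""
    return filename.lower().endswith(ALLOWED_EXTENSIONS)


def ready_to_generate(filename, prompt) -> bool:
    """Both a supported image and a non-empty prompt are required."""
    return (
        bool(filename)
        and is_supported_image(filename)
        and bool(prompt and prompt.strip())
    )


def generate_text(image_bytes: bytes, prompt: str) -> str:
    """Hypothetical placeholder: the real app would query the vision model here."""
    return f"[model output for prompt {prompt!r} on a {len(image_bytes)}-byte image]"


def run_app() -> None:
    # Imported lazily so the helpers above can be used without Streamlit installed.
    import streamlit as st

    st.title("Image-to-text demo")
    uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
    prompt = st.text_input("Prompt")

    if st.button("Generate"):
        if uploaded is not None and ready_to_generate(uploaded.name, prompt):
            st.image(uploaded)
            st.write(generate_text(uploaded.getvalue(), prompt))
        else:
            st.warning("Please upload a PNG or JPEG image and enter a prompt.")


# `streamlit run` has already imported streamlit by the time it executes the
# script, so this check starts the UI only when the file is launched that way.
if "streamlit" in sys.modules:
    run_app()
```

Keeping the validation in plain functions means it can be checked without starting the UI; for example, `ready_to_generate("cat.png", "   ")` is false because the prompt is blank.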
The Moondream2 Vision Model is a small but powerful vision model that outperforms models twice its size. It was created by @vikhyatk.
This project is open source under the MIT license.