PaliGemma: A versatile 3B VLM for transfer

PyTorch implementation based on Umar Jamil's "Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation".

To set up and run this project:
- Create a new environment with the provided requirements.txt file:
    virtualenv venv
    source venv/bin/activate
    pip3 install -r requirements.txt
- Run inference:
    bash launch_inference.sh
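
If the inference script fails on an import, it can help to confirm that the install step actually populated the active environment. The following is a minimal sketch, not part of this repository: the helper names parse_requirement_names and missing_packages are hypothetical, and the parser assumes simple lines such as name==version (direct URL or VCS requirements are not handled).

```python
from __future__ import annotations

import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text: str) -> list[str]:
    """Extract distribution names from requirements.txt-style text.

    Assumption: each requirement starts with the package name,
    e.g. "torch==2.3.0" or "numpy>=1.26".
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if not req.exists():
        print("requirements.txt not found in the current directory")
    else:
        missing = missing_packages(parse_requirement_names(req.read_text()))
        if missing:
            print("Missing packages:", ", ".join(missing))
        else:
            print("All listed packages are installed")
```

Run it from the project root with the virtual environment activated. It only checks that each package is present, not that the installed version satisfies the pinned constraint.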

To do:
- Implement a Streamlit/Gradio interface for interacting with the model
- Fix an issue with the model's token generation
- Add requirements file