
PaliGemma paper implementation


PaliGemma

A PyTorch implementation of PaliGemma, the model described in the paper "PaliGemma: A versatile 3B VLM for transfer", based on Umar Jamil's tutorial "Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation".

[Figure: PaliGemma architecture]
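PaliGemma pairs a SigLIP image encoder with a Gemma language decoder: image patches become tokens that are linearly projected into the decoder's embedding space and placed before the text. The image tokens and the text prefix attend to each other bidirectionally, while the generated suffix is causal (a "prefix-LM" setup). The sketch below illustrates the token count, the prompt layout, and that attention mask. The function names, the `<image>`/`<bos>` strings, and the exact prompt ordering are assumptions based on the public PaliGemma processor convention, not code taken from this repository.

```python
# Hypothetical sketch of PaliGemma-style input layout and prefix-LM masking.
# Names and defaults are illustrative assumptions, not this repo's API.


def num_image_tokens(image_size: int = 224, patch_size: int = 14) -> int:
    # A ViT-style encoder (SigLIP in PaliGemma) splits the image into
    # non-overlapping patches; each patch becomes one image token.
    return (image_size // patch_size) ** 2


def build_prompt(prefix: str, image_token: str = "<image>", bos: str = "<bos>",
                 image_size: int = 224, patch_size: int = 14) -> str:
    # Assumed layout: image placeholders first, then BOS, the text prefix,
    # and a newline separating the prefix from the generated suffix.
    n = num_image_tokens(image_size, patch_size)
    return image_token * n + bos + prefix + "\n"


def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[bool]]:
    # mask[i][j] is True when query position i may attend to key position j.
    # Positions < prefix_len (image + text prefix) see each other fully;
    # suffix positions see the whole prefix plus earlier suffix tokens only.
    return [[j < prefix_len or j <= i for j in range(total_len)]
            for i in range(total_len)]
```

With a 224x224 input and 14-pixel patches, (224 / 14)^2 = 16^2 = 256 image tokens precede the text; a 448x448 input gives 1024.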

Getting Started

To set up and run this project:

  • Create a virtual environment and install the dependencies listed in requirements.txt:
    virtualenv venv
    source venv/bin/activate
    pip3 install -r requirements.txt
    
  • Run inference:
    bash launch_inference.sh
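Inference is autoregressive: the model produces logits for the next token, one token is chosen and appended, and the loop repeats until an end-of-sequence token or a length limit. Below is a minimal, framework-free sketch of that loop with either greedy decoding or temperature plus top-p (nucleus) sampling, a common choice for this step. It assumes a hypothetical `next_token_logits` callable standing in for the model; it is not the code behind `launch_inference.sh`, whose options are not documented here.

```python
import math
import random
from typing import Callable, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    # Scale by temperature, subtract the max for numerical stability, normalize.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_p(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    # Keep the smallest set of most-likely tokens whose cumulative probability
    # reaches top_p, then sample within that set proportionally to probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def generate(next_token_logits: Callable[[list[int]], Sequence[float]],
             prompt_ids: list[int], max_new_tokens: int, eos_id: int,
             do_sample: bool = False, temperature: float = 0.8,
             top_p: float = 0.9, seed: int = 0) -> list[int]:
    # next_token_logits is a hypothetical stand-in for the model's forward pass.
    rng = random.Random(seed)
    tokens = list(prompt_ids)
    generated: list[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        if do_sample:
            next_id = sample_top_p(softmax(logits, temperature), top_p, rng)
        else:
            # Greedy decoding: pick the highest-scoring token.
            next_id = max(range(len(logits)), key=lambda i: logits[i])
        generated.append(next_id)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return generated
```

A real model would also reuse a KV cache instead of reprocessing the whole sequence each step; that is omitted here to keep the loop readable.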

TODO:

  • Implement a Streamlit/Gradio interface for interacting with the model
  • Fix an issue with token generation in the model
  • Add requirements file