viai957/PaliGemma

The model combines the SiglipVisionTransformer vision encoder with the Gemma language model, which lets it handle multimodal tasks such as image captioning and visual question answering. PaliGemma integrates visual and textual data to generate context-aware outputs.
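
To make the idea of integrating visual and textual data concrete, here is a minimal sketch of one common way a vision encoder and a language model can be joined: image patch embeddings are linearly projected to the text embedding width and prepended to the text token embeddings. This is an illustration under stated assumptions, not the code in this repository. The function names (`make_matrix`, `matmul`, `build_decoder_input`), the toy dimensions, and the random values are all hypothetical, and plain Python lists stand in for tensors.

```python
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of random floats (stand-in for learned weights or embeddings)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m) using plain lists."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def build_decoder_input(patch_embeddings, projection, text_embeddings):
    """Project image patches to the text width, then prepend them to the text tokens.

    Assumption: the language model consumes one sequence in which the
    projected image tokens come first and the text tokens follow.
    """
    image_tokens = matmul(patch_embeddings, projection)
    return image_tokens + text_embeddings


# Toy sizes, chosen only for illustration.
NUM_PATCHES, VISION_DIM, TEXT_DIM, NUM_TEXT_TOKENS = 4, 6, 8, 3

rng = random.Random(0)
patches = make_matrix(NUM_PATCHES, VISION_DIM, rng)      # vision encoder output
projection = make_matrix(VISION_DIM, TEXT_DIM, rng)      # vision-to-text projection
text = make_matrix(NUM_TEXT_TOKENS, TEXT_DIM, rng)       # text token embeddings

sequence = build_decoder_input(patches, projection, text)
print(len(sequence), len(sequence[0]))  # prints: 7 8
```

The resulting sequence has one row per image patch plus one row per text token, all at the text embedding width, so a single decoder can attend over both modalities when generating output.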

Python
