viai957/PaliGemma

PaliGemma combines the SiglipVisionTransformer vision encoder with the Gemma language model, which lets it handle tasks such as image captioning and visual question answering. It takes visual and textual inputs together and generates context-aware text outputs.
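
To make the vision-plus-language pairing concrete, here is a minimal toy sketch of how image patches and text tokens could end up in one sequence for a language model. It is not the code from this repository: every name (`patchify`, `linear`, `build_multimodal_sequence`), every dimension, and the random weights are hypothetical stand-ins, and placing image tokens before text tokens is an assumption made for illustration.

```python
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[top + dy][left + dx] for dy in range(patch) for dx in range(patch)]
            )
    return patches


def linear(vec, weight):
    """Multiply a vector by a (len(vec) x out_dim) weight matrix."""
    out_dim = len(weight[0])
    return [sum(v * weight[i][j] for i, v in enumerate(vec)) for j in range(out_dim)]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_multimodal_sequence(image, token_ids, patch=2, vision_dim=8,
                              text_dim=6, vocab=16, seed=0):
    """Return one list of embeddings: projected image patches, then text tokens."""
    rng = random.Random(seed)
    w_patch = random_matrix(patch * patch, vision_dim, rng)  # toy "vision encoder"
    w_proj = random_matrix(vision_dim, text_dim, rng)        # toy projector to text width
    embed = random_matrix(vocab, text_dim, rng)              # toy text embedding table

    image_tokens = [linear(linear(p, w_patch), w_proj) for p in patchify(image, patch)]
    text_tokens = [embed[t] for t in token_ids]
    # Assumption: image tokens come first, followed by the text prompt tokens.
    return image_tokens + text_tokens


if __name__ == "__main__":
    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 toy image
    sequence = build_multimodal_sequence(image, token_ids=[1, 5, 3])
    print(len(sequence), len(sequence[0]))
```

In this sketch a 4x4 image with 2x2 patches yields four image tokens, so a three-token prompt produces a sequence of seven embeddings, all with the text embedding width. A real model would replace the single linear layer with a full vision transformer and feed the combined sequence to the language model's decoder.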

Python
