ycchen218/VisionQA-Llama2-OWLViT
This is a multimodal model designed for the Visual Question Answering (VQA) task. It integrates the Llama2 13B, OWL-ViT, and YOLOv8 models.
Python
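The description does not detail how the three models are wired together, but a common pattern for detector-augmented VQA is to run the object detectors over the image and inject the detected objects into the language model's prompt. Below is a minimal sketch of that prompt-building step; every function, parameter, and data shape here is an illustrative assumption, not the repository's actual API:

```python
# Hypothetical sketch: fuse object-detection output into a Llama2 VQA prompt.
# Detections are assumed to be (label, confidence) pairs, e.g. produced by
# OWL-ViT or YOLOv8; the repository's real interfaces may differ.

def build_vqa_prompt(question, detections, threshold=0.5):
    """Compose a text prompt that grounds the question in detected objects."""
    # Keep only detections above the confidence threshold.
    kept = [label for label, score in detections if score >= threshold]
    context = ", ".join(kept) if kept else "no objects detected"
    return (
        "You are answering a question about an image.\n"
        f"Objects detected in the image: {context}.\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_vqa_prompt(
    "What animal is on the sofa?",
    [("cat", 0.92), ("sofa", 0.88), ("lamp", 0.31)],
)
print(prompt)
```

The resulting prompt would then be passed to Llama2 13B for answer generation; low-confidence detections (here, the lamp at 0.31) are filtered out so they do not mislead the language model.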