Pinned Repositories
DinoV2-SigLIP-Phi3-LoRA-VLM
GCG
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
Grounded-Video-LLM
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
QA-Prompts
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge [ECCV'24]
WHB139426.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
YoLLaVA
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
WHB139426's Repositories
WHB139426/GCG
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
WHB139426/QA-Prompts
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge [ECCV'24]
WHB139426/Grounded-VideoLLM
WHB139426/WHB139426.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes