Pinned Repositories
AGQA_code_analysis
Ask-Anything
[CVPR2024][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs such as MiniGPT-4, StableLM, and MOSS.
autogen_auto_homework
Autogen_MathGPT
Chat-UniVi
[CVPR 2024🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
DreamSync
GQA_code_analysis
LLaMA-VID
Official implementation of LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
question_generation
shen2347.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
weikaih04's Repositories
weikaih04/AGQA_code_analysis
weikaih04/autogen_auto_homework
weikaih04/GQA_code_analysis
weikaih04/question_generation
weikaih04/shen2347.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
weikaih04/Ask-Anything
[CVPR2024][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs such as MiniGPT-4, StableLM, and MOSS.
weikaih04/Autogen_MathGPT
weikaih04/Chat-UniVi
[CVPR 2024🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
weikaih04/DreamSync
weikaih04/LLaMA-VID
Official implementation of LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
weikaih04/LLaVA-finetune
Fine-tune LLaVA for InstructVerse
weikaih04/PandaGPT
[TLLM'23] PandaGPT: One Model To Instruction-Follow Them All
weikaih04/TaskMeAnything-rebuttal
weikaih04/TaskMeAnything-website
weikaih04/test_image
weikaih04/Video-ChatGPT
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
weikaih04/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
weikaih04/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
weikaih04/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
weikaih04/weikaih04.github.io